Category Archives: Resource

Resource: Cytoscape App for STRING

The STRING database of known and predicted protein–protein interactions is heavily used by bioinformaticians and non-bioinformaticians alike. The latter generally use STRING via its web interface, whereas the former typically download the complete network and analyze it locally. However, we lacked a good way for non-bioinformaticians to work with networks that are just too large for the web interface. A typical example would be users who wish to visualize the results of a proteomics or transcriptomics study as a STRING network.

To address this, I have worked with John “Scooter” Morris to develop a new Cytoscape app for STRING. The app allows you to quickly retrieve much larger networks than is possible via the web interface and gives you the powerful layout and analysis features of Cytoscape. At the same time, it retains the “glass ball” look that many people associate with a STRING network (shown here with a small example network):


When retrieving a network, the app also includes node attributes from the COMPARTMENTS and TISSUES databases. This allows users, for example, to easily color the nodes based on the confidence with which each protein is localized to a certain cellular compartment or expressed in a certain tissue. The app also includes node attributes for the drug target classification of human proteins, obtained from the Pharos web resource. Finally, since it is Cytoscape, you can obviously import your own attribute tables.


Although it is not yet feature complete, version 0.9 of the app is already available from the Cytoscape App Store under the name stringApp. Please note that it requires Cytoscape 3.3 to work.

Resource: The TISSUES database on tissue expression of genes and proteins

As mentioned in the last entry, 2015 has been a year of publishing web resources for my group. The COMPARTMENTS and DISEASES databases have yet another sister resource, namely TISSUES.

This web resource allows users to easily obtain a color-coded schematic of the tissue expression of a protein of interest, providing an at-a-glance overview of evidence from database annotations, from proteomics and transcriptomics studies as well as from automatic text mining of the scientific literature:


Whereas the resource integrates all of the above-mentioned types of evidence, the focus in this work was primarily on combining data from systematic tissue expression atlases, produced using a variety of different high-throughput assays. This required extensive work on mapping, scoring, and benchmarking the different datasets to put them on a common confidence scale. The scientific results and details of all those analyses can be found in the article “Comprehensive comparison of large-scale tissue expression datasets”.

Resource: The DISEASES database on disease–gene associations

2015 has been an exceptionally busy year in my group in terms of publishing databases and other web resources; so busy that I have failed to write blog posts describing several of them.

One of them is the DISEASES database, which is described in detail in an article with the informative, if not very inventive title “DISEASES: Text mining and data integration of disease–gene associations”.

The DISEASES database can be viewed as a sister resource to the subcellular localization database COMPARTMENTS, which you can read more about in this blog post. Indeed, the two resources share much of their infrastructure, including the web framework, the backend database, and the text-mining pipeline.

The big difference between the two resources is the scope: whereas COMPARTMENTS links proteins to their subcellular localizations, DISEASES links them to the diseases in which they are implicated. To this end we make use of the Disease Ontology, which turned out to be very well suited for text-mining purposes due to its many synonyms for terms. Text mining is the most important source of associations but is complemented by manually curated associations from Genetics Home Reference and UniProtKB as well as GWAS results imported from DistiLD.

To facilitate usage in large-scale analysis and integration into other databases, all data in DISEASES are available for download. Indeed, the text-mined associations from DISEASES are already included in both GeneCards and Pharos.

Resource: The COMPARTMENTS database on protein subcellular localization

Together with collaborators in the groups of Seán O’Donoghue and Reinhard Schneider, my group has recently launched a new web-accessible database named COMPARTMENTS.

COMPARTMENTS unifies subcellular localization evidence from many sources by mapping all proteins and compartments to their STRING identifiers and Gene Ontology terms, respectively. We import curated annotations from UniProtKB and model organism databases and assign confidence scores to them based on their evidence codes. For human proteins, we similarly import and score evidence from The Human Protein Atlas. COMPARTMENTS also uses text mining to derive subcellular localization evidence from co-occurrence of proteins and compartments in Medline abstracts. Finally, we precompute subcellular localization predictions with the sequence-based methods WoLF PSORT and YLoc. For further details, please refer to our recently published paper entitled “COMPARTMENTS: unification and visualization of protein subcellular localization evidence”.

To provide a simple overview of all this information, we visualize the combined localization evidence for each protein on a schematic of an animal, fungal, or plant cell:




You can click any of the three images above to go to the COMPARTMENTS web resource. To facilitate use in large-scale analyses, the complete datasets for major eukaryotic model organisms are available for download.

Resource: Antibodypedia bulk download file and STRING payload

Antibodypedia, developed by Antibodypedia AB and Nature Publishing Group, is a very useful resource for finding commercially available antibodies against human proteins.

The resource is made available under the Creative Commons Attribution-NonCommercial 3.0 license, which allows for reuse and redistribution of the data for non-commercial purposes. However, the data are only available for browsing through a web interface, which greatly limits systems biology uses of the resource. I thus wrote a robot to scrape all information from the web resource and convert it into a convenient tab-delimited file, which I have made available for download under the same license. This dataset covers a total of 579,038 antibodies against 16,827 human proteins.
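The core of such a scraping robot is flattening HTML tables into tab-delimited lines. The sketch below, using only Python's standard library, shows the idea; the sample HTML structure is invented for illustration and does not reflect Antibodypedia's actual markup:

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of <td> cells, one row per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())


def to_tsv(html_text):
    """Flatten all table rows in an HTML fragment into tab-delimited lines."""
    parser = TableParser()
    parser.feed(html_text)
    return "\n".join("\t".join(row) for row in parser.rows)


# Hypothetical antibody table fragment for illustration only
sample = "<table><tr><td>AB0001</td><td>monoclonal</td></tr></table>"
print(to_tsv(sample))
```

In the real robot, the fetched pages would be fed through a parser like this one and the rows appended to the downloadable tab-delimited file.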

To be able to use the dataset in conjunction with STRING and related resources, I next mapped the proteins to STRING protein identifiers. I was able to map 92% of all proteins in Antibodypedia. Having done this, I created the necessary files for the STRING payload mechanism to be able to show the information from Antibodypedia directly within STRING.

The end result looks like this when searching for the WNT7A protein:

Antibodypedia STRING network

The halos around the proteins encode the type and number of antibodies available. Red rings imply that at least one monoclonal antibody exists whereas gray rings imply that only polyclonal antibodies exist. The darker the ring (be it red or gray), the more different antibodies are available.
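The color encoding described above can be expressed as a simple lookup. In the sketch below, the count cut-offs and hex shades are my own illustrative choices, not the actual values used in the payload:

```python
def halo_color(monoclonal, polyclonal):
    """Ring color for a protein node: red if at least one monoclonal
    antibody exists, gray if only polyclonal ones do; darker shades
    mean more antibodies. Cut-offs and shades are illustrative only."""
    total = monoclonal + polyclonal
    if total == 0:
        return None  # no halo at all
    if monoclonal:
        shades = ["#ffcccc", "#ff6666", "#cc0000"]  # light to dark red
    else:
        shades = ["#dddddd", "#aaaaaa", "#666666"]  # light to dark gray
    if total < 5:
        return shades[0]
    if total < 50:
        return shades[1]
    return shades[2]


print(halo_color(3, 10))  # some monoclonals exist, so a red shade
```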

The STRING payload mechanism also extends the popups with additional information, here shown for LRP6:

Antibodypedia STRING popup

The popup shows the total number of antibodies available and how many of them are monoclonal. It also provides a direct linkout to the relevant protein page on Antibodypedia.

Please, feel free to use this Antibodypedia-STRING mashup.

Resource: You want REST with that GreenMamba?

When you set up a GreenMamba resource, you get not only a web interface for human users, but also a REST web service API meant for scripts to interact with your tool. The REST interface is such an integral part of GreenMamba that it is not even optional – you get it whether you want it or not. However, since the purpose of setting up GreenMamba resources is to make your tools and databases accessible to others, we cannot think of a good reason why you would not want to expose them as web services when it takes no extra work.

To illustrate how command-line tools can be accessed as web services, we return to the Motifs tool described in an earlier blog post. In addition to having an HTML web interface, it is accessible as a REST web service through the following URL (here shown as a GET request for simplicity; POST requests are also supported):

motif=regular expression

The name and parameters for the web service map one-to-one to the resource name and command-line arguments specified in the inifile:

command : greenmamba/examples/ $motif @fasta
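A minimal Python client for such a service only needs to build the query URL from the parameters. In the sketch below, the endpoint path /REST/Motifs is a guess based on the /HTML/Motifs naming used elsewhere in these posts, and the motif and sequence values are made up:

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port, and path to your own setup.
base = "http://localhost:8080/REST/Motifs"
params = {
    "motif": "R.[RK]R.",           # regular expression to search for
    "fasta": ">seq1\nMKRTKRSPQR",  # sequences in FASTA format
}
url = base + "?" + urlencode(params)
print(url)

# To run the query against a live Mamba server, fetch the URL, e.g. with
# urllib.request.urlopen(url).read()
```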

GreenMamba also provides a REST web service API around any database that you configure through the inifile, although it is admittedly not as elegant as it could be. However, there is not much need for an API in this case, as the database functionality of GreenMamba is only intended for databases that are so small that they can easily be downloaded in their entirety instead.

In the case of a GreenMamba metatool, there is no corresponding web service per se. However, because a metatool is made up of a list of subtools that each have their own section in the inifile, each of the underlying tools has its own REST web service. All the functionality of a metatool is thus nonetheless exposed as web services.

It should be noted that the REST web services provided by GreenMamba return the output from the underlying tools as is. It may thus be worthwhile to change the tools (or write post-processing scripts for them) so that they produce simple tabular output. This will both make GreenMamba format the output more nicely in the HTML web interface and make the REST web services more usable.
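As an example of such post-processing, a thin wrapper along these lines (a sketch, not part of GreenMamba) could convert a tool that prints "name: value" lines into the tab-delimited form that GreenMamba renders as a table:

```python
import subprocess


def tabulate(text):
    """Convert 'name: value' lines into tab-delimited rows."""
    rows = []
    for line in text.splitlines():
        if ":" in line:
            name, value = line.split(":", 1)
            rows.append(f"{name.strip()}\t{value.strip()}")
    return "\n".join(rows)


def run_and_tabulate(cmd):
    """Run the underlying tool and post-process its output so that
    GreenMamba shows a table instead of raw text."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return tabulate(out)


print(tabulate("hits: 3\nscore: 0.97"))
```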

Resource: Adding bells & whistles to GreenMamba

My latest blog post ended at the stage where we had combined the Instances database and the Motifs tool into a single metatool. In this post I will show how little it takes to add the bells and whistles that turn it into the complete, professional web resource that I showed as a teaser in the first blog post of this series.

You may not want green to be the design color used throughout your web interface. This is easily changed by adding a line like color : #083D65 to your inifile. You can use named colors instead of hex values if you prefer. Whichever color you pick will be used throughout the web interface to ensure a consistent design.

In the simple default design, the frame changes size when switching between the Motifs and Instances input forms because the forms are not equally wide. This can easily be fixed by setting a fixed width for all pages by adding a line such as width : 650px. You do not necessarily have to specify the width in pixels; any unit permitted in Cascading Style Sheets can be used.

Most bioinformatics web resources require one or more pages to explain what the resource is all about. Such pages can easily be provided within the GreenMamba framework by adding lines with the same syntax as page_home. If you add a page_about line, you will get an ABOUT menu item at the top right, which when clicked will show the provided HTML wrapped within the GreenMamba layout to provide a consistent look. There is nothing magical about the word “about”; for example, if you write page_download you will get a page named DOWNLOAD.

You may also want to add a footer that is shown at the bottom of every page, which, for example, mentions who made the resource, whom to contact in case of scientific questions or technical problems, and possibly points to one or more papers that describe the tools and that users are requested to cite. To insert a footer, you simply add a line to the inifile with the keyword footer followed by the text you want shown; this text can contain HTML code.

If you set up a Mamba server to host a single resource, you will want the Mamba server to automatically direct users to the main input form in case they access the server without requesting a specific page. For example, we would want to redirect requests for localhost:8080 to localhost:8080/HTML/ELM. This can be done via the [REWRITE] section in the inifile, which allows you to specify simple URL rewrite rules similar to what can be done in Apache.
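Assuming the rewrite rules follow the same "key : value" syntax as the rest of the inifile, such a redirect might look like the fragment below; the exact rule syntax is my guess, so check the GreenMamba documentation:

```
[REWRITE]
/ : /HTML/ELM
```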

Below is the inifile required to set up the complete ELM example resource as it was shown in the first blog post of this series:

[SERVER]
host : localhost
port : 8080
plugins : ./greenmamba

[Instances]
database : greenmamba/examples/instances.tsv

[Motifs]
command : greenmamba/examples/ $motif @fasta
page_home : greenmamba/examples/motifs_home.html

[ELM]
tools : Motifs; Instances;
color : #083D65
width : 650px
footer : Disclaimer: This ELM mirror only serves as an example for the GreenMamba framework. For scientific purposes, please use the real ELM server instead.
page_about : greenmamba/examples/elm_about.html

Starting up the Mamba server with this inifile and accessing localhost:8080 yields this interface:

Clicking the ABOUT link brings up the contents of the file elm_about.html wrapped with the GreenMamba design elements:

In case you want to include pictures or other content on your pages, you do not need a separate web server to host them. Mamba implements a simple web server that you can use for this purpose; all you have to do is add a www_dir : <directory> line in the [SERVER] section of the inifile and place the files you want to serve within the specified directory.

Finally, the output pages of the metatool are also formatted to follow the design specified in the inifile. The header shows the name of the metatool, the color matches that of the other pages, the menu with links to the pages is shown, and the footer is included:

Resource: Combining tools and databases into a single GreenMamba web resource

In the four previous blog posts I introduced the GreenMamba framework (download) and showed how it can be used to turn a simple tab-delimited file or a command-line tool into a web resource with a bare minimum of effort. In this post I will show how easy it is to configure multiple databases or tools to run under the same Mamba server and how to make them accessible as a single web resource.

To illustrate this, I will take the Instances database and the Motifs tool and turn them into a web resource called ELM (the name of the database from which the instance data and motifs were obtained in the first place). The following inifile is all it takes to do so:

[SERVER]
host : localhost
port : 8080
plugins : ./greenmamba

[Instances]
database : greenmamba/examples/instances.tsv

[Motifs]
command : greenmamba/examples/ $motif @fasta
page_home : greenmamba/examples/motifs_home.html

[ELM]
tools : Motifs; Instances;

The [SERVER] section is exactly as in all the previous examples, instructing the Mamba server to run on localhost port 8080 and to import the GreenMamba plugin. The [Instances] section configures a simple database called Instances based on the tab-delimited file instances.tsv, and the [Motifs] section configures a web tool called Motifs that runs the Perl script. These two sections are unchanged compared to the previous blog posts and have here simply been put into the same inifile, which is how one hosts multiple databases or tools under the same Mamba server. The last section, [ELM], is the only new part. It instructs GreenMamba to configure a metatool called ELM that combines the two tools Motifs and Instances.

Starting the Mamba server with this inifile and accessing http://localhost:8080/HTML/ELM yields the following web interface:

As you can see, what used to be a tool called Motifs has now become a tab within the resource ELM that shows the same (customized) input form. Similarly, the database Instances has become a tab within the same resource:

If you press the submit button for Motifs or Instances, you will get output that is formatted as it was when using Motifs and Instances as separate resources, the only change being that the header says ELM. In the next blog post, I will show how the design of GreenMamba web resources can be further customized and how design changes are consistently applied throughout all the individual tools that make up the metatool.

Resource: Improving a GreenMamba web resource with a custom input form

In the previous blog post I showed how you can use GreenMamba (download) to turn a command-line tool into a simplistic web tool with a minimum of effort. Sometimes, however, you will want to put in just a bit more effort and use a custom input form instead of the default one.

The default input page that was automatically created by GreenMamba based on the syntax of the command alone allowed the user to enter a motif in the form of a regular expression:

Suppose we would rather allow the user to select one of the 166 motifs from the ELM database through an input page looking like this:

To achieve this we add one line to the inifile, which instructs GreenMamba to use the custom HTML in the file motifs_home.html instead of the auto-generated input page:

[SERVER]
host : localhost
port : 8080
plugins : ./greenmamba

[Motifs]
command : greenmamba/examples/ $motif @fasta
page_home : greenmamba/examples/motifs_home.html

The file motifs_home.html contains the following piece of HTML code (numerous <option> lines for different ELMs replaced with ... for brevity):

Select the Eukaryotic Linear Motif to search:<br />
<select name='motif'>
<option value='[ILV]..[R][VF][GS].'>CLV_MEL_PAP_1</option>
<option value='(.RK)|(RR[^KR])'>CLV_NDR_NDR_1</option>
<option value='R.[RK]R.'>CLV_PCSK_FUR_1</option>
</select><br />
<br />
Enter the sequences to be searched in FASTA format:<br/>
<textarea name='fasta' cols='80' rows='20'>&gt;MYB_HUMAN

Note that this is not a complete HTML page but only the piece of HTML code that goes between the <form> and </form> tags (minus the submit button). Also note that the names of the input fields must match the handles specified under command in the inifile (e.g. fasta and motif); if they do not, GreenMamba will have no idea where to insert the user input in the command.

The example above is unusually complex due to the mapping of ELM names to regular expressions. Usually your custom HTML forms will be far shorter. In those cases you may not even want to store the custom HTML in a separate file and can instead provide the HTML on a single line inside the inifile, which GreenMamba supports.

Finally, it should be pointed out that this customization step is entirely optional. You do not have to edit HTML forms to set up GreenMamba web resources, but you have the flexibility to do so if you want to.

Resource: Turning a command-line tool into a web tool with GreenMamba

In two previous blog posts we introduced the GreenMamba framework (download) and showed how it can be used to easily set up a web database from an Excel sheet or tab-delimited file. However, the primary motivation for developing GreenMamba was to make it as simple as possible to turn command-line tools, e.g. sequence-based prediction methods, into full-fledged web tools.

The work that would normally be required to do so is to install a web server, create an HTML page with an input form, and code a CGI script that receives the input from the form, converts the input data into command-line arguments, executes the command-line tool, and returns the result. This is not terribly difficult provided that you know how to configure a web server (e.g. Apache) and write CGI scripts. However, it takes considerable time to design a consistent, professional looking HTML web interface that handles both input and output and works correctly on all major web browsers.

With GreenMamba, setting up a command-line tool as a web tool requires only a few lines in the inifile describing the name and command syntax of the tool. To exemplify this, we have made an example Perl script that simply searches for a regular expression in a set of protein or DNA sequences in FASTA format, both of which are provided by the user. The following inifile is all it takes to turn that Perl script into a web tool:

[SERVER]
host : localhost
port : 8080
plugins : ./greenmamba

[Motifs]
command : greenmamba/examples/ $motif @fasta

The first two lines should be familiar from the previous blog post, and the last two lines specify that we have a tool called Motifs, which should run the Perl script with the two arguments $motif and @fasta. The difference between handles starting with $ and @ is that the former will be replaced with the input data itself, whereas the latter will be replaced with the name of a temporary file containing the input data. In the example, the script is to be run with a regular expression ($motif) as the first argument and the name of a FASTA file (@fasta) as the second argument.
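The handle substitution described above can be sketched as follows. This is an illustrative reimplementation, not GreenMamba's actual code, and mytool stands in for the real script path from the inifile:

```python
import shlex
import tempfile


def build_command(template, inputs):
    """Resolve handles in a command template: $name is replaced with the
    input value itself, @name with the path of a temporary file holding
    the value. Illustrative sketch, not GreenMamba's actual code."""
    argv = []
    for token in shlex.split(template):
        if token.startswith("$"):
            argv.append(inputs[token[1:]])
        elif token.startswith("@"):
            with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
                tmp.write(inputs[token[1:]])
            argv.append(tmp.name)
        else:
            argv.append(token)
    return argv


# 'mytool' is a hypothetical placeholder for the actual script
argv = build_command("mytool $motif @fasta",
                     {"motif": "R.[RK]R.", "fasta": ">seq\nMKRTKRSA"})
print(argv[0], argv[1])  # the regex is inlined; argv[2] is a temp file path
```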

Based on the command-line syntax given in the inifile alone, GreenMamba creates the following rudimentary web interface, which can be accessed through http://localhost:8080/HTML/Motifs (here shown with a query):

The names of the various handles (@fasta and $motif) are used as labels for the input fields. It is thus possible to improve the interface a bit simply by giving the handles more descriptive names (underscores will be shown as spaces). GreenMamba also allows the use of a customized input form, which will be explained in an upcoming blog post.

In the example above, pressing the submit button causes GreenMamba to take the command from the inifile, replace $motif with the content of the motif text field, replace @fasta with the name of a temporary file into which the content of the fasta textarea has been written, and execute the resulting command using a system call. Subsequently, the output of the command is read and the temporary files are deleted. In this particular case, the script produces tab-delimited output, which GreenMamba automatically detects and formats as an HTML table in the output page:

If the output is not tab-delimited, it is by default shown as plain pre-formatted text. However, through the inifile you can change it to handle several other types of output, including comma-separated values, HTML, and several image formats. We will likely add support for more formats in the future.
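The default output handling described above, rendering tab-delimited results as a table and everything else as pre-formatted text, can be sketched like this (an illustrative reimplementation, not GreenMamba's actual code):

```python
import html


def format_output(text):
    """If every non-empty line contains a tab, render an HTML table;
    otherwise fall back to pre-formatted text. Illustrative sketch."""
    lines = [l for l in text.splitlines() if l.strip()]
    if lines and all("\t" in l for l in lines):
        rows = []
        for line in lines:
            cells = "".join(f"<td>{html.escape(c)}</td>" for c in line.split("\t"))
            rows.append(f"<tr>{cells}</tr>")
        return "<table>" + "".join(rows) + "</table>"
    return "<pre>" + html.escape(text) + "</pre>"


print(format_output("name\tstart\tend\nMYB_HUMAN\t12\t18"))
```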