Category Archives: Resource

Resource: Combining tools and databases into a single GreenMamba web resource

In the four previous blog posts I introduced the GreenMamba framework (download) and showed how it can be used to turn simple tab-delimited files or command-line tools into web resources with a bare minimum of effort. In this post I will show how easy it is to configure multiple databases or tools to run under the same Mamba server and how to make them accessible as a single web resource.

To illustrate this, I will take the Instances database and the Motifs tool and turn them into a web resource called ELM (the name of the database from which the instance data and motifs were obtained in the first place). The following inifile is all it takes to do so:

[SERVER]
host : localhost
port : 8080
plugins : ./greenmamba

[Instances]
database : greenmamba/examples/instances.tsv

[Motifs]
command : greenmamba/examples/ $motif @fasta
page_home : greenmamba/examples/motifs_home.html

[ELM]
tools : Motifs; Instances;

The [SERVER] section is exactly as in all the previous examples, instructing the Mamba server to run on localhost port 8080 and to import the GreenMamba plugin. The [Instances] section configures a simple database called Instances based on the tab-delimited file instances.tsv, and the [Motifs] section configures a web tool called Motifs that runs the Perl script. These two sections are unchanged compared to the previous blog posts and have here simply been put into the same inifile, which is how one hosts multiple databases or tools under the same Mamba server. The last section, [ELM], is the only new part. It instructs GreenMamba to configure a metatool called ELM that combines the two tools Motifs and Instances.

Starting the Mamba server with this inifile and accessing http://localhost:8080/HTML/ELM yields the following web interface:

As you can see, what used to be a tool called Motifs has now become a tab within the resource ELM that shows the same (customized) input form. Similarly, the database Instances has become a tab within the same resource:

If you press the submit button for Motifs or Instances, you will get output that is formatted as it was when using Motifs and Instances as separate resources, the only change being that the header says ELM. In the next blog post, I will show how the design of GreenMamba web resources can be further customized and how design changes are consistently applied throughout all the individual tools that make up the metatool.

Resource: Improving a GreenMamba web resource with a custom input form

In the previous blog post I showed how you can use GreenMamba (download) to turn a command-line tool into a simplistic web tool with a minimum of effort. Sometimes, however, you will want to put in just a bit more effort and use a custom input form instead of the default one.

The default input page that was automatically created by GreenMamba based on the syntax of the command alone allowed the user to enter a motif in the form of a regular expression:

Suppose we would rather allow the user to select one of the 166 motifs from the ELM database through an input page looking like this:

To achieve this we add one line to the inifile, which instructs GreenMamba to use the custom HTML in the file motifs_home.html instead of the auto-generated input page:

[SERVER]
host : localhost
port : 8080
plugins : ./greenmamba

[Motifs]
command : greenmamba/examples/ $motif @fasta
page_home : greenmamba/examples/motifs_home.html

The file motifs_home.html contains the following piece of HTML code (numerous <option> lines for different ELMs replaced with ... for brevity):

Select the Eukaryotic Linear Motif to search:<br />
<select name='motif'>
<option value='[ILV]..[R][VF][GS].'>CLV_MEL_PAP_1</option>
<option value='(.RK)|(RR[^KR])'>CLV_NDR_NDR_1</option>
<option value='R.[RK]R.'>CLV_PCSK_FUR_1</option>
</select><br />
<br />
Enter the sequences to be searched in FASTA format:<br/>
<textarea name='fasta' cols='80' rows='20'>&gt;MYB_HUMAN
...
</textarea>

Note that this is not a complete HTML page but only the piece of HTML code that goes between the <form> and </form> tags (minus the submit button). Also note that the names of the input fields must match the handles specified under command in the inifile (e.g. fasta and motif); if they do not, GreenMamba will have no idea where to insert the user input in the command.

The example above is unusually complex due to the mapping of ELM names to regular expressions. Usually your custom HTML forms will be far shorter. In those cases you may not even want to store the custom HTML in a separate file and instead provide the HTML on a single line inside the inifile, which GreenMamba supports.
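With 166 motifs, the `<option>` lines shown above are best generated rather than typed by hand. The sketch below builds them from a two-column tab-delimited file of ELM identifiers and their regular expressions; the file layout (and the idea of keeping such a file at all) is my assumption for illustration, not something GreenMamba provides.

```python
import csv
import html

def make_options(tsv_path):
    """Build <option> lines for the motif selector from a two-column
    tab-delimited file: ELM identifier, regular expression.
    The column layout is an assumption for this sketch."""
    options = []
    with open(tsv_path, newline="") as f:
        for name, regex in csv.reader(f, delimiter="\t"):
            # Escape the pattern and name so they are safe inside HTML;
            # quote=True also escapes the single quotes used as delimiters.
            options.append("<option value='%s'>%s</option>"
                           % (html.escape(regex, quote=True), html.escape(name)))
    return "\n".join(options)
```

The resulting string can then be pasted (or templated) into motifs_home.html between the `<select>` and `</select>` tags.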

Finally, it should be pointed out that this customization step is entirely optional. You do not have to edit HTML forms to set up GreenMamba web resources, but you have the flexibility to do so if you want to.

Resource: Turning a command-line tool into a web tool with GreenMamba

In two previous blog posts we introduced the GreenMamba framework (download) and showed how it can be used to easily set up a web database from an Excel sheet or tab-delimited file. However, the primary motivation for developing GreenMamba was to make it as simple as possible to turn command-line tools, e.g. sequence-based prediction methods, into full-fledged web tools.

The work that would normally be required to do so is to install a web server, create an HTML page with an input form, and code a CGI script that receives the input from the form, converts the input data into command-line arguments, executes the command-line tool, and returns the result. This is not terribly difficult provided that you know how to configure a web server (e.g. Apache) and write CGI scripts. However, it takes considerable time to design a consistent, professional looking HTML web interface that handles both input and output and works correctly on all major web browsers.

With GreenMamba, setting up a command-line tool as a web tool requires only a few lines in the inifile describing the name and command syntax of the tool. To exemplify this, we have made a simple example Perl script that searches a regular expression against a set of protein or DNA sequences in a FASTA file, both of which are provided by the user. The following inifile is all it takes to turn that Perl script into a web tool:

[SERVER]
host : localhost
port : 8080
plugins : ./greenmamba

[Motifs]
command : greenmamba/examples/ $motif @fasta

The [SERVER] section should be familiar from the previous blog post, and the [Motifs] section specifies that we have a tool called Motifs, which should run the Perl script with the two arguments $motif and @fasta. The difference between handles starting with $ and @ is that the former will be replaced with the input data itself, whereas the latter will be replaced with the name of a temporary file containing the input data. In the example, the script is to be run with a regular expression ($motif) as the first argument and the name of a FASTA file (@fasta) as the second argument.
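The substitution of the two kinds of handles can be sketched as follows. The function and its names are illustrative only, not GreenMamba's actual internals:

```python
import tempfile

def build_command(template, inputs):
    """Sketch of handle substitution: $handles are replaced inline with
    the user's input, @handles with the path of a temporary file holding
    that input. template: e.g. 'script.pl $motif @fasta';
    inputs: dict mapping handle name -> submitted text."""
    args, tmpfiles = [], []
    for token in template.split():
        if token.startswith("$"):
            args.append(inputs[token[1:]])        # substitute the value inline
        elif token.startswith("@"):
            tmp = tempfile.NamedTemporaryFile("w", delete=False, suffix=".tmp")
            tmp.write(inputs[token[1:]])          # write the input to a temp file
            tmp.close()
            args.append(tmp.name)                 # pass the file name instead
            tmpfiles.append(tmp.name)             # remember it for later cleanup
        else:
            args.append(token)
    return args, tmpfiles
```

The returned argument list could then be handed to a system call, after which the temporary files are deleted.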

Based on the command-line syntax given in the inifile alone, GreenMamba creates the following rudimentary web interface, which can be accessed through http://localhost:8080/HTML/Motifs (here shown with a query):

The names of the various handles (@fasta and $motif) are used as labels for the input fields. It is thus possible to improve the interface a bit simply by giving the handles more descriptive names (underscores will be shown as spaces). GreenMamba also allows the use of a customized input form, which will be explained in an upcoming blog post.

In the example above, pressing the submit button causes GreenMamba to take the command from the inifile, replace $motif with the content of the motif text field, replace @fasta with the name of a temporary file into which the content of the fasta textarea has been written, and execute the resulting command using a system call. Subsequently, the output of the command is read and the temporary files are deleted. In this particular case, the script produces tab-delimited output, which GreenMamba automatically detects and formats as an HTML table in the output page:

If the output is not tab-delimited, it is by default shown as plain pre-formatted text. However, through the inifile you can configure it to handle several other types of output, including comma-separated values, HTML, and several image formats. We will likely add support for more formats in the future.

Resource: Turning an Excel sheet into a web-accessible database with GreenMamba

Anyone who has worked with computational biology for many years will be familiar with the following situation: from collaborators you have received an Excel spreadsheet, which is generously referred to as a “database”, and you now need to make the data accessible to the world. One could obviously simply provide the file for download; however, it would be much preferred if the data could be searched through a simple web interface.

This is not a particularly difficult job, but it is a fair amount of work. Typically you would need to set up a database (be that an SQL database or something else), write a CGI script that queries the database and formats the result as an HTML table, and spend some time on web design to make the input and output pages look aesthetically pleasing. It all takes a lot of time that you would probably rather spend on doing something more productive. Consequently this is often not done at all, and data sets that might be of value to others are thus never made available.

One of the key features of the GreenMamba project (see previous blog post on the topic) is to make it as easy as possible to turn any regular Excel spreadsheet into a web database with nearly no work involved. In fact, all it takes is the following four steps:

  1. Download and unpack Mamba.
  2. Save your spreadsheet in tab-delimited format with column names in the first line.
  3. Add the following two lines to your .ini file:
    [My_database]
    database : my_spreadsheet.tsv
  4. Start the Mamba server (./mambasrv my_database.ini)

To exemplify this, we downloaded the complete list of 1743 known instances of Eukaryotic Linear Motifs from the ELM database. The following inifile is all it takes to turn the resulting tab-delimited file into a simple web-accessible database:

[SERVER]
host : localhost
port : 8080
plugins : ./greenmamba

[Instances]
database : greenmamba/examples/instances.tsv

The [SERVER] section specifies the host and port on which the Mamba web server runs, and the plugins variable specifies where to load the plugins that enable the GreenMamba framework; it should always be set as shown here. The [Instances] section specifies the name of the database, and the database variable points to the tab-delimited version of the spreadsheet. After starting the Mamba server, you can access http://localhost:8080/HTML/Instances to see the following query interface (here shown with a query):

Upon submitting the query, GreenMamba retrieves all lines that match the search criteria and formats them as an output page:
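The line-matching step can be sketched as reading the header from the first line and keeping the rows whose fields contain the submitted search terms. The matching semantics here (case-insensitive substring match on each filled-in field) are my assumption; GreenMamba's exact rules may differ:

```python
import csv

def query_tsv(tsv_path, criteria):
    """Return the column names and the rows of a tab-delimited file whose
    fields contain the given search terms. criteria: dict mapping column
    name -> search term. Case-insensitive substring matching is an
    assumption for this sketch."""
    with open(tsv_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        hits = [row for row in reader
                if all(term.lower() in (row.get(col) or "").lower()
                       for col, term in criteria.items())]
        return reader.fieldnames, hits
```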

One could set up a nicer and simpler version of the database by filtering the tab-delimited file a bit. For example, one might want to leave out the columns ELMType (which is redundant with ELMIdentifier), Accessions, InstanceLogic, Evidence, PDB, and Organism (which is redundant with ProteinName) and rename ELMIdentifier to ELM and ProteinName to Protein. This would result in a simpler query form and a more concise results table. Doing this is left as an exercise for the interested reader.
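For the interested reader, the filtering exercise amounts to a short script like the one below. The kept column names are taken from the paragraph above (Start and End are my assumptions about the remaining columns); check them against the header of your own download:

```python
import csv

# Columns to keep, mapped to their new names. ELMIdentifier and
# ProteinName come from the text above; Start and End are assumptions.
KEEP = {"ELMIdentifier": "ELM", "ProteinName": "Protein",
        "Start": "Start", "End": "End"}

def filter_tsv(src, dst):
    """Write a copy of the tab-delimited file with only the kept
    columns, renamed according to KEEP."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter="\t")
        writer = csv.writer(fout, delimiter="\t")
        writer.writerow(KEEP.values())
        for row in reader:
            writer.writerow([row.get(old, "") for old in KEEP])
```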

Resource: Turning databases and tools into web resources with GreenMamba

Today, the users of bioinformatics databases and tools increasingly rely on being able to access them through web interfaces. Almost all major databases and most of the commonly used tools can be accessed in this manner, which is mostly good news from the users' perspective. However, in my experience from teaching numerous courses, many of these users have never worked with a command line and thus typically run headfirst into a wall the moment they have to do anything slightly more specialized than, for example, running a BLAST search or making a multiple alignment.

The reason for this is simple: specialist tools and databases are typically not made available through user-friendly web interfaces, because they have too few users to make it worthwhile to create such an interface. Worse yet, the tools are in many cases not even distributed, because the many dependencies and lack of documentation would result in too many questions if one were to distribute them. Consequently, almost every bioinformatician that I have spoken to about this has one or more resources that they are currently not sharing – not because they are not willing to share, but because sharing would imply too much extra work. To address this problem, we have developed a web server that allows you to easily wrap existing databases and tools with a web interface like the one shown below.

In my group we are involved in the development and maintenance of many bioinformatics web resources, and I have thus been pushing the development of a reusable infrastructure. The result of this is the Python framework Mamba, which has primarily been developed by Sune Frankild and myself. Briefly, Mamba is a network-centric, multi-threaded queuing system that deals with the many technical aspects related to network communication with the clients and server-side resource management. All the specific work pertaining to a resource is done by modules that run under the Mamba server. GreenMamba is one such Mamba module, which based on a simple configuration file can provide a complete web interface around a tab-delimited data file or a command-line tool.

It is thus with great pleasure that we can now release the first version of the Mamba queuing system and GreenMamba wrapper under the BSD license. We hope that by eliminating most of the work involved in setting up bioinformatics web resources, it will encourage people to make available data sets and tools that were hitherto not worth the time and effort to set up.

Over the next days and weeks, I plan to publish a series of blog posts that illustrate how one can use this framework to wrap a web interface around existing databases and command-line tools with practically no work. Impatient people are welcome to download the software and look in the greenmamba/examples directory.

Resource: Real-time text mining in Second Life using the Reflect API

Sometimes things just come together at the right time. The past few weeks Heiko Horn, Sune Frankild, and I have made much progress on the new version of Reflect, which we hope to put into production very soon. One of the major new features is that Reflect can now be accessed as REST and SOAP web services. When Linden Lab made available the beta version of Second Life viewer 2, which enables you to place a web browser on a face of a 3D object, I simply had to try to put the two together to provide real-time text mining inside Second Life.

The system works as follows. The Reflect Second Life object contains an LSL script that listens to everything that is said in local chat. It sends any text that it picks up to the Reflect REST web service, which returns a simple XML document listing the entities (proteins and small molecules) that were mentioned in the text. The LSL script parses this XML, constructs a URL pointing to the Reflect popup that corresponds to the set of entities in question, and sets this as the shared media to be shown on the Reflect object in Second Life.
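The parse-and-build step in the middle of that pipeline can be sketched as follows. The XML layout and the popup URL scheme used here are illustrative assumptions, not the documented Reflect formats, and the sketch is in Python rather than the LSL actually used in Second Life:

```python
import xml.etree.ElementTree as ET

def entities_to_url(xml_text, base="http://reflect.ws/popup"):
    """Pull the entity identifiers out of an XML entity list and build
    a popup URL from them. Both the XML shape (<entities><entity id=.../>)
    and the URL scheme are hypothetical stand-ins for the real formats."""
    root = ET.fromstring(xml_text)
    ids = [e.get("id") for e in root.iter("entity")]
    # No recognized entities means there is nothing to display.
    return base + "?ids=" + ",".join(ids) if ids else None
```

In the actual system, the resulting URL would be set as the shared media shown on the face of the Reflect object.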

The result is an information board that automatically pulls up possibly relevant information related to what people close to it are talking about. The picture below shows the result of me typing a sentence that mentioned human and mouse IL-5 (click for a larger version).

I am well aware that this may not be particularly useful to very many people in Second Life. However, I think it is a nice technology demo of how much can be accomplished with the new Reflect API and just a few lines of code.

Resource: Second Life Interactive Dendrogram Rezzer (SLIDR)

About half a year ago, I began experimenting with Second Life as a tool for virtual conferences (I should add that my experiences have since improved). However, I believe that imitating real life in a virtual world is not necessarily the best way to use the technology – it may be better to use virtual reality for doing the things that are difficult to do in the real world. A good example of this is Hiro’s Molecule Rezzer, which is one of the best known scientific tools in Second Life. It, and its much improved successor Orac, allows people to easily construct molecular models of small molecules in Second Life.

After speaking with several other researchers in Second Life, who, like me, are interested in evolution, I set out to build a similar tool for visualization of phylogenetic trees. The result is SLIDR (Second Life Interactive Dendrogram Rezzer), which based on a tree in Newick format constructs a dendrogram object. The first version of SLIDR can handle trees both with and without branch lengths; however, I have not yet implemented support for labels on internal nodes or for bootstrap values.
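The first thing a tool like SLIDR must do is read the Newick string into a tree structure. A minimal sketch of that parsing, handling labels and optional branch lengths (but, like the first version of SLIDR, not internal-node labels or bootstrap values), is shown below; this is my own sketch in Python, not SLIDR's LSL code:

```python
def parse_newick(s):
    """Parse a Newick string such as '((A:1.0,B:2.0):0.5,C:3.0);' into
    nested dicts with keys 'label', 'length', and 'children'."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":                 # internal node: parse children
            pos += 1
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1                      # consume the closing ")"
        start = pos                       # read the (possibly empty) label
        while pos < len(s) and s[pos] not in ",():;":
            pos += 1
        label = s[start:pos]
        length = None
        if pos < len(s) and s[pos] == ":":  # optional branch length
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",();":
                pos += 1
            length = float(s[start:pos])
        return {"label": label, "length": length, "children": children}

    return node()
```

From such a structure, leaf positions and branch depths follow directly, which is all a dendrogram rezzer needs to place its prims.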

The picture below shows an example of a dendrogram that was automatically generated by SLIDR based on a Newick tree:

SLIDR closeup

There is a bit more to SLIDR than this, though. After the dendrogram has been built, it can be loaded with a photo and/or a sound for each of the leaf nodes. When you click on a node, the corresponding sound is played and the photo is shown on the associated screen (the white box in front of which I stand):

SLIDR posing

I plan to work with collaborators in Second Life to construct dendrograms for the evolution of bats (including their echolocation sounds and photos of the animals) and for the fully sequenced Drosophila genomes. Please do not hesitate to contact me if you would like to use SLIDR on another project. I intend to make SLIDR available as open source software once I have implemented support for the full Newick format.
