Anyone who has worked in computational biology for many years will be familiar with the following situation: from collaborators you have received an Excel spreadsheet, which is generously referred to as a “database”, and you now need to make the data accessible to the world. One could of course simply provide the file for download; however, it would be far preferable if the data could be searched through a simple web interface.
This is not a particularly difficult job, but it is a fair amount of work. Typically you would need to set up a database (be that an SQL database or something else), write a CGI script that queries the database and formats the result as an HTML table, and spend some time on web design to make the input and output pages look aesthetically pleasing. It all takes a lot of time that you would probably rather spend on something more productive. Consequently this is often not done at all, and data sets that might be of value to others are thus never made available.
One of the key features of the GreenMamba project (see previous blog post on the topic) is to make it as easy as possible to turn any regular Excel spreadsheet into a web database with nearly no work involved. In fact, all it takes is the following four steps:
- Download and unpack Mamba.
- Save your spreadsheet in tab-delimited format with column names in the first line.
- Add the following two lines to your inifile (a tag in square brackets that names the database, followed by the database line):
database : my_spreadsheet.tsv
- Start the Mamba server (./mambasrv my_database.ini)
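Before starting the server, it can be worth checking that the exported file really is tab-delimited with one header line. The following Python sketch is not part of Mamba; the function name `check_tsv` is hypothetical, and the specific checks (non-empty, unique column names and a consistent field count on every line) are assumptions about what a clean export should look like rather than documented GreenMamba requirements.

```python
def check_tsv(lines):
    """Return a list of problems found in tab-delimited text (empty if none).

    `lines` is any iterable of text lines, for example an open file.
    The first line is treated as the header with the column names.
    """
    rows = [line.rstrip("\r\n").split("\t") for line in lines]
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header has an empty column name")
    if len(set(header)) != len(header):
        problems.append("header has duplicate column names")

    # Data lines are numbered from 2 because line 1 is the header.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {number}: {len(row)} fields, expected {len(header)}"
            )
    return problems
```

To run it on a file, one could call `check_tsv(open("my_spreadsheet.tsv", encoding="utf-8"))` and print whatever it returns.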
To exemplify this, we downloaded the complete list of 1743 known instances of Eukaryotic Linear Motifs from the ELM database. The following inifile is all it takes to turn the resulting tab-delimited file into a simple web-accessible database:
[SERVER]
host : localhost
port : 8080
plugins : ./greenmamba
[Instances]
database : greenmamba/examples/instances.tsv
The [SERVER] tag specifies the host of the computer where the Mamba web server actually runs, and the plugins variable specifies where to load the plugins that enable the whole GreenMamba framework; it should always be set to this value for the framework to work. The [Instances] tag specifies the name of the database, and the database variable points to the tab-delimited version of the spreadsheet. After starting the Mamba server you can access http://localhost:8080/HTML/Instances to see the following query interface (here shown with a query):
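As a rough illustration of how the inifile pieces fit together, the sketch below reads such a file with Python's standard `configparser` module and lists each database with the address it would be reached at. This is not how Mamba itself parses the file. The function name `database_urls` is hypothetical, and the URL pattern (host, port, then `/HTML/` plus the tag name) is an assumption extrapolated from the single example address above.

```python
import configparser


def database_urls(ini_text):
    """Map each database tag to (tsv_path, url), given inifile text.

    Assumes a [SERVER] section with `host` and `port`, and that every
    other section containing a `database` entry is one web database.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)

    server = parser["SERVER"]
    base = f"http://{server['host']}:{server['port']}/HTML/"

    return {
        name: (parser[name]["database"], base + name)
        for name in parser.sections()
        if name != "SERVER" and "database" in parser[name]
    }


# The ELM example, with the [SERVER] and [Instances] tags described above.
EXAMPLE = """\
[SERVER]
host : localhost
port : 8080
plugins : ./greenmamba

[Instances]
database : greenmamba/examples/instances.tsv
"""
```

Calling `database_urls(EXAMPLE)` gives one entry, keyed by `Instances`, holding the path to the tab-delimited file and the address of the query page.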
Upon submitting the query, GreenMamba retrieves all lines that match the search criteria and formats them as an output page:
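Conceptually, the retrieval step amounts to filtering the lines of the tab-delimited file and wrapping the hits in an HTML table. The Python sketch below shows that idea only; it is not GreenMamba's code. The function names are hypothetical, and the matching rule (case-insensitive substring match on every column that has a search term) is an assumption, since the matching semantics are not described here.

```python
import html


def search_tsv(lines, criteria):
    """Return (header, matching_rows) for tab-delimited text.

    `criteria` maps a column name to a search string; a row matches when
    every non-empty search string occurs in its column (case-insensitive).
    An unknown column name raises KeyError.
    """
    rows = [line.rstrip("\r\n").split("\t") for line in lines]
    if not rows:
        return [], []
    header, records = rows[0], rows[1:]
    index = {name: i for i, name in enumerate(header)}
    wanted = [(index[col], text.lower()) for col, text in criteria.items() if text]
    hits = [
        row for row in records
        if all(i < len(row) and text in row[i].lower() for i, text in wanted)
    ]
    return header, hits


def to_html_table(header, rows):
    """Format a header and rows as a bare HTML table, escaping all cells."""
    head = "".join(f"<th>{html.escape(name)}</th>" for name in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"
```

Scanning every line on each query is fine for a file of a few thousand rows such as the ELM instances; a much larger table would call for a real index.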
One could set up a nicer and simpler version of the database by filtering the tab-delimited file a bit. For example, one might want to leave out the columns ELMType (which is redundant with ELMIdentifier), Accessions, InstanceLogic, Evidence, PDB, and Organism (which is redundant with ProteinName) and rename ELMIdentifier to ELM and ProteinName to Protein. This would result in a simpler query form and a more concise results table. Doing this is left as an exercise for the interested reader.
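For readers who would rather not do the exercise by hand, here is one possible way to do the filtering in Python. It is a minimal sketch that assumes the column names in the header line are spelled exactly as listed above; the function name `simplify_tsv` is hypothetical.

```python
# Columns to leave out and columns to rename, as suggested in the text.
DROP = {"ELMType", "Accessions", "InstanceLogic", "Evidence", "PDB", "Organism"}
RENAME = {"ELMIdentifier": "ELM", "ProteinName": "Protein"}


def simplify_tsv(lines, drop=DROP, rename=RENAME):
    """Yield tab-delimited lines with some columns dropped and renamed.

    `lines` is any iterable of text lines whose first line is the header.
    Output lines carry no trailing newline.
    """
    rows = (line.rstrip("\r\n").split("\t") for line in lines)
    header = next(rows, None)
    if header is None:  # empty input: nothing to yield
        return

    # Positions of the columns that survive the filtering.
    keep = [i for i, name in enumerate(header) if name not in drop]
    yield "\t".join(rename.get(header[i], header[i]) for i in keep)

    for row in rows:
        # Pad short rows with empty fields instead of failing.
        yield "\t".join(row[i] if i < len(row) else "" for i in keep)
```

To write the result to a new file, open `instances.tsv` for reading and a second file for writing, then write each line yielded by `simplify_tsv` followed by a newline. Pointing the database entry of the inifile at the new file should then give the simpler query form.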