Two days before the main ISMB 2016 conference in Florida, the Network Biology Special Interest Group (NetBioSIG) meeting will take place. It is a great opportunity to meet up with experts in the field, so I hope to see you there. This year's NetBioSIG will have four keynotes, given by Olga Troyanskaya, Franca Fraternali, Nataša Pržulj, and yours truly.
There is of course also a chance for you to present your own work. However, please note that the abstract submission deadline is Friday, April 29. Please see the NetBioSIG website for more details.
The STRING database of known and predicted protein–protein interactions is a resource heavily used by bioinformaticians and non-bioinformaticians alike. The latter generally use STRING via its web interface, whereas the former typically download the complete network and analyze it locally. However, we lacked a good way for non-bioinformaticians to work with networks that are just too large for the web interface. A typical example of this would be users who wish to visualize the results of a proteomics or transcriptomics study as a STRING network.
To address this, I have worked with John “Scooter” Morris to develop a new Cytoscape app for STRING. The app allows you to quickly retrieve much larger networks than is possible via the web interface and gives you the powerful layout and analysis features of Cytoscape. At the same time, it retains the “glass ball” look that many people associate with a STRING network (shown here with a small example network):
When retrieving a network, the app also includes node attributes from the COMPARTMENTS and TISSUES databases. This allows users to, for example, easily color the nodes based on the confidence with which each protein is localized to a certain cellular compartment or expressed in a certain tissue. The app also includes node attributes for drug target classification of human proteins, which are obtained from the Pharos web resource. Finally, since it is Cytoscape, you can obviously import your own attribute tables.
Although it is not yet feature complete, version 0.9 of the app is already available from the Cytoscape App Store under the name stringApp. Please note that it requires Cytoscape 3.3 to work.
As mentioned in the last entry, 2015 has been a year of publishing web resources for my group. The COMPARTMENTS and DISEASES databases have yet another sister resource, namely TISSUES.
This web resource allows users to easily obtain a color-coded schematic of the tissue expression of a protein of interest, providing an at-a-glance overview of evidence from database annotations, from proteomics and transcriptomics studies as well as from automatic text mining of the scientific literature:
Whereas the resource integrates all of the above-mentioned types of evidence, the focus in this work was primarily on combining data from systematic tissue expression atlases, produced using a variety of different high-throughput assays. This required extensive work on mapping, scoring, and benchmarking the different datasets to put them on a common confidence scale. The scientific results and details of all those analyses can be found in the article “Comprehensive comparison of large-scale tissue expression datasets”.
Together with collaborators in the groups of Seán O’Donoghue and Reinhard Schneider, my group has recently launched a new web-accessible database named COMPARTMENTS.
COMPARTMENTS unifies subcellular localization evidence from many sources by mapping all proteins and compartments to their STRING identifiers and Gene Ontology terms, respectively. We import curated annotations from UniProtKB and model organism databases and assign confidence scores to them based on their evidence codes. For human proteins, we similarly import and score evidence from The Human Protein Atlas. COMPARTMENTS also uses text mining to derive subcellular localization evidence from co-occurrence of proteins and compartments in Medline abstracts. Finally, we precompute subcellular localization predictions with the sequence-based methods WoLF PSORT and YLoc. For further details, please refer to our recently published paper entitled “COMPARTMENTS: unification and visualization of protein subcellular localization evidence”.
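The idea of scoring curated annotations by evidence code can be illustrated with a minimal sketch. The evidence codes are standard Gene Ontology codes, but the numeric scores, the default value, and the function name `score_annotations` are assumptions made up for illustration; they are not the values or code used in COMPARTMENTS.

```python
# Hypothetical confidence scores per Gene Ontology evidence code.
# These numbers are illustrative only, not the COMPARTMENTS scoring scheme.
EVIDENCE_SCORES = {
    "IDA": 5,  # inferred from direct assay
    "TAS": 4,  # traceable author statement
    "ISS": 3,  # inferred from sequence similarity
    "IEA": 2,  # inferred from electronic annotation
}
DEFAULT_SCORE = 1  # assumed fallback for codes not listed above


def score_annotations(annotations):
    """Keep the best score for each (protein, GO term) pair.

    `annotations` is an iterable of (protein_id, go_term, evidence_code).
    """
    best = {}
    for protein, go_term, code in annotations:
        score = EVIDENCE_SCORES.get(code, DEFAULT_SCORE)
        key = (protein, go_term)
        if score > best.get(key, 0):
            best[key] = score
    return best


if __name__ == "__main__":
    example = [
        ("P1", "GO:0005634", "IEA"),  # nucleus, electronic annotation
        ("P1", "GO:0005634", "IDA"),  # nucleus, direct assay
    ]
    print(score_annotations(example))
```

Keeping only the highest-scoring annotation per pair is one simple design choice; a real pipeline might combine several lines of evidence instead.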
To provide a simple overview of all this information, we visualize the combined localization evidence for each protein on a schematic of an animal, fungal, or plant cell:
You can click any of the three images above to go to the COMPARTMENTS web resource. To facilitate use in large-scale analyses, the complete datasets for major eukaryotic model organisms are available for download.
Yesterday, I stumbled upon two links that I found interesting. The first was the map-based data visualization blog post 40 Maps That Will Help You Make Sense of the World, in which maps 24 and 28 hint at a correlation (click for larger interactive versions):
The first map shows the number of researchers per million inhabitants in each country. The second map shows the kilograms of coffee consumed per capita per year. As ChartsBin allows you to download the data behind each map, I did so and produced a scatter plot that confirms the strong correlation (click for larger version):
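For readers who want to repeat this kind of comparison, here is a minimal sketch of joining two per-country tables and computing a Pearson correlation coefficient. The function names and the toy numbers are assumptions for illustration; they are not the ChartsBin data, and this is not necessarily how the plot above was made.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlate_by_country(researchers, coffee):
    """Correlate two {country: value} tables over the countries they share."""
    shared = sorted(set(researchers) & set(coffee))
    return pearson([researchers[c] for c in shared],
                   [coffee[c] for c in shared])


if __name__ == "__main__":
    # Toy placeholder values, NOT real figures for any country.
    researchers = {"A": 1000.0, "B": 2500.0, "C": 4000.0, "D": 300.0}
    coffee = {"A": 2.0, "B": 5.5, "C": 8.0, "E": 1.0}
    print(correlate_by_country(researchers, coffee))
```

Only countries present in both tables are used, since the two maps need not cover the same set of countries.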
This confirms my view that the coffee machine is the most important piece of hardware in a bioinformatics group. Bioinformaticians with coffee can do work even without a computer, but bioinformaticians without coffee are unable to work, no matter how good their computers are.
One should of course be careful to not jump to conclusions about causality based on correlation. This leads me to the second link: a new study published in Nature Neuroscience, which shows that Post-study caffeine administration enhances memory consolidation in humans.
I optimistically await a similar study confirming the correlation between Chocolate Consumption, Cognitive Function, and Nobel Laureates published last year in the New England Journal of Medicine.
Antibodypedia, developed by Antibodypedia AB and Nature Publishing Group, is a very useful resource for finding commercially available antibodies against human proteins.
The resource is made available under the Creative Commons Attribution-NonCommercial 3.0 license, which allows for reuse and redistribution of the data for non-commercial purposes. However, the data are only available for browsing through a web interface, which greatly limits systems biology uses of the resource. I thus wrote a robot to scrape all information from the web resource and convert it into a convenient tab-delimited file, which I have made available for download under the same license. This dataset covers a total of 579,038 antibodies against 16,827 human proteins.
To be able to use the dataset in conjunction with STRING and related resources, I next mapped the proteins to STRING protein identifiers. I was able to map 92% of all proteins in Antibodypedia. Having done this, I created the necessary files for the STRING payload mechanism to be able to show the information from Antibodypedia directly within STRING.
The end result looks like this when searching for the WNT7A protein:
The halos around the proteins encode the type and number of antibodies available. Red rings imply that at least one monoclonal antibody exists whereas gray rings imply that only polyclonal antibodies exist. The darker the ring (be it red or gray), the more different antibodies are available.
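The encoding rule described above can be sketched as a small function. The rule itself (red if any monoclonal antibody exists, gray otherwise, darker with more antibodies) is from the text; the RGB endpoints, the logarithmic scaling, the saturation point `MAX_COUNT`, and the name `halo_color` are assumptions for illustration, not the values used in the actual payload.

```python
import math

MAX_COUNT = 100  # hypothetical count at which the ring reaches its darkest shade


def halo_color(total, monoclonal):
    """Return an (r, g, b) halo color, or None if no antibodies exist.

    Red shades mean at least one monoclonal antibody is available;
    gray shades mean only polyclonal antibodies are available.
    """
    if total <= 0:
        return None
    if monoclonal > total:
        raise ValueError("monoclonal count cannot exceed total count")
    # Assumed log scale from 0.0 (few antibodies) to 1.0 (MAX_COUNT or more).
    intensity = min(1.0, math.log1p(total) / math.log1p(MAX_COUNT))
    if monoclonal > 0:
        light, dark = (255, 200, 200), (160, 0, 0)    # light red to dark red
    else:
        light, dark = (220, 220, 220), (70, 70, 70)   # light gray to dark gray
    # Linear interpolation between the light and dark endpoints.
    return tuple(round(l + (d - l) * intensity) for l, d in zip(light, dark))


if __name__ == "__main__":
    print(halo_color(3, 0))    # few polyclonal antibodies: light gray
    print(halo_color(250, 12))  # many antibodies, some monoclonal: dark red
```

A logarithmic scale is used here only because antibody counts per protein can vary widely; any monotonic scale would satisfy the rule in the text.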
The STRING payload mechanism also extends the popups with additional information, here shown for LRP6:
The popup shows the total number of antibodies available and how many of them are monoclonal. It also provides a direct linkout to the relevant protein page on Antibodypedia.
Please feel free to use this Antibodypedia-STRING mashup.
With the Oscars out of the way, it is time for the much more important scientific awards of the year. It is with great pride that I can present to you the best of buzzwords 2011:
Click the image for an interactive version of the BuzzCloud. For an explanation of how to understand the visualization, please see the original BuzzCloud post.
The awards are:
- Best upcoming buzzword. The word in the cloud that to me looks most like an actual upcoming buzzword is optogenetics. It first appeared as a strong newcomer on the 2010 cloud, and it has only strengthened since.
- Buzzword most ready for retirement. The award goes to synthetic biology, which first appeared on the cloud of 2004; despite being an exciting field, as buzzwords go it is becoming long in the tooth. A close runner-up is omics, which first appeared on the cloud of 2005.
- Worst single-journal buzzword of 2011. Again this year we see buzzwords that have made it onto the cloud thanks to a single journal. The award is split between theranostic nanomedicine, which is pushed by the journal Accounts of Chemical Research, and vaccinomics, which is pushed by the (unfortunately named) journal OMICS. The irony that two distinctly medical-sounding buzzwords are championed by two non-medical journals has not escaped me.
- Worst buzzword to fail to disappear. In the BuzzCloud for 2010, I pointed out that astrology had to my horror popped up as a new buzzword in the biomedical literature. Sadly it has failed to disappear from the cloud in 2011!
With about one month's delay relative to the release of the new baseline of PubMed, here is the updated BuzzCloud visualization for what was hot and up-and-coming in 2010 (click image for larger interactive version):
Here is a quick overview of some of the trends that I found interesting:
- Geoepidemiology. A bit of searching in PubMed reveals this to be a buzzword primarily due to the journal Autoimmunity Reviews, which for some reason decided to publish 19 papers with this word in the title in 2010.
- Network pharmacology and systems pharmacology. Due to my personal interests, these buzzwords caught my eye although they were mentioned in only 7 and 11 papers from 2010, respectively. I would have been more pleased if one of those had not been in a journal with a history of publishing pseudoscience.
- Metatranscriptomics and viral metagenomics. With metagenomics becoming reality rather than mere buzz, related and derived terms are predictably following suit.
- Orbitrap technology and iTRAQ proteomics. Like metagenomics, large-scale proteomics has become an established field. This is well reflected by two of the best-known proteomics technologies appearing in the 2010 BuzzCloud.
- Astrology. This one falls firmly in the “dislike” category; I can only hope that it will be gone in next year’s BuzzCloud.
Sometimes things just come together at the right time. Over the past few weeks Heiko Horn, Sune Frankild, and I have made much progress on the new version of Reflect, which we hope to put into production very soon. One of the major new features is that Reflect can now be accessed as REST and SOAP web services. When Linden Lab made available the beta version of Second Life viewer 2, which enables you to place a web browser on a face of a 3D object, I simply had to try to put the two together to provide real-time text mining inside Second Life.
The system works as follows. The Reflect Second Life object contains an LSL script that listens to everything that is said in local chat. It sends any text that it picks up to the Reflect REST web service, which returns a simple XML document listing the entities (proteins and small molecules) that were mentioned in the text. The LSL script parses this XML, constructs a URL pointing to the Reflect popup that corresponds to the set of entities in question, and sets this as the shared media to be shown on the Reflect object in Second Life.
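The parse-and-build-URL step can be sketched outside Second Life as follows. This is Python rather than LSL, and everything specific in it is an assumption for illustration: the XML layout (`<entity type="..." id="..."/>`), the placeholder address `reflect.example.org`, the `entities` query parameter, and the function names. The real Reflect service may use a different response format and URL scheme.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Placeholder address, not the real Reflect endpoint.
POPUP_BASE = "https://reflect.example.org/popup"


def extract_entities(xml_text):
    """Return unique (type, id) pairs from an assumed entity-list XML document."""
    root = ET.fromstring(xml_text)
    entities = []
    for node in root.iter("entity"):
        key = (node.get("type"), node.get("id"))
        # Skip incomplete entries and duplicates, preserving first-seen order.
        if None not in key and key not in entities:
            entities.append(key)
    return entities


def popup_url(entities):
    """Build a popup URL for a set of entities, or None if there are none."""
    if not entities:
        return None
    value = ",".join(f"{etype}.{eid}" for etype, eid in entities)
    return POPUP_BASE + "?" + urlencode({"entities": value})


if __name__ == "__main__":
    # Made-up response with hypothetical identifiers P1 and P2.
    response = """
    <entities>
      <entity type="protein" id="P1"/>
      <entity type="protein" id="P2"/>
      <entity type="protein" id="P1"/>
    </entities>
    """
    print(popup_url(extract_entities(response)))
```

In the Second Life object, the resulting URL would then be set as the shared media of the object face, which is the part that has no equivalent in this sketch.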
The result is an information board that automatically pulls up possibly relevant information related to what people close to it are talking about. The picture below shows the result of me typing a sentence that mentioned human and mouse IL-5 (click for a larger version).
I am well aware that this may not be particularly useful to very many people in Second Life. However, I think it is a nice technology demo of how much can be accomplished with the new Reflect API and just a few lines of code.
At the Novo Nordisk Foundation Center for Protein Research we are looking for a scientist to provide bioinformatics support for the Protein Production Unit. For further details, please see the job advert below the fold.