Posts Tagged ‘visualization’

Resource: Second Life Interactive Dendrogram Rezzer (SLIDR)

July 4, 2009

About half a year ago, I began experimenting with Second Life as a tool for virtual conferences (I should add that my experiences have since improved). However, I believe that imitating real life in a virtual world is not necessarily the best way to use the technology – it may be better to use virtual reality for doing the things that are difficult to do in the real world. A good example of this is Hiro’s Molecule Rezzer, which is one of the best known scientific tools in Second Life. It, and its much improved successor Orac, allows people to easily construct molecular models of small molecules in Second Life.

After speaking with several other researchers in Second Life who, like me, are interested in evolution, I set out to build a similar tool for visualization of phylogenetic trees. The result is SLIDR (Second Life Interactive Dendrogram Rezzer), which constructs a dendrogram object from a tree in Newick format. The first version of SLIDR can handle trees both with and without branch lengths; however, I have not yet implemented support for labels on internal nodes or for bootstrap values.
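
For readers unfamiliar with the Newick format, here is a minimal Python sketch of how such a string can be turned into a tree structure. It is not SLIDR's code (SLIDR runs inside Second Life); the names `Node`, `parse_newick`, and `leaf_names` are made up for illustration. The sketch assumes unquoted names, treats branch lengths as optional, and ignores whitespace. It also happens to read a name after a closing parenthesis, which goes beyond what the first version of SLIDR supports.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of the tree; leaves have no children."""
    name: str = ""
    length: Optional[float] = None  # branch length to the parent, if given
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a simple Newick string such as '((A:0.1,B:0.2):0.3,C:0.4);'."""
    text = "".join(text.split())  # assumption: whitespace carries no meaning
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            while True:
                node.children.append(parse_node())
                if pos < len(text) and text[pos] == ",":
                    pos += 1  # another sibling follows
                elif pos < len(text) and text[pos] == ")":
                    pos += 1  # end of this group of children
                    break
                else:
                    raise ValueError(f"malformed Newick string at position {pos}")
        # Optional name (leaf label, or a label after a closing parenthesis)
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        node.name = text[start:pos]
        # Optional branch length, introduced by ":"
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            node.length = float(text[start:pos])
        return node

    return parse_node()


def leaf_names(node: Node) -> List[str]:
    """Collect leaf labels from left to right."""
    if not node.children:
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(leaf_names(child))
    return names
```

Calling `leaf_names(parse_newick("((A:0.1,B:0.2):0.3,C:0.4);"))` walks the tree and returns the leaf labels in order. A dendrogram builder would use the same traversal to place one object per leaf.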

The picture below shows an example of a dendrogram that was automatically generated by SLIDR based on a Newick tree:

SLIDR closeup

There is a bit more to SLIDR than this, though. After the dendrogram has been built, it can be loaded with a photo and/or a sound for each of the leaf nodes. When you click on a node, the corresponding sound will be played and the photo will be shown on the associated screen (the white box in front of which I stand):

SLIDR posing

I plan to work with collaborators in Second Life to construct dendrograms for evolution of bats (including their echolocation sounds and photos of the animals) and for the fully sequenced Drosophila genomes. Please do not hesitate to contact me if you would like to use SLIDR on another project. I intend to make SLIDR available as open source software once I have implemented support for the full Newick format.


Resource: STRING v8.1

June 25, 2009

After months of hard work from the entire STRING team (thanks, everyone), I am pleased to be able to say that STRING v8.1 has now been put into production. Here is a screen shot of the start page:

STRING 8.1 start page

This is a minor release of STRING, which means that the imported databases of microarray expression data, protein interactions, genetic interactions, and pathways as well as text-mining evidence have all been updated. We have also fixed a bug that affected the minority of bacteria that have multiple chromosomes.

Another notable feature of STRING v8.1 is the new interactive network viewer that is implemented in Adobe Flash:

STRING 8.1 network viewer

For further details please see the post on the official STRING/STITCH blog.


Update: The BuzzCloud for 2008

January 19, 2009

Yes, it is that time of the year again – we are now almost three weeks into 2009, most papers published in 2008 have hopefully made it into Medline, and it is time to reveal the words of 2008. In other words, I have updated the BuzzCloud resource and here is the result for 2008 (click on the image to go to the web resource):

BuzzCloud 2008

I am thrilled to see the outcome. Without any cheating or tweaking, several buzzwords related to proteomics make it onto the list, with “phosphoproteomics” and “quantitative phosphoproteomics” being the two most prominent of them. This is nice for me to see, considering that my new research group at the Novo Nordisk Foundation Center for Protein Research will focus heavily on improving and applying the NetworKIN and NetPhorest resources for analysis of phosphoproteomics data.

Commentary: Summarizing papers as word clouds

June 27, 2008

For use in presentations on literature mining, I did a back-of-the-envelope calculation of how much time I would be able to spend on each new biomedical paper that is published. Assuming that all papers were indexed in PubMed (which they are not) and that I could read papers 24 hours per day all year around (which I cannot), the result is that I could allocate approximately 50 seconds per paper. This nicely illustrates the point that no one can keep up with the complete biomedical literature.
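
The arithmetic behind such an estimate is simple enough to sketch in a few lines of Python. The post does not state the number of papers used, so the sketch works backwards from the 50-second figure. The function names are invented for illustration, and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, assuming a non-leap year


def seconds_per_paper(papers_per_year: float) -> float:
    """Reading time available per paper when reading around the clock."""
    return SECONDS_PER_YEAR / papers_per_year


def implied_papers_per_year(seconds_each: float) -> float:
    """Invert the estimate: how many papers per year a given time budget implies."""
    return SECONDS_PER_YEAR / seconds_each


# 50 seconds per paper corresponds to roughly 630,000 new papers per year.
print(round(implied_papers_per_year(50)))  # 630720
```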

When I discovered Wordle, which can turn any text into a beautiful word cloud, I thus wondered if this visualization method would be useful for summarizing a complete paper as a single figure. To test this, I extracted the complete text of three papers that I coauthored in the NAR database issue 2008. Submitting these to Wordle resulted in the three figures below (click for larger versions):


All in all, I think that Wordle does a pretty good job at capturing the essence of each paper: the first cloud shows that STITCH is a database of interactions between proteins and chemicals, the second cloud shows that NetworKIN is a database of predictions related to kinases and phosphorylation, and the third cloud shows that Cyclebase.org is a database of experiments on gene expression during the cell cycle. However, a paper describing a database might be easier to summarize than a typical research paper.

As a final test, I therefore submitted the complete text from my paper “Evolution of Cell Cycle Control – Same molecular machines, different regulation”, which describes the somewhat complex concept of just-in-time assembly, to Wordle (click for larger version):

The result is rather less impressive than for the papers from the NAR database issue. Although the word cloud does contain a good selection of words, it fails to convey the main message. I think a large part of the problem is the splitting of multiwords; for example, “cell cycle” becomes two separate terms “cell” and “cycle”. Another problem is that words from different sections of the paper are mixed, which blurs the messages. These two issues could be solved by 1) detecting multiwords and considering them as single tokens, and 2) sorting the terms according to where in the paper they are mainly used.
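
The first of these two fixes can be sketched as follows. This is a deliberately naive illustration, not a tested method: it merges any adjacent word pair that occurs at least `min_count` times into a single token. The function names and the threshold are assumptions, and a real implementation would at least need to filter out stop words so that pairs like “of the” are not merged.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def merge_multiwords(tokens: List[str], min_count: int = 2) -> List[str]:
    """Replace frequent adjacent word pairs with a single two-word token."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair_counts[pair] >= min_count:
            merged.append(" ".join(pair))  # e.g. "cell cycle" as one token
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


text = "The cell cycle is regulated. Cell cycle genes peak once per cell cycle."
counts = Counter(merge_multiwords(tokenize(text)))
print(counts["cell cycle"])  # 3
```

The resulting counts could then be used for the word cloud in place of single-word frequencies.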


Resource: The BuzzCloud visualization of buzzwords

February 29, 2008

“Oh, you work on systems biology? So do I!”

New buzzwords to describe scientific disciplines and technologies seem to pop up every year. For the fun of it, I have developed a small web resource, BuzzClouds, that provides a visual overview of the latest buzzwords in biomedicine.

Without destroying your weekend with mathematical formulas, here is how the BuzzCloud selection and visualization method works:

  • A list of potential buzzwords is constructed by extracting all one- and two-word phrases ending in -ics, -ology, -omy, -phy, -chemistry, -medicine, or -sciences. These endings were selected to get buzzwords that correspond to scientific disciplines and technologies.
  • The potential buzzwords are ranked according to a score that takes into account their frequencies within the past year and within the preceding decade (for details see this review article). To get a high score, a buzzword must be both frequent and new. The top-50 buzzwords are included in the cloud.
  • The size of each buzzword is proportional to the logarithm of its frequency during the past year. Common buzzwords are thus large whereas rare buzzwords are small.
  • The brightness of each buzzword shows the frequency of the buzzword within the past year relative to the preceding decade. New buzzwords are thus bright whereas the older ones are darker.
  • Finally, each buzzword is assigned a tint that goes from yellow via white to cyan based on how often it occurs in scientific journals (yellow) as opposed to medical journals (cyan).
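
Parts of the steps above can be sketched in Python. This is a rough illustration under stated assumptions, not the actual BuzzCloud code: all function names are invented, frequencies are assumed to be plain occurrence counts of at least 1, and the brightness formula (share of occurrences falling in the past year) is only one plausible way to express “recent relative to the preceding decade”. The ranking score is left out because its definition is given in the review article rather than here.

```python
import math

# Endings listed in the first step of the method
SUFFIXES = ("ics", "ology", "omy", "phy", "chemistry", "medicine", "sciences")


def is_candidate(phrase: str) -> bool:
    """True for one- or two-word phrases whose last word has a listed ending."""
    words = phrase.lower().split()
    return 1 <= len(words) <= 2 and words[-1].endswith(SUFFIXES)


def font_size(count_past_year: int, scale: float = 10.0) -> float:
    """Size proportional to the logarithm of the past-year count (scale is arbitrary)."""
    return scale * math.log(count_past_year)


def brightness(count_past_year: int, count_preceding_decade: int) -> float:
    """Assumed measure in [0, 1]: fraction of all occurrences that are recent."""
    return count_past_year / (count_past_year + count_preceding_decade)


for phrase in ["systems biology", "nanomedicine", "nice technology", "protein"]:
    print(phrase, is_candidate(phrase))
```

Note that a purely suffix-based filter accepts “nice technology” just as readily as “systems biology”, so some false positives are to be expected from this step alone.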

When run for the year 2007, the end result looks like this (BuzzClouds for other years are available from the web resource):

50 buzzwords identified based on Medline abstracts from 2007

I think the method does a pretty decent job despite the occasional mistakes such as “nice technology” and “timely topics”. In terms of scientific buzzwords, quantitative proteomics is booming, systems biology is still hot although it is getting a bit long in the tooth, and synthetic biology is rapidly gaining popularity. Nanotechnology also seems to be popular within the medical domain, giving rise to buzzwords like nanomedicine and nanotherapeutics.

Maybe I should write a buzzword-compliant, interdisciplinary grant application that combines click chemistry and synthetic biology to develop novel nanotherapeutics.
