Resource: Second Life Interactive Dendrogram Rezzer (SLIDR)

July 4, 2009

About half a year ago, I began experimenting with Second Life as a tool for virtual conferences (I should add that my experiences have since improved). However, I believe that imitating real life in a virtual world is not necessarily the best way to use the technology – it may be better to use virtual reality for doing the things that are difficult to do in the real world. A good example of this is Hiro’s Molecule Rezzer, which is one of the best known scientific tools in Second Life. It, and its much improved successor Orac, allows people to easily construct molecular models of small molecules in Second Life.

After speaking with several other researchers in Second Life who, like me, are interested in evolution, I set out to build a similar tool for visualization of phylogenetic trees. The result is SLIDR (Second Life Interactive Dendrogram Rezzer), which constructs a dendrogram object from a tree in Newick format. The first version of SLIDR can handle trees both with and without branch lengths; however, I have not yet implemented support for labels on internal nodes or for bootstrap values.
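To give an idea of what reading a Newick tree involves, here is a minimal Python sketch of a recursive parser. It is not the SLIDR code, which runs inside Second Life; the function names and the dictionary layout are my own assumptions for illustration. It accepts trees with or without branch lengths. It also reads labels on internal nodes if they are present, but it does not handle quoted labels or comments.

```python
def parse_newick(text):
    """Parse a Newick string into nested dicts with keys name, length, children.

    Illustrative sketch only: quoted labels and [comments] are not supported.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick tree must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        node = {"name": "", "length": None, "children": []}
        if text[pos] == "(":
            pos += 1  # step over '('
            while True:
                node["children"].append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                elif text[pos] == ")":
                    pos += 1  # end of this group of children
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
        # Optional label (leaf name, or internal label if one is given).
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        node["name"] = text[start:pos].strip()
        # Optional branch length after ':'.
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            node["length"] = float(text[start:pos])
        return node

    root = parse_node()
    if pos != len(text) - 1:
        raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
    return root


def leaf_names(node):
    """Return the leaf labels of a parsed tree in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


with_lengths = parse_newick("((A:0.1,B:0.2):0.3,C:0.4);")
without_lengths = parse_newick("(A,(B,C));")
```

A rezzer would then walk the nested structure and place one object per node, using the branch lengths (or a fixed length when they are missing) to position the children.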

The picture below shows an example of a dendrogram that was automatically generated by SLIDR based on a Newick tree:

SLIDR closeup

There is a bit more to SLIDR than this, though. After the dendrogram has been built, it can be loaded with a photo and/or a sound for each of the leaf nodes. When you click on a node, the corresponding sound will be played and the photo will be shown on the associated screen (the white box in front of which I stand):

SLIDR posing

I plan to work with collaborators in Second Life to construct dendrograms for evolution of bats (including their echolocation sounds and photos of the animals) and for the fully sequenced Drosophila genomes. Please do not hesitate to contact me if you would like to use SLIDR on another project. I intend to make SLIDR available as open source software once I have implemented support for the full Newick format.


Resource: STRING v8.1

June 25, 2009

After months of hard work from the entire STRING team (thanks, everyone), I am pleased to be able to say that STRING v8.1 has now been put into production. Here is a screen shot of the start page:

STRING 8.1 start page

This is a minor release of STRING, which means that the imported databases of microarray expression data, protein interactions, genetic interactions, and pathways as well as text-mining evidence have all been updated. We have also fixed a bug that affected the minority of bacteria that have multiple chromosomes.

Another notable feature of STRING v8.1 is the new interactive network viewer that is implemented in Adobe Flash:

STRING 8.1 network viewer

For further details please see the post on the official STRING/STITCH blog.


Analysis: On the evolution of protein length and phosphorylation sites

June 25, 2009

It has been much too long since I last wrote a blog post. Part of the reason has been that I have been busy moving back to Denmark, starting up a research group, and co-founding a company. More on that in other blog posts. The main reason, however, has been a lack of papers that inspired me to do the simple follow-up analyses that I usually blog about.

This has thankfully changed now. Pedro Beltrao and coworkers recently published an interesting paper in PLoS Biology on the evolution of regulation through protein phosphorylation. The paper presents several interesting analyses and comparisons of phosphoproteomics data from three yeast species; the abstract summarizes the findings better than I can:

Evolution of Phosphoregulation: Comparison of Phosphorylation Patterns across Yeast Species
The extent by which different cellular components generate phenotypic diversity is an ongoing debate in evolutionary biology that is yet to be addressed by quantitative comparative studies. We conducted an in vivo mass-spectrometry study of the phosphoproteomes of three yeast species (Saccharomyces cerevisiae, Candida albicans, and Schizosaccharomyces pombe) in order to quantify the evolutionary rate of change of phosphorylation. We estimate that kinase–substrate interactions change, at most, two orders of magnitude more slowly than transcription factor (TF)–promoter interactions. Our computational analysis linking kinases to putative substrates recapitulates known phosphoregulation events and provides putative evolutionary histories for the kinase regulation of protein complexes across 11 yeast species. To validate these trends, we used the E-MAP approach to analyze over 2,000 quantitative genetic interactions in S. cerevisiae and Sc. pombe, which demonstrated that protein kinases, and to a greater extent TFs, show lower than average conservation of genetic interactions. We propose therefore that protein kinases are an important source of phenotypic diversity.

Figure 1a in the paper shows the intriguing observation that, despite rapid evolution of individual phosphorylation sites, the relative number of phosphorylation sites within proteins from different functional classes (Gene Ontology categories) remains remarkably constant between species:

Beltrao et al., PLoS Biology, 2009, Figure 1a

However, it occurred to me that this could potentially be a consequence of longer proteins having more phosphorylation sites, and protein length being conserved through evolution. I thus counted the number of unique phosphorylation sites identified in each protein (thanks to Pedro Beltrao for providing the data) and correlated it with the length of the proteins. In the two plots below, I have pooled the proteins so that each dot corresponds to 100 proteins. The upper and lower panels show the results for S. cerevisiae and S. pombe, respectively:

Number of phosphorylation sites vs. protein length for S. cerevisiae

Number of phosphorylation sites vs. protein length for S. pombe

As should be evident from the plots, the average number of phosphorylation sites in a protein correlates strongly with its length, which is by no means surprising. It is unclear to me why the intercept with the y-axis appears to differ from zero in both plots; suggestions are welcome.
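For readers who want to try the pooling on their own data, the following Python sketch shows one way to do it. It assumes that the proteins are sorted by length before being pooled into groups of 100, which the description above does not spell out, and the function names are hypothetical. The straight-line fit is an ordinary least-squares fit, added here only to make the slope and the y-axis intercept explicit.

```python
def pool_proteins(proteins, bin_size=100):
    """Pool (length_aa, n_sites) pairs into bins of bin_size proteins.

    Proteins are sorted by length first (an assumption); the last bin may
    contain fewer than bin_size proteins. Returns one
    (mean_length, mean_sites) point per bin.
    """
    ordered = sorted(proteins)
    points = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        mean_length = sum(length for length, _ in chunk) / len(chunk)
        mean_sites = sum(sites for _, sites in chunk) / len(chunk)
        points.append((mean_length, mean_sites))
    return points


def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy input, not real data: six made-up proteins pooled two at a time.
toy_proteins = [(100, 1), (200, 2), (300, 3), (400, 4), (500, 5), (600, 6)]
toy_points = pool_proteins(toy_proteins, bin_size=2)
toy_slope, toy_intercept = fit_line(toy_points)
```

With real data, an intercept that differs clearly from zero in the fit would correspond to the offset seen in the plots.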

The next question was whether the Gene Ontology terms that correspond to proteins with many phosphorylation sites are indeed assigned to proteins that are longer than average. I thus examined the terms “Cell budding”, “Morphogenesis”, and “Signal transduction”.

The average S. cerevisiae protein is 450 aa long. Proteins annotated with “Cell budding”, “Morphogenesis”, and “Signal transduction” are on average 1.6 (739 aa), 2.1 (945 aa), and 1.5 (679 aa) times longer, respectively. By comparison, the corresponding ratios observed for phosphorylation sites are approximately 2.3, 2.6, and 2.4. It would thus appear that differences in protein length between functional classes of proteins account for much, but not all, of the signal that was observed by Beltrao et al. when comparing the number of phosphorylation sites.
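The arithmetic behind this comparison can be made explicit with a few lines of Python. The sketch below divides each phosphorylation-site ratio by the corresponding length ratio to get the enrichment that is left after accounting for length. This normalization is my own simplification: it assumes that the number of sites scales proportionally with protein length, which the non-zero intercept noted earlier suggests is only approximately true. The numbers are the ones quoted above; the names are hypothetical.

```python
MEAN_LENGTH_AA = 450  # average S. cerevisiae protein length quoted above

# GO term -> (mean protein length in aa, approximate phosphosite ratio)
TERMS = {
    "Cell budding": (739, 2.3),
    "Morphogenesis": (945, 2.6),
    "Signal transduction": (679, 2.4),
}


def residual_enrichment(mean_length, site_ratio, baseline=MEAN_LENGTH_AA):
    """Site ratio divided by length ratio (assumes sites scale with length)."""
    length_ratio = mean_length / baseline
    return site_ratio / length_ratio


residuals = {term: residual_enrichment(*values) for term, values in TERMS.items()}
for term, value in residuals.items():
    print(f"{term}: {value:.2f}")
```

Under this assumption the three terms retain roughly 1.4-fold, 1.2-fold, and 1.6-fold more phosphorylation sites than their length alone would predict, which is the "much, but not all" stated above.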

Edit: Make sure to read Pedro Beltrao’s follow-up blog post, which nicely confirms that whereas protein length does play a role, it is not the full story. See also this and this discussion on FriendFeed.


Update: The BuzzCloud for 2008

January 19, 2009

Yes, it is that time of the year again – we are now almost three weeks into 2009, most papers published in 2008 have hopefully made it into Medline, and it is time to reveal the words of 2008. In other words, I have updated the BuzzCloud resource and here is the result for 2008 (click on the image to go to the web resource):

BuzzCloud 2008

I am thrilled to see the outcome. Without any cheating or tweaking, several buzzwords related to proteomics make it on the list with “phosphoproteomics” and “quantitative phosphoproteomics” being the two most prominent of them. Nice for me to see considering that my new research group at the Novo Nordisk Foundation Center for Protein Research will focus heavily on improving and applying the NetworKIN and NetPhorest resources for analysis of phosphoproteomics data.


Editorial: Virtual conferences in Second Life

January 18, 2009

This blog has been very quiet for a long time. There are several reasons for this, most of which are positive: I have not had many boring or negative results to write blog posts about, I have been busy writing manuscripts about the positive results instead, and I have moved to Copenhagen where I am busy starting my own research group at the Novo Nordisk Foundation Center for Protein Research. There is also one more reason for the absence of blog posts from me: I have spent a lot of time experimenting with Second Life, and that is the topic of this blog post.

I first got interested in Second Life when I heard that Nature Publishing Group was setting up a virtual conference center called Elucian Islands. In the beginning I felt very alone on Elucian Islands. There was a good reason for that – I was alone most of the time. My view on Second Life was thus that it was pretty (see images below) but rather useless.

I obviously took a look at the SciFoo presentations (seen in the background of the image above) and the other scientific displays at Elucian Islands and elsewhere in Second Life. However, these mostly reinforced my negative view of Second Life being fairly useless, since almost everything I saw was already being served better by dedicated resources. For example, slide shows are much more conveniently viewed and shared in SlideShare than in Second Life, and 3D protein structures can be examined and analyzed better in programs such as PyMOL.

Over at FriendFeed, Jean-Claude Bradley fought a brave fight trying to convince me that Second Life is in fact useful for science. His key point was that Second Life is all about interacting with people, so I should try to go to some scientific events in Second Life. Sadly, there are still not many such events, and although they have changed my view on Second Life, they have also shown that there are many problems that remain to be solved.

The first virtual seminar I went to was “Cancer, Cell Cycle, and Check Points” organized by Digi S Lab. This was a perfect match since I work on cell-cycle regulation myself. The seminar consisted of two excellent presentations given by Letizia Cito from Sbarro Health Research Organization and Fayamdria Foley from the American Cancer Society.

Meeting on Cancer and Cell Cycle 1

Meeting on Cancer and Cell Cycle 2

Whereas the presentations were great, the seminar also illustrated several of the problems that need to be overcome before virtual conferences in Second Life are ready for prime time. When the first talk started, I could not see any of the slides. Restarting my Second Life client did not solve the problem, nor did a reboot of my computer. Just after I had given up on solving the problem, the entire region in which the seminar took place suddenly crashed, causing speakers and participants to all be logged out. When it came back online after some minutes and everyone had found their way back, I could suddenly see the slides. Even then, however, they took so long to appear on my screen that the presenter had typically explained half of what was on a slide by the time I could see the slide. I see this as a major problem that must be solved before Second Life conferences can work properly – it must be possible to change slides without a noticeable delay.

The second event I went to was the “ESRC Complexity Research Seminar in Second Life” that took place at Elucian Islands. This seminar was very different from the one described above in that it was not a purely virtual seminar; instead it was a video feed from a real-world seminar that was being transmitted into Second Life. Think of it as a virtual overflow room – the image below shows the people who had gathered shortly before the event started.

ESRC Complexity Research Seminar

Sadly, this event was marred by technical problems. The sound stream was of such poor quality that the Second Life participants could barely understand a word of what the speakers were saying, and the video stream was of too low quality to be able to read their slides. I do not want to dwell on this but just note that good quality microphones and cameras are a prerequisite for streaming events into Second Life.

The third event I went to was the “Virtual Conference on Climate Change and CO2 Storage”, which again took place at Elucian Islands. This was again a mixed event taking place both in the real world and in Second Life. The presentations were excellent and important lessons had been learned from the previous events. The microphones worked perfectly this time, and the video feed had been abandoned in favor of showing a copy of the actual slides in Second Life, which greatly improved the readability.

In addition to these events, Elucian Islands now also runs regular events such as the weekly Nature Podcast event where a fairly large group of people gather to listen to the latest podcast shortly after it has been released (image from Joanna Scott’s blog).

Nature Podcast at Elucian Islands

Regular events are crucial in SL because they bring people together in the same place at the same time. The need for people to be online at the same time is in my view one of the major drawbacks of Second Life compared to other tools that researchers can use for social networking. In my view Second Life should thus not be seen as competing with tools like FriendFeed or Twitter, which you can read when you feel like it, but rather as a virtual reality alternative to video conferences. I think that Nature Publishing Group is on the right track with this, and I hope that the few remaining technical hurdles will be overcome in the near future.

Full disclosure: I have been working with the staff from Nature Publishing Group trying to solve technical challenges on Elucian Islands.


Analysis: Four complementary yeast interactomes

October 4, 2008

The latest issue of Science features a paper by Yu et al. in which they report the results of a comprehensive yeast two-hybrid (Y2H) screen for interactions between budding yeast proteins. Just a few months earlier, Science published a paper by Tarassov et al. that describes a similar screen performed using a novel protein fragment complementation assay (PCA). Peer Bork and I wrote a Perspectives piece on these two papers, showing that the different assays for detecting protein interactions are complementary in the sense that they capture interactions for different subsets of the proteome. For example, PCA detects many interactions for membrane proteins whereas Y2H detects many interactions for nuclear proteins.

As part of writing the Perspectives piece, I performed numerous analyses that were not included in the final publication, because they were either too technical for a broad audience, not interesting enough to spend valuable space on, or would involve additional figures. Thankfully, my blog imposes no limitations on the number of words or figures (nor is it required that the content is interesting, although that is desirable).

The comparison included, in addition to the two interactomes introduced above, a third interactome that consists of all the high-confidence interactions identified by Gavin et al. and Krogan et al. using the tandem affinity purification (TAP) method. Also included in the comparison (but not in the Perspectives piece) was the literature-curated (LC) set of interactions published by Reguly et al. in 2006.

The Venn diagram below shows the overlap of the four interactomes in terms of proteins; that is, a protein is considered to belong to an interactome if the method in question suggested at least one interaction partner:

The numbers outside the ellipses specify the total number of proteins for which a given method identified interactions. Notably, the PCA, Y2H, and TAP interactomes cover only approximately one sixth, one third, and half of the yeast proteome, respectively, despite all three assays having been tested on all yeast ORFs. This suggests that only a fraction of proteins can be targeted with a given assay.

A second way to compare the four interactomes is to count their overlaps in terms of pairs of interacting proteins. To provide additional detail, I distinguished between interactions that are not found in a given interactome because one or both proteins are not covered by the interactome in question (dashed lines in the diagrams), and interactions that were not found despite both proteins being covered (full lines in the diagrams). The Venn diagrams below show all twelve pairwise comparisons of the four interactomes:
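The bookkeeping behind such a comparison can be sketched with Python sets. The code below is an illustration rather than the script I used; the function names are hypothetical and the four tiny interactomes contain made-up proteins (P1 to P7), not real data. Each interaction is stored as an unordered pair. For an ordered comparison of interactome A against interactome B, the interactions in A are split into those also found in B, those missed by B although B covers both proteins, and those missed because B lacks at least one of the proteins.

```python
from itertools import permutations


def pair(a, b):
    """An undirected interaction between proteins a and b."""
    return frozenset((a, b))


def proteins_covered(interactions):
    """Proteins with at least one interaction partner in the interactome."""
    return {protein for interaction in interactions for protein in interaction}


def compare(a, b):
    """Classify the interactions of interactome a relative to interactome b.

    Returns (shared, missed, uncovered):
      shared    - interactions found in both a and b
      missed    - in a only, although b covers both proteins
      uncovered - in a only, and b lacks one or both proteins
    """
    covered_b = proteins_covered(b)
    shared = a & b
    missed = {interaction for interaction in a - b if interaction <= covered_b}
    uncovered = a - b - missed
    return shared, missed, uncovered


# Toy interactomes with made-up proteins, for illustration only.
toy = {
    "PCA": {pair("P1", "P2"), pair("P1", "P3")},
    "Y2H": {pair("P1", "P2"), pair("P4", "P5")},
    "TAP": {pair("P1", "P2"), pair("P2", "P3"), pair("P4", "P5")},
    "LC": {pair("P1", "P2"), pair("P1", "P3"), pair("P2", "P3"), pair("P6", "P7")},
}

# Four interactomes give 4 x 3 = 12 ordered pairwise comparisons.
comparisons = {
    (name_a, name_b): compare(toy[name_a], toy[name_b])
    for name_a, name_b in permutations(toy, 2)
}
```

Because the "missed" and "uncovered" classes depend on which interactome is taken as the reference, the comparison is not symmetric, which is why there are twelve comparisons rather than six.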

As expected, the largest overlap is observed when comparing the two largest interactomes (LC and TAP), whereas the smallest overlap is observed when comparing the smallest interactomes (PCA and Y2H). Even if taking into account the differences in terms of protein coverage, however, the overlaps between the interactomes leave a lot to be desired.

There are several reasons for the poor overlap at the level of pairwise interactions. One is that false positive interactions are unlikely to be reproducible by a different assay. A second is that the assays measure fundamentally different types of interactions: PCA and Y2H measure direct binary interactions between proteins, whereas TAP measures co-complex interactions, that is whether two proteins are part of the same complex or not. This is illustrated in the figure below, which shows the binary and co-complex networks for three different scenarios:

The two types of assays have different strengths and weaknesses. Binary interaction assays can in principle distinguish between the first two complexes, which only differ in that the subunits B and C are in direct contact in the first complex but not in the second. However, binary assays are not able to distinguish between the second and the third scenario, that is, whether A, B, and C form a single complex (ABC) or two complexes (AB and AC). Conversely, data from co-complex assays are able to answer the latter question but are unable to distinguish between the first two scenarios. The different assays thus complement each other, not only because they are able to interrogate different subsets of the proteome, but also because they provide us with complementary information about the composition and topology of protein complexes.
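The argument can be checked mechanically with a small Python sketch. Each of the three scenarios is encoded as a list of complexes plus a set of direct contacts; the data layout and function names are assumptions made for illustration. An idealized binary assay reports the direct contacts, whereas an idealized co-complex assay reports every pair of proteins that share a complex.

```python
from itertools import combinations


def pair(a, b):
    """An undirected pair of proteins."""
    return frozenset((a, b))


# Scenario 1: one complex ABC in which B and C are in direct contact.
scenario_1 = {
    "complexes": [{"A", "B", "C"}],
    "contacts": {pair("A", "B"), pair("A", "C"), pair("B", "C")},
}
# Scenario 2: one complex ABC in which B and C do not touch.
scenario_2 = {
    "complexes": [{"A", "B", "C"}],
    "contacts": {pair("A", "B"), pair("A", "C")},
}
# Scenario 3: two separate complexes, AB and AC.
scenario_3 = {
    "complexes": [{"A", "B"}, {"A", "C"}],
    "contacts": {pair("A", "B"), pair("A", "C")},
}


def binary_network(scenario):
    """What an idealized binary assay (PCA, Y2H) reports: direct contacts."""
    return set(scenario["contacts"])


def cocomplex_network(scenario):
    """What an idealized co-complex assay (TAP) reports: shared membership."""
    return {
        pair(a, b)
        for members in scenario["complexes"]
        for a, b in combinations(sorted(members), 2)
    }


# Binary data separate scenarios 1 and 2 but not scenarios 2 and 3.
binary_separates_1_2 = binary_network(scenario_1) != binary_network(scenario_2)
binary_separates_2_3 = binary_network(scenario_2) != binary_network(scenario_3)
# Co-complex data separate scenarios 2 and 3 but not scenarios 1 and 2.
cocomplex_separates_1_2 = cocomplex_network(scenario_1) != cocomplex_network(scenario_2)
cocomplex_separates_2_3 = cocomplex_network(scenario_2) != cocomplex_network(scenario_3)
```

Only the combination of the two networks tells all three scenarios apart, which is the sense in which the assays are complementary.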



Analysis: Cell-cycle-regulated proteins are more abundant in haploid relative to diploid cells

September 30, 2008

Two days ago, Matthias Mann’s group published a paper in Nature in which they compare the level of individual proteins in haploid relative to diploid budding yeast cells:

Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast

Mass spectrometry is a powerful technology for the analysis of large numbers of endogenous proteins. However, the analytical challenges associated with comprehensive identification and relative quantification of cellular proteomes have so far appeared to be insurmountable. Here, using advances in computational proteomics, instrument performance and sample preparation strategies, we compare protein levels of essentially all endogenous proteins in haploid yeast cells to their diploid counterparts. Our analysis spans more than four orders of magnitude in protein abundance with no discrimination against membrane or low level regulatory proteins. Stable-isotope labelling by amino acids in cell culture (SILAC) quantification was very accurate across the proteome, as demonstrated by one-to-one ratios of most yeast proteins. Key members of the pheromone pathway were specific to haploid yeast but others were unaltered, suggesting an efficient control mechanism of the mating response. Several retrotransposon-associated proteins were specific to haploid yeast. Gene ontology analysis pinpointed a significant change for cell wall components in agreement with geometrical considerations: diploid cells have twice the volume but not twice the surface area of haploid cells. Transcriptome levels agreed poorly with proteome changes overall. However, after filtering out low confidence microarray measurements, messenger RNA changes and SILAC ratios correlated very well for pheromone pathway components. Systems-wide, precise quantification directly at the protein level opens up new perspectives in post-genomics and systems biology.

Although the paper focuses on the larger amount of cell-wall proteins and proteins involved in pheromone response in haploid cells, the supplementary tables reveal similar biases for many other functional classes, including nucleosomes and cyclin-dependent kinase inhibitors. As many of these proteins are regulated during the cell cycle, I suspected that cell-cycle-regulated proteins might be more abundant in haploid cells relative to diploid cells.

To test this hypothesis, I divided the proteins quantified by the Mann group into two classes: dynamic proteins, which are encoded by genes that are periodically expressed during the cell cycle, and static proteins, which are encoded by genes that are expressed at a constant level (de Lichtenberg et al., 2005). For each class, I plotted the log2-ratios of the protein levels in haploid and diploid cells:

The plot reveals a quite strong shift of dynamic proteins toward higher log-ratios; this difference is highly significant according to the Mann-Whitney U test (P < 10^-12). Proteins encoded by cell-cycle-regulated genes are thus in general more abundant in haploid budding yeast cells than in diploid cells.
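For readers unfamiliar with the Mann-Whitney U test, the following pure-Python sketch shows the idea: pool the two samples, rank them, and ask whether the ranks of one sample are shifted relative to the other. This is a simplified illustration, not the code behind the P-value above. It uses the normal approximation without tie or continuity correction, so for a real analysis an established statistics package is the better choice. The function name and the toy numbers are made up.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided P-value by normal approximation).

    Simplified sketch: ties receive midranks, but the variance is not
    corrected for ties and no continuity correction is applied.
    """
    pooled = sorted([(value, 0) for value in x] + [(value, 1) for value in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Toy log2-ratios, not real data: "dynamic" values shifted upward.
toy_dynamic = [0.4, 0.6, 0.9, 1.1, 1.3]
toy_static = [-0.3, -0.1, 0.0, 0.1, 0.2]
toy_u, toy_p = mann_whitney_u(toy_dynamic, toy_static)
```

A U value close to the product of the two sample sizes means that nearly every dynamic protein has a higher log-ratio than nearly every static protein.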

Full disclosure: I currently collaborate with Matthias Mann and members of his group, and we will soon be colleagues at the Novo Nordisk Foundation Center for Protein Research.



Analysis: Transcriptional and posttranslational regulation of cell-cycle kinases

August 31, 2008

Daub and coworkers from Matthias Mann’s group recently published a paper in Molecular Cell, describing a phosphoproteomics study of kinases during S and M phase of the mitotic cell cycle:

Kinase-selective enrichment enables quantitative phosphoproteomics of the kinome across the cell cycle.

Protein kinases are pivotal regulators of cell signaling that modulate each other’s functions and activities through site-specific phosphorylation events. These key regulatory modifications have not been studied comprehensively, because low cellular abundance of kinases has resulted in their underrepresentation in previous phosphoproteome studies. Here, we combine kinase-selective affinity purification with quantitative mass spectrometry to analyze the cell-cycle regulation of protein kinases. This proteomics approach enabled us to quantify 219 protein kinases from S and M phase-arrested human cancer cells. We identified more than 1000 phosphorylation sites on protein kinases. Intriguingly, half of all kinase phosphopeptides were upregulated in mitosis. Our data reveal numerous unknown M phase-induced phosphorylation sites on kinases with established mitotic functions. We also find potential phosphorylation networks involving many protein kinases not previously implicated in mitotic progression. These results provide a vastly extended knowledge base for functional studies on kinases and their regulation through site-specific phosphorylation.

In the study, they identified phosphorylation sites for 219 protein kinases, of which 159 showed differential phosphorylation (at least two-fold induction for at least one site) in S and/or M phase.

My collaborators at CBS and I have previously shown that transcriptional and posttranslational regulation (for example, phosphorylation by cyclin-dependent kinases) tend to target the same proteins (de Lichtenberg et al., 2005; Jensen et al., 2006). One should thus expect that the differentially regulated kinases have a tendency to be encoded by periodically expressed genes.

To test this hypothesis, I compared the phosphoproteomics data of Daub et al. to the cell-cycle microarray expression study by Whitfield et al. (2002). I was able to map 132 of the 159 kinases to the microarrays and found that 17 of them are encoded by the top-600 cycling genes. This corresponds to a significant (P < 0.001) two-fold overrepresentation of transcriptional cell-cycle regulation among the genes encoding kinases that are differentially phosphorylated during S and/or M phase.

One could imagine that this trend is not specific to kinases that are differentially phosphorylated during the cell cycle, but that it instead applies to kinases in general. To test this, I also mapped the 60 non-modulated kinases found by Daub et al. to the microarrays (Whitfield et al., 2002). Of the 54 kinases that could be mapped, only 3 are encoded by periodically expressed genes, which is almost exactly what is expected by random chance.
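An overrepresentation of this kind can be assessed with a hypergeometric tail probability, sketched below in Python. I should stress two assumptions. First, the text above does not say which test was used, so the hypergeometric test is my choice for the illustration. Second, the number of genes on the microarrays is not stated, so `background_size` is a hypothetical placeholder that you would have to replace with the real number. The counts 17 of 132, 3 of 54, and the top-600 list are taken from the text.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K are 'successes'."""
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    return favourable / total


def expected_count(n, K, N):
    """Number of successes expected by chance among n draws."""
    return n * K / N


background_size = 10_000  # HYPOTHETICAL: genes on the microarrays (not given above)
periodic_genes = 600      # the top-600 cycling genes

# Differentially phosphorylated kinases: 17 of 132 mapped kinases are periodic.
p_modulated = hypergeom_upper_tail(17, 132, periodic_genes, background_size)
expected_modulated = expected_count(132, periodic_genes, background_size)

# Non-modulated kinases: 3 of 54 mapped kinases are periodic.
p_non_modulated = hypergeom_upper_tail(3, 54, periodic_genes, background_size)
expected_non_modulated = expected_count(54, periodic_genes, background_size)
```

The fold overrepresentation is the observed count divided by the expected count, so both it and the P-value depend directly on the background size that is plugged in.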

I next examined whether the timing of phosphorylation correlates with the timing of expression of the 17 kinases mentioned above. The kinases can be divided into three classes: phosphorylated in S phase, phosphorylated in M phase, and phosphorylated in both S and M phase. Notably, 13 of the 17 kinases fall into the M phase class. Looking at the peak times of expression for these (that is, when in the cell cycle the corresponding mRNAs are most highly expressed) reveals that 8 of the 13 kinases are presumably synthesized in M phase only shortly before they become phosphorylated.

In summary, comparison of the phosphoproteomics data from Daub et al. (2008) and the microarray expression data from Whitfield et al. (2002) supports the view that transcriptional and posttranslational regulation tend to target the same proteins during the mitotic cell cycle. Moreover, it shows that for most of the kinases that are subject to such dual cell-cycle control, both expression and phosphorylation take place during M phase when the cyclin-dependent kinase activity is maximal.

Full disclosure: I currently collaborate with Matthias Mann and members of his group, and we will soon be colleagues at the Novo Nordisk Foundation Center for Protein Research.



Commentary: On large protein complexes and the essentiality of hubs

August 2, 2008

In 2001, Jeong and coworkers published a paper in Nature in which they showed that the central proteins in interaction networks, that is, the proteins with the highest connectivity, are enriched for essential proteins. This publication has been highly influential, as evidenced by the numerous subsequent publications on the importance of “hub” proteins. Several hypotheses have been published that try to explain why hubs are essential, for example that certain protein interactions are essential and that a protein with many interactions is thus more likely to be involved in at least one essential interaction (He and Zhang, 2006).

Yesterday, Zotenko and coworkers published a paper in PLoS Computational Biology in which they take a closer look at the cause of this phenomenon:

Why Do Hubs in the Yeast Protein Interaction Network Tend To Be Essential: Reexamining the Connection between the Network Topology and Essentiality.

The centrality-lethality rule, which notes that high-degree nodes in a protein interaction network tend to correspond to proteins that are essential, suggests that the topological prominence of a protein in a protein interaction network may be a good predictor of its biological importance. Even though the correlation between degree and essentiality was confirmed by many independent studies, the reason for this correlation remains illusive. Several hypotheses about putative connections between essentiality of hubs and the topology of protein-protein interaction networks have been proposed, but as we demonstrate, these explanations are not supported by the properties of protein interaction networks. To identify the main topological determinant of essentiality and to provide a biological explanation for the connection between the network topology and essentiality, we performed a rigorous analysis of six variants of the genomewide protein interaction network for Saccharomyces cerevisiae obtained using different techniques. We demonstrated that the majority of hubs are essential due to their involvement in Essential Complex Biological Modules, a group of densely connected proteins with shared biological function that are enriched in essential proteins. Moreover, we rejected two previously proposed explanations for the centrality-lethality rule, one relating the essentiality of hubs to their role in the overall network connectivity and another relying on the recently published essential protein interactions model.

What Zotenko et al. show is, in other words, that essential hubs tend to be highly connected with each other and hence form large “Essential Complex Biological Modules”. Table 7 in their paper lists the Gene Ontology terms associated with these modules; among the recurring themes are “rRNA metabolic process”, “mRNA metabolic process”, “RNA splicing”, “ribosome biogenesis and assembly”, and “proteolysis”. These Gene Ontology terms obviously correspond to well known protein complexes, namely the RNA polymerases, the spliceosome, the ribosome, and the proteasome. The analysis of Zotenko et al. thus suggests that the much debated correlation between centrality and essentiality is simply a consequence of the fact that many of the large protein complexes in a eukaryotic cell are essential, which is hardly surprising considering that they have been conserved through more than two billion years of evolution (Brocks et al., 1999).

Edit: For more views on the results of Zotenko et al. see the discussion on FriendFeed.



Live: ISMB 2008 coverage

July 18, 2008

I am now at the ISMB conference from where I will attempt to provide live coverage of the events. To avoid flooding this blog with posts related to the conference, I have set up a separate blog on Tumblr for this purpose. All my posts there will also appear on my FriendFeed.