Monthly Archives: June 2009

Resource: STRING v8.1

After months of hard work from the entire STRING team (thanks, everyone), I am pleased to say that STRING v8.1 has now been put into production. Here is a screenshot of the start page:

STRING 8.1 start page

This is a minor release of STRING, which means that the imported databases of microarray expression data, protein interactions, genetic interactions, and pathways as well as text-mining evidence have all been updated. We have also fixed a bug that affected the minority of bacteria that have multiple chromosomes.

Another notable feature of STRING v8.1 is the new interactive network viewer that is implemented in Adobe Flash:

STRING 8.1 network viewer

For further details please see the post on the official STRING/STITCH blog.



Analysis: On the evolution of protein length and phosphorylation sites

It has been much too long since I last wrote a blog post. Part of the reason is that I have been busy moving back to Denmark, starting up a research group, and co-founding a company. More on that in other blog posts. The main reason, however, has been a lack of papers that inspired me to do the simple follow-up analyses that I usually blog about.

This has thankfully changed now. Pedro Beltrao and coworkers recently published an interesting paper in PLoS Biology on the evolution of regulation through protein phosphorylation. The paper presents several interesting analyses and comparisons of phosphoproteomics data from three yeast species; the abstract summarizes the findings better than I can:

Evolution of Phosphoregulation: Comparison of Phosphorylation Patterns across Yeast Species
The extent by which different cellular components generate phenotypic diversity is an ongoing debate in evolutionary biology that is yet to be addressed by quantitative comparative studies. We conducted an in vivo mass-spectrometry study of the phosphoproteomes of three yeast species (Saccharomyces cerevisiae, Candida albicans, and Schizosaccharomyces pombe) in order to quantify the evolutionary rate of change of phosphorylation. We estimate that kinase–substrate interactions change, at most, two orders of magnitude more slowly than transcription factor (TF)–promoter interactions. Our computational analysis linking kinases to putative substrates recapitulates known phosphoregulation events and provides putative evolutionary histories for the kinase regulation of protein complexes across 11 yeast species. To validate these trends, we used the E-MAP approach to analyze over 2,000 quantitative genetic interactions in S. cerevisiae and Sc. pombe, which demonstrated that protein kinases, and to a greater extent TFs, show lower than average conservation of genetic interactions. We propose therefore that protein kinases are an important source of phenotypic diversity.

Figure 1a in the paper shows the intriguing observation that, despite rapid evolution of individual phosphorylation sites, the relative number of phosphorylation sites within proteins from different functional classes (Gene Ontology categories) remains remarkably constant between species:

Beltrao et al., PLoS Biology, 2009, Figure 1a

However, it occurred to me that this could potentially be a consequence of longer proteins having more phosphorylation sites, and protein length being conserved through evolution. I thus counted the number of unique phosphorylation sites identified in each protein (thanks to Pedro Beltrao for providing the data) and correlated it with the length of the proteins. In the two plots below, I have pooled the proteins so that each dot corresponds to 100 proteins. The upper and lower panels show the results for S. cerevisiae and S. pombe, respectively:

Number of phosphorylation sites vs. protein length for S. cerevisiae

Number of phosphorylation sites vs. protein length for S. pombe

As should be evident from the plots, the average number of phosphorylation sites in a protein correlates strongly with its length, which is by no means surprising. It is unclear to me why the intercept with the y-axis appears to differ from zero in both plots; suggestions are welcome.
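For readers who want to reproduce this kind of plot, the pooling step can be sketched roughly as follows. This is a minimal illustration, not the script used for the figures: it assumes the input is a list of (length, number of sites) pairs, that proteins are sorted by length before being grouped into bins of 100, and that each dot is the mean length and mean site count of its bin. The function names `pool_by_length` and `fit_line` are made up for this example. The least-squares fit is included only because it gives a direct estimate of the y-axis intercept discussed above.

```python
from statistics import mean


def pool_by_length(proteins, bin_size=100):
    """Pool proteins into bins of `bin_size` after sorting by length.

    `proteins` is a list of (length_in_aa, number_of_sites) pairs.
    Returns one (mean_length, mean_sites) pair per bin. Sorting by length
    is an assumption of this sketch; the last bin may hold fewer proteins.
    """
    ordered = sorted(proteins)
    pooled = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        mean_length = mean(length for length, _ in chunk)
        mean_sites = mean(sites for _, sites in chunk)
        pooled.append((mean_length, mean_sites))
    return pooled


def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Calling `fit_line(pool_by_length(proteins))` on the pooled data returns the slope (sites per residue) and the intercept; an intercept clearly above zero would correspond to what is seen in the plots.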

The next question was whether the Gene Ontology terms that correspond to proteins with many phosphorylation sites are indeed assigned to proteins that are longer than average. I thus examined the terms “Cell budding”, “Morphogenesis”, and “Signal transduction”.

The average S. cerevisiae protein is 450 aa long. Proteins annotated with “Cell budding”, “Morphogenesis”, and “Signal transduction” are on average 1.6 (739 aa), 2.1 (945 aa), and 1.5 (679 aa) times as long, respectively. By comparison, the corresponding ratios observed for phosphorylation sites are approximately 2.3, 2.6, and 2.4. It would thus appear that differences in protein length between functional classes account for much, but not all, of the signal observed by Beltrao et al. when comparing the number of phosphorylation sites.
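The arithmetic behind this comparison is easy to check. The sketch below recomputes the length ratios from the numbers quoted above and divides each phosphorylation-site ratio by the corresponding length ratio. That division is my own simplification for illustration: it assumes that the number of sites scales proportionally with length, which ignores the non-zero intercept noted earlier. The name `adjust_for_length` is made up for this example.

```python
AVERAGE_LENGTH = 450  # average S. cerevisiae protein length in aa, as quoted above

# term -> (mean length in aa, approximate phosphorylation-site ratio), as quoted above
CATEGORIES = {
    "Cell budding": (739, 2.3),
    "Morphogenesis": (945, 2.6),
    "Signal transduction": (679, 2.4),
}


def adjust_for_length(mean_length, site_ratio, average_length=AVERAGE_LENGTH):
    """Return (length_ratio, residual_ratio).

    residual_ratio is the site ratio divided by the length ratio, i.e. the
    enrichment left over if sites were simply proportional to length
    (an assumption of this sketch).
    """
    length_ratio = mean_length / average_length
    return length_ratio, site_ratio / length_ratio


for term, (mean_length, site_ratio) in CATEGORIES.items():
    length_ratio, residual = adjust_for_length(mean_length, site_ratio)
    print(f"{term}: length ratio {length_ratio:.1f}, residual ratio {residual:.1f}")
```

Under that proportionality assumption, the residual ratios come out at roughly 1.4, 1.2, and 1.6, so each class still has more sites than length alone would predict.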

Edit: Make sure to read Pedro Beltrao’s follow-up blog post, which nicely confirms that whereas protein length does play a role, it is not the full story.
