It has been much too long since I last wrote a blog post. Part of the reason is that I have been busy moving back to Denmark, starting up a research group, and co-founding a company. More on that in other blog posts. The main reason, however, has been a lack of papers that inspired me to do the simple follow-up analyses that I usually blog about.
This has thankfully changed now. Pedro Beltrao and coworkers recently published an interesting paper in PLoS Biology on the evolution of regulation through protein phosphorylation. The paper presents several interesting analyses and comparisons of phosphoproteomics data from three yeast species; the abstract summarizes the findings better than I can:
Evolution of Phosphoregulation: Comparison of Phosphorylation Patterns across Yeast Species
The extent by which different cellular components generate phenotypic diversity is an ongoing debate in evolutionary biology that is yet to be addressed by quantitative comparative studies. We conducted an in vivo mass-spectrometry study of the phosphoproteomes of three yeast species (Saccharomyces cerevisiae, Candida albicans, and Schizosaccharomyces pombe) in order to quantify the evolutionary rate of change of phosphorylation. We estimate that kinase–substrate interactions change, at most, two orders of magnitude more slowly than transcription factor (TF)–promoter interactions. Our computational analysis linking kinases to putative substrates recapitulates known phosphoregulation events and provides putative evolutionary histories for the kinase regulation of protein complexes across 11 yeast species. To validate these trends, we used the E-MAP approach to analyze over 2,000 quantitative genetic interactions in S. cerevisiae and Sc. pombe, which demonstrated that protein kinases, and to a greater extent TFs, show lower than average conservation of genetic interactions. We propose therefore that protein kinases are an important source of phenotypic diversity.
Figure 1a in the paper shows the intriguing observation that, despite rapid evolution of individual phosphorylation sites, the relative number of phosphorylation sites within proteins from different functional classes (Gene Ontology categories) remains remarkably constant between species:
However, it occurred to me that this could potentially be a consequence of longer proteins having more phosphorylation sites, and protein length being conserved through evolution. I thus counted the number of unique phosphorylation sites identified in each protein (thanks to Pedro Beltrao for providing the data) and correlated it with the length of the proteins. In the two plots below, I have pooled the proteins so that each dot corresponds to 100 proteins. The upper and lower panels show the results for S. cerevisiae and S. pombe, respectively:


As should be evident from the plots, the average number of phosphorylation sites in a protein correlates strongly with its length, which is by no means surprising. It is unclear to me why the intercept with the y-axis appears to differ from zero in both plots; suggestions are welcome.
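For readers who want to try the pooling on their own data, here is a minimal sketch of how it could be done. It assumes that proteins are sorted by length before being pooled into groups of 100, which is my reading of the description rather than a record of the exact procedure, and it uses synthetic stand-in data (a made-up uniform site probability of 0.005 per residue) instead of the real phosphoproteomes. The function names `pool_by_length` and `least_squares` are purely illustrative.

```python
import random
from statistics import mean

def pool_by_length(proteins, bin_size=100):
    """Sort (length, n_sites) pairs by length and average each consecutive bin."""
    ordered = sorted(proteins)
    bins = [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]
    return [(mean(l for l, _ in b), mean(s for _, s in b)) for b in bins]

def least_squares(points):
    """Return (slope, intercept) of an ordinary least-squares line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic stand-in data (an assumption, NOT the published data set):
# every residue is a detected site with the same small probability.
rng = random.Random(0)
proteins = []
for _ in range(2000):
    length = rng.randint(50, 2000)
    n_sites = sum(rng.random() < 0.005 for _ in range(length))
    proteins.append((length, n_sites))

pooled = pool_by_length(proteins)          # one point per 100 proteins
slope, intercept = least_squares(pooled)
print(f"{len(pooled)} bins, slope={slope:.4f} sites/aa, intercept={intercept:.2f}")
```

Under this toy model the fitted line should pass close to the origin, so an intercept that clearly differs from zero in real data would point to something the uniform-density assumption misses.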
The next question was whether the Gene Ontology terms that correspond to proteins with many phosphorylation sites are indeed assigned to proteins that are longer than average. I thus examined the terms “Cell budding”, “Morphogenesis”, and “Signal transduction”.
The average S. cerevisiae protein is 450 aa long. Proteins annotated with “Cell budding”, “Morphogenesis”, and “Signal transduction” are on average 1.6 (739 aa), 2.1 (945 aa), and 1.5 (679 aa) times as long, respectively. By comparison, the corresponding ratios observed for phosphorylation sites are approximately 2.3, 2.6, and 2.4. It would thus appear that differences in protein length between functional classes of proteins account for much, but not all, of the signal that was observed by Beltrao et al. when comparing the number of phosphorylation sites.
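The arithmetic behind this comparison is simple enough to spell out. The sketch below recomputes the length ratios from the averages quoted above and divides each phosphorylation-site ratio by the corresponding length ratio; the variable names are illustrative, the site ratios are the approximate values given in the text, and treating the quotient as a rough "sites per residue relative to average" is my simplification, not a result from the paper.

```python
MEAN_LENGTH = 450  # average S. cerevisiae protein length in aa, as quoted above

# GO term -> (average length in aa, approximate phosphorylation-site ratio)
go_terms = {
    "Cell budding": (739, 2.3),
    "Morphogenesis": (945, 2.6),
    "Signal transduction": (679, 2.4),
}

residuals = {}
for term, (length, site_ratio) in go_terms.items():
    length_ratio = length / MEAN_LENGTH
    # Site enrichment left over once the length difference is divided out.
    residuals[term] = site_ratio / length_ratio
    print(f"{term}: length ratio {length_ratio:.1f}, "
          f"site ratio {site_ratio:.1f}, residual {residuals[term]:.2f}")
```

A residual above 1 for a term means that length alone does not account for its excess of phosphorylation sites.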
Edit: Make sure to read Pedro Beltrao’s follow-up blog post, which nicely confirms that whereas protein length does play a role, it is not the full story.
Tags: evolution, phosphorylation, proteomics, regulation


August 11, 2009 at 8:38
Imported from FriendFeed:
Khader Shameer, Neil Saunders and Deepak Singh liked this.
Thanks for the analysis, Lars. I had a look at the issue you pointed out as it relates to our study: http://pbeltrao.blogspot.com/2009/06/reply-on-evolution-of-protein-length.html – Pedro Beltrao
It would be cool to know how much of the correlation you mention might be due to MS bias. – Pedro Beltrao
Thanks Pedro – redoing the same plot based on average number of phosphorylation sites per residue instead of on the raw counts was exactly what I wanted to do. It would just have been too much work to redo it without asking you for access to all your files. So I hoped that you would do it :) The result is as I expected; when I looked at protein lengths it did not explain the full signal, so there should be a correlation left after the length correction. – Lars Juhl Jensen
This is great discussion. One day, online papers will be wiki documents and you’ll be able to edit one another’s figures :-) – Neil Saunders
Intuitively, I would expect that, for biological reasons, the number of phosphorylation sites is directly proportional to the length of the protein (i.e. that the average density of phosphorylation sites is independent of protein length). When it comes to MS biases, I think that protein expression level is a much more important factor to take into account; a phosphorylation site on an abundant protein is much more likely to be picked up than one on a protein present in only one copy per cell. – Lars Juhl Jensen
Come to think of it: if there is a group of fairly small, highly expressed proteins (ribosomal proteins?), then this would explain the negative intercept with the y-axis in my plots. (Edit: rethinking the argument, this would give a positive intercept, not a negative one.) – Lars Juhl Jensen
Right; you can have small proteins with zero sites, but you can’t have zero-length proteins. – Neil Saunders