Analysis: On the evolution of protein length and phosphorylation sites

It has been much too long since I last wrote a blog post. Part of the reason is that I have been busy moving back to Denmark, starting up a research group, and co-founding a company. More on that in other blog posts. The main reason, however, has been a lack of papers that inspired me to do the simple follow-up analyses that I usually blog about.

This has thankfully changed now. Pedro Beltrao and coworkers recently published an interesting paper in PLoS Biology on the evolution of regulation through protein phosphorylation. The paper presents several interesting analyses and comparisons of phosphoproteomics data from three yeast species; the abstract summarizes the findings better than I can:

Evolution of Phosphoregulation: Comparison of Phosphorylation Patterns across Yeast Species
The extent by which different cellular components generate phenotypic diversity is an ongoing debate in evolutionary biology that is yet to be addressed by quantitative comparative studies. We conducted an in vivo mass-spectrometry study of the phosphoproteomes of three yeast species (Saccharomyces cerevisiae, Candida albicans, and Schizosaccharomyces pombe) in order to quantify the evolutionary rate of change of phosphorylation. We estimate that kinase–substrate interactions change, at most, two orders of magnitude more slowly than transcription factor (TF)–promoter interactions. Our computational analysis linking kinases to putative substrates recapitulates known phosphoregulation events and provides putative evolutionary histories for the kinase regulation of protein complexes across 11 yeast species. To validate these trends, we used the E-MAP approach to analyze over 2,000 quantitative genetic interactions in S. cerevisiae and Sc. pombe, which demonstrated that protein kinases, and to a greater extent TFs, show lower than average conservation of genetic interactions. We propose therefore that protein kinases are an important source of phenotypic diversity.

Figure 1a in the paper shows the intriguing observation that, despite rapid evolution of individual phosphorylation sites, the relative number of phosphorylation sites within proteins from different functional classes (Gene Ontology categories) remains remarkably constant between species:

Beltrao et al., PLoS Biology, 2009, Figure 1a

However, it occurred to me that this could potentially be a consequence of longer proteins having more phosphorylation sites, and protein length being conserved through evolution. I thus counted the number of unique phosphorylation sites identified in each protein (thanks to Pedro Beltrao for providing the data) and correlated it with the length of the proteins. In the two plots below, I have pooled the proteins so that each dot corresponds to 100 proteins. The upper and lower panels show the results for S. cerevisiae and S. pombe, respectively:

Number of phosphorylation sites vs. protein length for S. cerevisiae

Number of phosphorylation sites vs. protein length for S. pombe
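For anyone who wants to try this kind of pooling on their own data, here is a minimal sketch in Python. It is not the script behind the plots above: the function name `pool_proteins` and the toy numbers are made up for illustration, and sorting the proteins by length before cutting them into groups is an assumption on how the pooling is done.

```python
from statistics import mean


def pool_proteins(proteins, bin_size=100):
    """Pool (length, n_sites) pairs into consecutive groups of bin_size.

    Assumption: proteins are sorted by length first, so that each group
    contains proteins of similar length. The last group may be smaller.
    Returns one (mean length, mean number of sites) pair per group.
    """
    ordered = sorted(proteins, key=lambda p: p[0])
    pooled = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        mean_length = mean(length for length, _ in chunk)
        mean_sites = mean(sites for _, sites in chunk)
        pooled.append((mean_length, mean_sites))
    return pooled


# Made-up toy data: (protein length in aa, number of unique sites).
toy = [(400, 2), (100, 0), (300, 1), (200, 1)]
toy_pooled = pool_proteins(toy, bin_size=2)
```

With real data one would keep the default `bin_size=100`, so that each returned pair corresponds to one dot in a plot like the ones above.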

As should be evident from the plots, the average number of phosphorylation sites in a protein correlates strongly with its length, which is by no means surprising. It is unclear to me why the intercept with the y-axis appears to differ from zero in both plots; suggestions are welcome.
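One simple way to put a number on the intercept is an ordinary least-squares line through the pooled points. The sketch below is only an illustration of that idea: `fit_line` is a hypothetical helper, and the pooled values are invented so that the arithmetic is easy to follow; they are not the values from the plots.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented pooled points: mean protein length (aa) and mean number of sites.
mean_lengths = [200, 400, 600, 800]
mean_sites = [0.7, 1.1, 1.5, 1.9]

slope, intercept = fit_line(mean_lengths, mean_sites)
```

If the number of sites were strictly proportional to length, the fitted intercept would be close to zero; a clearly nonzero value is the kind of deviation I am puzzled by.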

The next question was whether the Gene Ontology terms that correspond to proteins with many phosphorylation sites are indeed assigned to proteins that are longer than average. I thus examined the terms “Cell budding”, “Morphogenesis”, and “Signal transduction”.

The average S. cerevisiae protein is 450 aa long. Proteins annotated with “Cell budding”, “Morphogenesis”, and “Signal transduction” are on average 1.6 (739 aa), 2.1 (945 aa), and 1.5 (679 aa) times longer, respectively. By comparison, the corresponding ratios observed for phosphorylation sites are approximately 2.3, 2.6, and 2.4. It would thus appear that differences in protein length between functional classes of proteins account for much, but not all, of the signal that was observed by Beltrao et al. when comparing the number of phosphorylation sites.
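The arithmetic can be spelled out in a few lines of Python. The lengths and site ratios are the numbers quoted above; dividing the site ratio by the length ratio to get a "residual" enrichment is my own back-of-the-envelope assumption (it presumes that the number of sites scales proportionally with length), and the variable names are purely illustrative.

```python
# Numbers quoted in the text above.
AVERAGE_LENGTH = 450  # aa, average S. cerevisiae protein

# GO term -> (average protein length in aa, approximate site ratio)
terms = {
    "Cell budding": (739, 2.3),
    "Morphogenesis": (945, 2.6),
    "Signal transduction": (679, 2.4),
}

length_ratio = {}
residual = {}
for term, (length, site_ratio) in terms.items():
    # How much longer than average the proteins with this term are.
    length_ratio[term] = length / AVERAGE_LENGTH
    # Assumption: sites scale proportionally with length, so whatever is
    # left after dividing out the length ratio is not explained by length.
    residual[term] = site_ratio / length_ratio[term]

for term in terms:
    print(f"{term}: length x{length_ratio[term]:.1f}, residual x{residual[term]:.2f}")
```

Under that assumption the residual ratios come out at roughly 1.2 to 1.6 for the three terms, that is, above 1 in every case, which is what "much, but not all" amounts to.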

Edit: Make sure to read Pedro Beltrao’s follow-up blog post, which nicely confirms that whereas protein length does play a role, it is not the full story.


1 thought on “Analysis: On the evolution of protein length and phosphorylation sites”

  1. Lars Juhl Jensen Post author

    Imported from FriendFeed:

    Khader Shameer, Neil Saunders and Deepak Singh liked this.

    Thanks for the analysis, Lars. I had a look at the issue you pointed out as it relates to our study – Pedro Beltrao

    It would be cool to know how much of the correlation you mention might be due to MS bias. – Pedro Beltrao

    Thanks Pedro – redoing the same plot based on average number of phosphorylation sites per residue instead of on the raw counts was exactly what I wanted to do. It would just have been too much work to redo it without asking you for access to all your files. So I hoped that you would do it :) The result is as I expected; when I looked at protein lengths it did not explain the full signal, so there should be a correlation left after the length correction. – Lars Juhl Jensen

    This is great discussion. One day, online papers will be wiki documents and you’ll be able to edit one another’s figures :-) – Neil Saunders

    Intuitively, I would expect that due to biology the number of phosphorylation sites is directly proportional to the length of the protein (i.e. that the average density of phosphorylation sites is independent of protein length). When it comes to MS biases then I think that protein expression level is a much more important factor to take into account; a phosphorylation site on an abundant protein is much more likely to be picked up than one on a protein present in only one copy per cell. – Lars Juhl Jensen

    Come to think of it: if there is a group of fairly small, highly expressed proteins (ribosomal proteins?), then this would explain the negative intercept with the y-axis in my plots. (Edit: rethinking the argument, this would give a positive intercept, not a negative one.) – Lars Juhl Jensen

    Right; you can have small proteins with zero sites, but you can’t have zero-length proteins. – Neil Saunders

