Posts Tagged ‘proteomics’

Analysis: Limited agreement among lists of Cdc28p substrates

November 3, 2009

A collaboration between the Morgan lab at UCSF and the Gygi lab at Harvard has resulted in a paper by Holt et al. in Science, which reports the identification of several hundred substrates of the central cell-cycle kinase Cdc28p (also known as Cdk1) in the budding yeast Saccharomyces cerevisiae:

Global analysis of Cdk1 substrate phosphorylation sites provides insights into evolution.

To explore the mechanisms and evolution of cell-cycle control, we analyzed the position and conservation of large numbers of phosphorylation sites for the cyclin-dependent kinase Cdk1 in the budding yeast Saccharomyces cerevisiae. We combined specific chemical inhibition of Cdk1 with quantitative mass spectrometry to identify the positions of 547 phosphorylation sites on 308 Cdk1 substrates in vivo. Comparisons of these substrates with orthologs throughout the ascomycete lineage revealed that the position of most phosphorylation sites is not conserved in evolution; instead, clusters of sites shift position in rapidly evolving disordered regions. We propose that the regulation of protein function by phosphorylation often depends on simple nonspecific mechanisms that disrupt or enhance protein-protein interactions. The gain or loss of phosphorylation sites in rapidly evolving regions could facilitate the evolution of kinase-signaling circuits.

The paper presents several interesting analyses and observations. However, I found the comparison to the previous study of Cdc28p substrates by Ubersax et al. from the Morgan lab to be less detailed than I had hoped:

Phosphorylation of Cdk1 consensus sites was observed on 67% (122 of 181) of proteins previously identified as Cdk1 substrates in vitro (4). Sixty-six percent (80 of 122) of these proteins contained sites at which phosphorylation decreased (log2 H/L < -1) after inhibition of Cdk1 (only 45 of 122 are expected if there is no correlation between the experiments in vitro and in vivo; χ² test, P < 10^-10).
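The counts in this passage are enough to recompute the overlap between the two studies and to sanity-check the reported significance. The sketch below uses only the numbers quoted in the text. How Holt et al. actually set up their χ² test is not stated here, so the one-degree-of-freedom goodness-of-fit test on "confirmed vs. not confirmed" is an assumption made purely for illustration.

```python
import math

# Counts quoted from Holt et al. and from the text of this post.
old_substrates = 181      # in vitro substrates from the old study
observed_in_new = 122     # of those, seen with phosphorylated Cdk1 consensus sites
confirmed = 80            # of those, phosphorylation decreased on Cdk1 inhibition
expected_by_chance = 45   # expectation quoted in the paper for no correlation
new_substrates = 308      # substrates reported by the new study

# Overlap seen from the side of each study.
frac_old_confirmed = confirmed / old_substrates
frac_new_supported = confirmed / new_substrates

# ASSUMPTION: a goodness-of-fit chi-square test with two categories
# (confirmed / not confirmed), i.e. one degree of freedom.
observed = [confirmed, observed_in_new - confirmed]
expected = [expected_by_chance, observed_in_new - expected_by_chance]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"old study confirmed by new study: {frac_old_confirmed:.0%}")
print(f"new study supported by old study: {frac_new_supported:.0%}")
print(f"chi2 = {chi2:.2f}, P = {p_value:.1e}")
```

Under this assumed setup, the P-value comes out below the 10^-10 bound quoted in the paper, so the reading of the test is at least plausible.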

In other words, 44% (80 of 181) of Cdc28p substrates identified in the old study were confirmed by the new study, and only 26% (80 of 308) of the Cdc28p substrates identified in the new study are supported by the old study. There are many possible explanations for this discrepancy:

Depth of the mass spectrometry

It is notoriously difficult to identify peptides from low-abundance proteins in mass spectrometry. In the new mass spectrometry study, the authors were able to map 8710 precise phosphorylation sites on 1957 proteins. However, budding yeast is estimated to express on the order of 4500 distinct proteins during exponential growth (Gavin et al., 2006). Assuming that the majority of these proteins contain sites that are phosphorylated during at least part of the mitotic cell cycle, it is likely that a considerable number of low-abundance Cdc28p substrates identified in the old study have been missed in the new study.

Biases in phosphopeptide enrichment

When doing phosphoproteomics, it is necessary to first enrich for phosphopeptides to improve the coverage. To this end, Holt et al. used immobilized metal affinity chromatography (IMAC). In 2007, the Aebersold group at ETH published a paper showing that different purification methods lead to isolation of different, partially overlapping segments of the phosphoproteome. Specifically, they showed that IMAC enrichment biases the data towards isolation of multiply phosphorylated peptides. Given that only a single purification method was used, it is likely that some in vivo Cdc28p substrates have been missed in the new study, in particular those whose peptides contain only a single phosphorylation site.

In vitro vs. in vivo conditions

The old study by Ubersax et al. was performed on cell lysate, which is an in vitro strategy (although all other proteins expressed during the cell cycle are present). It is thus likely that some of the proteins that are phosphorylated by Cdc28p under these conditions are nonetheless not in vivo Cdc28p substrates.

Can we do better?

As always, it is easy to point out potential flaws in other people’s data sets; however, it is much more constructive to do something about the problems. The challenge is thus to construct a larger and more reliable set of Cdc28p substrates by combining the data from the two studies.

To check the feasibility of assigning confidence scores to different putative Cdc28p substrates, I tested if the fold change observed in the new study correlates with the chance that the substrate was also identified in the old study. To this end, I divided the 308 Cdc28p substrates from the new study into two groups, based on whether or not they were also identified in the old study, and constructed histograms of the fold changes for each group:

Phosphorylation ratios from Holt et al.

The fold changes are clearly skewed towards larger negative values for the Cdc28p substrates also identified by the old study relative to the proteins that were not previously identified as Cdc28p substrates. This difference is statistically significant (P < 0.01) according to the Kolmogorov-Smirnov test. This suggests that the fold changes observed in the new mass spectrometry study correlate with the likelihood that the proteins are true Cdc28p substrates.
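For readers who want to repeat this kind of comparison, here is a minimal sketch of a two-sample Kolmogorov-Smirnov test in plain Python. The post does not say which implementation was used. The P-value below comes from a standard asymptotic approximation, and the two lists of log2-ratios are made-up placeholders rather than data from Holt et al.

```python
import math

def ks_statistic(a, b):
    """Largest vertical gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both CDFs past every value equal to x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n, m):
    """Asymptotic two-sided P-value (approximation, not an exact test)."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series below converges poorly here; P is ~1
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, 101))
    return min(1.0, max(0.0, 2 * s))

# PLACEHOLDER values for illustration only, not data from either study.
also_in_old_study = [-3.2, -2.8, -2.5, -2.1, -1.9, -1.6]
novel_substrates = [-1.9, -1.5, -1.4, -1.2, -1.1, -1.05]

d = ks_statistic(also_in_old_study, novel_substrates)
p = ks_pvalue(d, len(also_in_old_study), len(novel_substrates))
print(f"D = {d:.2f}, approximate P = {p:.3f}")
```

With samples as small as these placeholders, the asymptotic P-value is only a rough guide; an exact or library implementation would be the safer choice for real data.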

The old study gave rise to so-called P-scores for the individual proteins (not to be confused with P-values). To test if these too can be used as quality scores, I constructed an equivalent histogram in which the Cdc28p substrates found in the old study were divided into two groups based on whether or not they were also found in the new study:

P-scores from Ubersax et al.

In this case, no obvious trend is seen, and a Kolmogorov-Smirnov test indeed reveals no statistically significant difference between the two distributions. Surprisingly, the P-scores thus do not appear to be useful quality scores for the putative Cdc28p substrates.

Given the two sets of putative Cdc28p substrates, only one of which can be ranked by reliability, how can we create a better combined set? If one aims for high accuracy at the price of low coverage, one could obviously choose to trust only the substrates identified by both screens. However, given the caveats regarding depth of mass spectrometry and biases arising from the enrichment procedure, I would be hesitant to use this approach. Alternatively, one could aim for maximal coverage at the price of accuracy by trusting all substrates identified by either study. However, seeing the large fraction of novel substrates identified by Holt et al. with a log2-ratio only slightly below -1, I would personally tend to apply a more stringent threshold to the data from the new study, for example requiring a log2-ratio below -2, before merging the sets of substrates from the two studies.
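The three strategies discussed above can be written down as simple set operations. This is only a sketch: the protein names and ratios are hypothetical placeholders, the function name is made up for illustration, and the -1 and -2 cutoffs are the ones mentioned in the text.

```python
def combine_substrates(new_log2_ratios, old_substrates,
                       base_cutoff=-1.0, stringent_cutoff=-2.0):
    """Return three candidate substrate sets built from the two studies.

    new_log2_ratios: dict mapping protein -> log2 H/L ratio (new study)
    old_substrates:  set of proteins identified in the old study
    """
    new_all = {p for p, r in new_log2_ratios.items() if r < base_cutoff}
    new_stringent = {p for p, r in new_log2_ratios.items()
                     if r < stringent_cutoff}
    return {
        "both_studies": new_all & old_substrates,        # high accuracy
        "either_study": new_all | old_substrates,        # high coverage
        "stringent_merge": new_stringent | old_substrates,  # compromise
    }

# HYPOTHETICAL toy input, not real measurements.
new_ratios = {"PROT_A": -3.1, "PROT_B": -1.2, "PROT_C": -2.4, "PROT_D": -1.05}
old_set = {"PROT_B", "PROT_E"}

for name, members in combine_substrates(new_ratios, old_set).items():
    print(name, sorted(members))
```

In this toy example, PROT_D (ratio just below -1, not seen in the old study) is the kind of borderline case that the stringent merge leaves out.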


Analysis: On the evolution of protein length and phosphorylation sites

June 25, 2009

It has been much too long since I last wrote a blog post. Part of the reason is that I have been busy moving back to Denmark, starting up a research group, and co-founding a company. More on that in other blog posts. The main reason, however, has been a lack of papers that inspired me to do the simple follow-up analyses that I usually blog about.

This has thankfully changed now. Pedro Beltrao and coworkers recently published an interesting paper in PLoS Biology on the evolution of regulation through protein phosphorylation. The paper presents several interesting analyses and comparisons of phosphoproteomics data from three yeast species; the abstract summarizes the findings better than I can:

Evolution of Phosphoregulation: Comparison of Phosphorylation Patterns across Yeast Species
The extent by which different cellular components generate phenotypic diversity is an ongoing debate in evolutionary biology that is yet to be addressed by quantitative comparative studies. We conducted an in vivo mass-spectrometry study of the phosphoproteomes of three yeast species (Saccharomyces cerevisiae, Candida albicans, and Schizosaccharomyces pombe) in order to quantify the evolutionary rate of change of phosphorylation. We estimate that kinase–substrate interactions change, at most, two orders of magnitude more slowly than transcription factor (TF)–promoter interactions. Our computational analysis linking kinases to putative substrates recapitulates known phosphoregulation events and provides putative evolutionary histories for the kinase regulation of protein complexes across 11 yeast species. To validate these trends, we used the E-MAP approach to analyze over 2,000 quantitative genetic interactions in S. cerevisiae and Sc. pombe, which demonstrated that protein kinases, and to a greater extent TFs, show lower than average conservation of genetic interactions. We propose therefore that protein kinases are an important source of phenotypic diversity.

Figure 1a in the paper shows the intriguing observation that, despite rapid evolution of individual phosphorylation sites, the relative number of phosphorylation sites within proteins from different functional classes (Gene Ontology categories) remains remarkably constant between species:

Beltrao et al., PLoS Biology, 2009, Figure 1a

However, it occurred to me that this could potentially be a consequence of longer proteins having more phosphorylation sites, and protein length being conserved through evolution. I thus counted the number of unique phosphorylation sites identified in each protein (thanks to Pedro Beltrao for providing the data) and correlated it with the length of the proteins. In the two plots below, I have pooled the proteins so that each dot corresponds to 100 proteins. The upper and lower panels show the results for S. cerevisiae and S. pombe, respectively:

Number of phosphorylation sites vs. protein length for S. cerevisiae

Number of phosphorylation sites vs. protein length for S. pombe

As should be evident from the plots, the average number of phosphorylation sites in a protein correlates strongly with its length, which is by no means surprising. It is unclear to me why the intercept with the y-axis appears to differ from zero in both plots; suggestions are welcome.
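The pooling behind these plots can be sketched as follows. The post only states that each dot corresponds to 100 proteins; sorting the proteins by length before pooling is an assumption, as is the ordinary least-squares line used to read off the intercept. The input below is synthetic and only demonstrates the mechanics.

```python
def pool_by_length(proteins, bin_size=100):
    """Average (length, number of sites) over consecutive bins.

    proteins: list of (length_in_aa, n_phosphosites) tuples.
    ASSUMPTION: proteins are sorted by length before pooling.
    """
    ordered = sorted(proteins)
    pooled = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        mean_length = sum(length for length, _ in chunk) / len(chunk)
        mean_sites = sum(sites for _, sites in chunk) / len(chunk)
        pooled.append((mean_length, mean_sites))
    return pooled

def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# SYNTHETIC input: sites grow linearly with length, plus a fixed offset.
synthetic = [(length, 0.01 * length + 1.0) for length in range(100, 1100)]

dots = pool_by_length(synthetic)
slope, intercept = fit_line(dots)
print(f"{len(dots)} dots, slope = {slope:.4f}, intercept = {intercept:.2f}")
```

The fitted intercept is the quantity in question above: on real data, a value clearly different from zero would reproduce the observation made from the plots.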

The next question was whether the Gene Ontology terms that correspond to proteins with many phosphorylation sites are indeed assigned to proteins that are longer than average. I thus examined the terms “Cell budding”, “Morphogenesis”, and “Signal transduction”.

The average S. cerevisiae protein is 450 aa long. Proteins annotated with “Cell budding”, “Morphogenesis”, and “Signal transduction” are on average 1.6 (739 aa), 2.1 (945 aa), and 1.5 (679 aa) times longer, respectively. By comparison, the corresponding ratios observed for phosphorylation sites are approximately 2.3, 2.6, and 2.4. It would thus appear that differences in protein length between functional classes of proteins account for much, but not all, of the signal that was observed by Beltrao et al. when comparing the number of phosphorylation sites.
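The arithmetic behind this conclusion is easy to reproduce from the numbers quoted above. Dividing the phosphorylation-site ratio by the length ratio is a crude normalization that assumes the number of sites scales proportionally with length, which the plots suggest is only approximately true, so the residual values should be read as rough indications.

```python
mean_length = 450  # average S. cerevisiae protein length in aa, from the text

# (mean length in aa, approximate phosphorylation-site ratio), from the text
categories = {
    "Cell budding": (739, 2.3),
    "Morphogenesis": (945, 2.6),
    "Signal transduction": (679, 2.4),
}

length_ratios = {}
residuals = {}
for name, (length, site_ratio) in categories.items():
    length_ratios[name] = length / mean_length
    # Enrichment left over once length is (crudely) accounted for.
    residuals[name] = site_ratio / length_ratios[name]
    print(f"{name}: length ratio {length_ratios[name]:.1f}, "
          f"residual site enrichment {residuals[name]:.2f}")
```

All three residuals stay above 1, which is consistent with length explaining much, but not all, of the signal.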

Edit: Make sure to read Pedro Beltrao’s follow-up blog post, which nicely confirms that whereas protein length does play a role, it is not the full story.


Analysis: Cell-cycle-regulated proteins are more abundant in haploid relative to diploid cells

September 30, 2008

Two days ago, Matthias Mann’s group published a paper in Nature in which they compare the level of individual proteins in haploid relative to diploid budding yeast cells:

Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast

Mass spectrometry is a powerful technology for the analysis of large numbers of endogenous proteins. However, the analytical challenges associated with comprehensive identification and relative quantification of cellular proteomes have so far appeared to be insurmountable. Here, using advances in computational proteomics, instrument performance and sample preparation strategies, we compare protein levels of essentially all endogenous proteins in haploid yeast cells to their diploid counterparts. Our analysis spans more than four orders of magnitude in protein abundance with no discrimination against membrane or low level regulatory proteins. Stable-isotope labelling by amino acids in cell culture (SILAC) quantification was very accurate across the proteome, as demonstrated by one-to-one ratios of most yeast proteins. Key members of the pheromone pathway were specific to haploid yeast but others were unaltered, suggesting an efficient control mechanism of the mating response. Several retrotransposon-associated proteins were specific to haploid yeast. Gene ontology analysis pinpointed a significant change for cell wall components in agreement with geometrical considerations: diploid cells have twice the volume but not twice the surface area of haploid cells. Transcriptome levels agreed poorly with proteome changes overall. However, after filtering out low confidence microarray measurements, messenger RNA changes and SILAC ratios correlated very well for pheromone pathway components. Systems-wide, precise quantification directly at the protein level opens up new perspectives in post-genomics and systems biology.

Although the paper focuses on the larger amount of cell-wall proteins and proteins involved in pheromone response in haploid cells, the supplementary tables reveal similar biases for many other functional classes, including nucleosomes and cyclin-dependent kinase inhibitors. As many of these proteins are regulated during the cell cycle, I suspected that cell-cycle-regulated proteins might be more abundant in haploid cells relative to diploid cells.

To test this hypothesis, I divided the proteins quantified by the Mann group into two classes: dynamic proteins, which are encoded by genes that are periodically expressed during the cell cycle, and static proteins, which are encoded by genes that are expressed at a constant level (de Lichtenberg et al., 2005). For each class, I plotted the log2-ratios of the protein levels in haploid and diploid cells:

The plot reveals a quite strong shift of dynamic proteins toward higher log-ratios; this difference is highly significant according to the Mann-Whitney U test (P < 10^-12). Proteins encoded by cell-cycle-regulated genes are thus in general more abundant in haploid budding yeast cells than in diploid cells.
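A comparison of this kind can be sketched in plain Python as below. The post does not say which implementation was used. This version counts pairs directly and uses the normal approximation without a tie correction, so it is an illustration rather than a reference implementation, and the two lists of log2-ratios are invented placeholders.

```python
import math

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U, P) where U counts pairs with a-value > b-value,
    with ties counted as one half. No tie correction is applied.
    """
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    n, m = len(a), len(b)
    mean_u = n * m / 2
    sd_u = math.sqrt(n * m * (n + m + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u, p

# PLACEHOLDER log2(haploid/diploid) ratios, not data from the paper.
dynamic = [0.4, 0.6, 0.3, 0.8, 0.5, 0.2, 0.7]
static = [0.0, -0.1, 0.1, 0.05, -0.2, 0.15, 0.0]

u, p = mann_whitney_u(dynamic, static)
print(f"U = {u}, approximate P = {p:.4f}")
```

The pairwise count is quadratic in the number of proteins, which is fine for a sketch but slow for proteome-scale data; a rank-based implementation would scale better.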

Full disclosure: I currently collaborate with Matthias Mann and members of his group, and we will soon be colleagues at the Novo Nordisk Foundation Center for Protein Research.
