Tag Archives: phosphorylation

Announcement: PTMs in Cell Signaling conference

Two years ago, I was one of the organizers of the 2nd Copenhagen Bioscience Conference entitled PTMs in Cell Signaling. I think it is fair to describe it as a highly successful meeting, and it is my great pleasure to announce that we will be organizing a second meeting on the topic September 14-18, 2014.

CBC6 poster

My co-chairs Jeremy Austin Daniel, Michael Lund Nielsen, and Amilcar Flores Morales have managed to put together the following excellent lineup of invited speakers:

Alfonso Valencia, Chris Sander, David Komander, Gary Nolan, Genevieve Almouzni, Guillermo Montoya, Hanno Steen, Henrik Daub, John Blenis, John Diffley, John Tainer, Karolin Luger, Marcus Bantscheff, Margaret Goodell, Matthias Mann, Michael Yaffe, Natalie Ahn, Pedro Beltrao, Stephen Elledge, Tanya Paull, Tony Hunter, Yang Shi, Yehudit Bergman, and Yosef Shiloh.

All conference expenses are covered, which means that there will be no registration fee and no expenses for accommodation or food. You will have to cover your own travel expenses, though.

Participants will be selected based on abstract submission, which is open until June 9, 2014. For more information please see the conference website.

Announcement: Computational analysis of protein-protein interactions for bench biologists

Once again I will be one of the teachers on an EMBO Practical Course. This time we will be teaching wet-lab biologists about how to do computational analysis of protein-protein interactions. The course will take place September 2-8 at the Max Delbrück Center for Molecular Medicine in Berlin, Germany.

The course aims to help bench scientists become more effective at exploiting the wide range of commonly-used databases and bioinformatics tools that can be used to identify, understand, and predict protein interactions by analyzing their structure, sequences, and other features.

The target group for the course is experimental scientists who need to analyse interaction data in their work and who have limited experience using bioinformatics tools and resources. The course covers analyses and tools that are applied after potential interactions have been identified. It does not cover analysis of the raw data from, for example, mass spectrometry.

To apply for the course, fill in the online application form. The registration deadline is Friday June 15th 2012. The course fee is 200 euros for academics and 1000 euros for scientists from industry.

Analysis: Limited agreement among lists of Cdc28p substrates

A collaboration between the Morgan lab at UCSF and the Gygi lab at Harvard has resulted in a paper by Holt et al. in Science, which reports the identification of several hundred substrates of the central cell-cycle kinase Cdc28p (also known as Cdk1) in the budding yeast Saccharomyces cerevisiae:

Global analysis of Cdk1 substrate phosphorylation sites provides insights into evolution.

To explore the mechanisms and evolution of cell-cycle control, we analyzed the position and conservation of large numbers of phosphorylation sites for the cyclin-dependent kinase Cdk1 in the budding yeast Saccharomyces cerevisiae. We combined specific chemical inhibition of Cdk1 with quantitative mass spectrometry to identify the positions of 547 phosphorylation sites on 308 Cdk1 substrates in vivo. Comparisons of these substrates with orthologs throughout the ascomycete lineage revealed that the position of most phosphorylation sites is not conserved in evolution; instead, clusters of sites shift position in rapidly evolving disordered regions. We propose that the regulation of protein function by phosphorylation often depends on simple nonspecific mechanisms that disrupt or enhance protein-protein interactions. The gain or loss of phosphorylation sites in rapidly evolving regions could facilitate the evolution of kinase-signaling circuits.

The paper makes several interesting analyses and observations. However, I found the comparison to the previous study of Cdc28p substrates by Ubersax et al. from the Morgan lab to be less detailed than I had hoped for:

Phosphorylation of Cdk1 consensus sites was observed on 67% (122 of 181) of proteins previously identified as Cdk1 substrates in vitro (4). Sixty-six percent (80 of 122) of these proteins contained sites at which phosphorylation decreased (log2 H/L < –1) after inhibition of Cdk1 (only 45 of 122 are expected if there is no correlation between the experiments in vitro and in vivo; χ² test, P < 10^-10).

In other words, 44% (80 of 181) of Cdc28p substrates identified in the old study were confirmed by the new study, and only 26% (80 of 308) of the Cdc28p substrates identified in the new study are supported by the old study. There are many possible explanations for this discrepancy:

Depth of the mass spectrometry

It is notoriously difficult to identify peptides from low-abundance proteins in mass spectrometry. In the new mass spectrometry study, the authors were able to map 8710 precise phosphorylation sites on 1957 proteins. However, budding yeast is estimated to express on the order of 4500 distinct proteins during exponential growth (Gavin et al., 2006). Assuming that the majority of these proteins contain sites that are phosphorylated during at least part of the mitotic cell cycle, it is likely that a considerable number of low-abundance Cdc28p substrates identified in the old study have been missed in the new study.

Biases in phosphopeptide enrichment

When doing phosphoproteomics, it is necessary to first enrich for phosphopeptides to improve the coverage. To this end, Holt et al. used immobilized metal affinity chromatography (IMAC). In 2007, the Aebersold group at ETH published a paper showing that different purification methods lead to isolation of different, partially overlapping segments of the phosphoproteome. Specifically, they showed that IMAC enrichment biases the data towards isolation of multiply phosphorylated peptides. Given that only a single purification method was used, it is likely that in vivo Cdc28p substrates may have been missed in the new study, in particular if the peptides contain only a single phosphorylation site.

In vitro vs. in vivo conditions

The old study by Ubersax et al. was performed on cell lysate, which is an in vitro strategy (although all other proteins expressed during the cell cycle are present). It is thus likely that some of the proteins that are phosphorylated by Cdc28p under these conditions are nonetheless not in vivo Cdc28p substrates.

Can we do better?

As always, it is easy to point out potential flaws in other people’s data sets; however, it is much more constructive to do something about the problems. The challenge is thus to construct a larger and more reliable set of Cdc28p substrates by combining the data from the two studies.

To check the feasibility of assigning confidence scores to different putative Cdc28p substrates, I tested if the fold change observed in the new study correlates with the chance that the substrate was also identified in the old study. To this end, I divided the 308 Cdc28p substrates from the new study into two groups, based on whether or not they were also identified in the old study, and constructed histograms of the fold changes for each group:

Phosphorylation ratios from Holt et al.

The fold changes are clearly skewed towards larger negative values for the Cdc28p substrates also identified by the old study relative to the proteins that were not previously identified as Cdc28p substrates. This difference is statistically significant (P < 0.01) according to the Kolmogorov-Smirnov test. This suggests that the fold changes observed in the new mass spectrometry study correlate with the likelihood that the proteins are true Cdc28p substrates.
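For readers who want to repeat this kind of comparison, here is a minimal sketch of a two-sample Kolmogorov-Smirnov test in plain Python. The log2 ratios are made-up placeholder values, not data from Holt et al., and the function names are my own. The p-value uses the standard asymptotic approximation with a small-sample correction, so it is only rough for groups this small.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest vertical distance between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The maximum distance between two step functions occurs at a data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Approximate two-sided p-value from the Kolmogorov distribution."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is essentially 1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Placeholder log2 H/L ratios (hypothetical, for illustration only).
confirmed = [-3.1, -2.8, -2.5, -2.2, -1.9, -1.7, -1.4, -1.2]
novel = [-2.0, -1.6, -1.4, -1.3, -1.2, -1.1, -1.1, -1.0]

d = ks_statistic(confirmed, novel)
p = ks_pvalue(d, len(confirmed), len(novel))
print(f"D = {d:.3f}, approximate P = {p:.3f}")
```

With real data, one list would hold the ratios for substrates also found by Ubersax et al. and the other the ratios for the novel substrates.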

The old study gave rise to so-called P-scores for the individual proteins (not to be confused with P-values). To test if these too can be used as quality scores, I constructed an equivalent histogram in which the Cdc28p substrates found in the old study were divided into two groups based on whether or not they were also found in the new study:

P-scores from Ubersax et al.

In this case, no obvious trend is seen and a Kolmogorov-Smirnov test indeed reveals no statistically significant difference between the two distributions. Surprisingly, the P-scores do thus not appear to be useful quality scores for the putative Cdc28p substrates.

Given the two sets of putative Cdc28p substrates, only one of which can be ranked by reliability, how can we create a better combined set? If one aims for high accuracy at the price of low coverage, one could obviously choose to trust only the substrates identified by both screens. However, given the caveats regarding depth of mass spectrometry and biases arising from the enrichment procedure, I would be hesitant to use this approach. Alternatively, one could aim for maximal coverage at the price of accuracy by trusting all substrates identified by either study. However, seeing the large fraction of novel substrates identified by Holt et al. with a log2-ratio only slightly below -1, I would personally tend to apply a more stringent threshold to the data from the new study by Holt et al., for example requiring a log2-ratio below -2, before merging the sets of substrates from the two studies.
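The merging strategy suggested above can be sketched in a few lines of Python. This is only an illustration: the protein names and ratios are placeholders, the function name is hypothetical, and I assume that each protein from the new study is represented by a single log2 ratio (for example, that of its most strongly affected site).

```python
def merge_substrates(old_substrates, new_log2_ratios, threshold=-2.0):
    """Union of the old set and the new-study proteins below the threshold."""
    stringent_new = {protein for protein, ratio in new_log2_ratios.items()
                     if ratio < threshold}
    return set(old_substrates) | stringent_new


# Hypothetical input: names and ratios are placeholders, not real data.
old = {"PROT_A", "PROT_B"}
new = {"PROT_B": -1.3, "PROT_C": -2.6, "PROT_D": -1.1}

combined = merge_substrates(old, new)
print(sorted(combined))
```

Setting threshold=-1.0 reproduces the "trust everything" union, so the same function can be used to see how the combined set grows as the cutoff is relaxed.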


Analysis: On the evolution of protein length and phosphorylation sites

It has been much too long since I have last written a blog post. Part of the reason has been that I have been busy moving back to Denmark, starting up a research group, and co-founding a company. More on that in other blog posts. The main reason, however, has been a lack of papers that inspired me to do the simple follow-up analyses that I usually blog about.

This has thankfully changed now. Pedro Beltrao and coworkers recently published an interesting paper in PLoS Biology on the evolution of regulation through protein phosphorylation. The paper presents several interesting analyses and comparisons of phosphoproteomics data from three yeast species; the abstract summarizes the findings better than I can do:

Evolution of Phosphoregulation: Comparison of Phosphorylation Patterns across Yeast Species
The extent by which different cellular components generate phenotypic diversity is an ongoing debate in evolutionary biology that is yet to be addressed by quantitative comparative studies. We conducted an in vivo mass-spectrometry study of the phosphoproteomes of three yeast species (Saccharomyces cerevisiae, Candida albicans, and Schizosaccharomyces pombe) in order to quantify the evolutionary rate of change of phosphorylation. We estimate that kinase–substrate interactions change, at most, two orders of magnitude more slowly than transcription factor (TF)–promoter interactions. Our computational analysis linking kinases to putative substrates recapitulates known phosphoregulation events and provides putative evolutionary histories for the kinase regulation of protein complexes across 11 yeast species. To validate these trends, we used the E-MAP approach to analyze over 2,000 quantitative genetic interactions in S. cerevisiae and Sc. pombe, which demonstrated that protein kinases, and to a greater extent TFs, show lower than average conservation of genetic interactions. We propose therefore that protein kinases are an important source of phenotypic diversity.

Figure 1a in the paper shows the intriguing observation that, despite rapid evolution of individual phosphorylation sites, the relative number of phosphorylation sites within proteins from different functional classes (Gene Ontology categories) remains remarkably constant between species:

Beltrao et al., PLoS Biology, 2009, Figure 1a

However, it occurred to me that this could potentially be a consequence of longer proteins having more phosphorylation sites, and protein length being conserved through evolution. I thus counted the number of unique phosphorylation sites identified in each protein (thanks to Pedro Beltrao for providing the data) and correlated it with the length of the proteins. In the two plots below, I have pooled the proteins so that each dot corresponds to 100 proteins. The upper and lower panels show the results for S. cerevisiae and S. pombe, respectively:

Number of phosphorylation sites vs. protein length for S. cerevisiae

Number of phosphorylation sites vs. protein length for S. pombe

As should be evident from the plots, the average number of phosphorylation sites in a protein correlates strongly with its length, which is by no means surprising. It is unclear to me why the intercept with the y-axis appears to differ from zero in both plots; suggestions are welcome.
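For anyone wanting to redo the pooling on their own data, here is a rough sketch in Python. I assume that the proteins are sorted by length and pooled in consecutive groups of 100, with each dot being the mean length and mean number of sites of its group. The input below is synthetic and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pool_by_length(proteins, bin_size=100):
    """proteins: list of (length, n_sites); returns (mean_length, mean_sites) per bin."""
    ordered = sorted(proteins)  # sorts by length first
    pooled = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        pooled.append((mean(length for length, _ in chunk),
                       mean(sites for _, sites in chunk)))
    return pooled


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic example: 300 proteins with one site per 200 aa (not real data).
proteins = [(100 + 10 * i, (100 + 10 * i) // 200) for i in range(300)]

pooled = pool_by_length(proteins)
lengths = [length for length, _ in pooled]
sites = [n for _, n in pooled]
print(len(pooled), "bins; r =", round(pearson(lengths, sites), 3))
```

Fitting a straight line to the pooled points would also give the y-axis intercept discussed above.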

The next question was whether the Gene Ontology terms that correspond to proteins with many phosphorylation sites are indeed assigned to proteins that are longer than average. I thus examined the terms “Cell budding”, “Morphogenesis”, and “Signal transduction”.

The average S. cerevisiae protein is 450 aa long. Proteins annotated with “Cell budding”, “Morphogenesis”, and “Signal transduction” are on average 1.6 (739 aa), 2.1 (945 aa), and 1.5 (679 aa) times longer, respectively. By comparison, the corresponding ratios observed for phosphorylation sites are approximately 2.3, 2.6, and 2.4. It would thus appear that differences in protein length between functional classes of proteins account for much, but not all, of the signal that was observed by Beltrao et al. when comparing the number of phosphorylation sites.
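The arithmetic behind this comparison is simple enough to write down. The sketch below uses only the numbers quoted above; the "residual" (site ratio divided by length ratio) is my own crude way of expressing what is left after accounting for length, and it assumes that the number of sites scales proportionally with length, which the non-zero intercept in the plots suggests is only approximately true.

```python
MEAN_LENGTH_ALL = 450  # average S. cerevisiae protein length in aa

# Mean length (aa) and approximate phosphorylation-site ratio per category.
categories = {
    "Cell budding": (739, 2.3),
    "Morphogenesis": (945, 2.6),
    "Signal transduction": (679, 2.4),
}

residuals = {}
for name, (mean_length, site_ratio) in categories.items():
    length_ratio = mean_length / MEAN_LENGTH_ALL
    # Enrichment in sites that is not explained by length alone.
    residuals[name] = site_ratio / length_ratio
    print(f"{name}: length ratio {length_ratio:.1f}, "
          f"site ratio {site_ratio:.1f}, residual {residuals[name]:.2f}")
```

A residual above 1 means that the category has more sites than its length alone would predict.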

Edit: Make sure to read Pedro Beltrao’s follow-up blog post, which nicely confirms that whereas protein length does play a role, it is not the full story.


Analysis: Transcriptional and posttranslational regulation of cell-cycle kinases

Daub and coworkers from Matthias Mann’s group recently published a paper in Molecular Cell, describing a phosphoproteomics study of kinases during S and M phase of the mitotic cell cycle:

Kinase-selective enrichment enables quantitative phosphoproteomics of the kinome across the cell cycle.

Protein kinases are pivotal regulators of cell signaling that modulate each other’s functions and activities through site-specific phosphorylation events. These key regulatory modifications have not been studied comprehensively, because low cellular abundance of kinases has resulted in their underrepresentation in previous phosphoproteome studies. Here, we combine kinase-selective affinity purification with quantitative mass spectrometry to analyze the cell-cycle regulation of protein kinases. This proteomics approach enabled us to quantify 219 protein kinases from S and M phase-arrested human cancer cells. We identified more than 1000 phosphorylation sites on protein kinases. Intriguingly, half of all kinase phosphopeptides were upregulated in mitosis. Our data reveal numerous unknown M phase-induced phosphorylation sites on kinases with established mitotic functions. We also find potential phosphorylation networks involving many protein kinases not previously implicated in mitotic progression. These results provide a vastly extended knowledge base for functional studies on kinases and their regulation through site-specific phosphorylation.

In the study, they identified phosphorylation sites for 219 protein kinases, of which 159 showed differential phosphorylation (at least two-fold induction for at least one site) in S and/or M phase.

My collaborators at CBS and I have previously shown that transcriptional and posttranslational regulation (for example, phosphorylation by cyclin-dependent kinases) tend to target the same proteins (de Lichtenberg et al., 2005; Jensen et al., 2006). One should thus expect that the differentially regulated kinases have a tendency to be encoded by periodically expressed genes.

To test this hypothesis, I compared the phosphoproteomics data of Daub et al. to the cell-cycle microarray expression study by Whitfield et al. (2002). I was able to map 132 of the 159 kinases to the microarrays and found that 17 of them are encoded by the top-600 cycling genes. This corresponds to a significant (P < 0.001) two-fold overrepresentation of transcriptional cell-cycle regulation among the genes encoding kinases that are differentially phosphorylated during S and/or M phase.

One could imagine that this trend is not specific to kinases that are differentially phosphorylated during the cell cycle, but that it instead applies to kinases in general. To test this, I also mapped the 60 non-modulated kinases found by Daub et al. to the microarrays (Whitfield et al., 2002). Of the 54 kinases that could be mapped, only 3 are encoded by periodically expressed genes, which is almost exactly what is expected by random chance.
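The two comparisons above boil down to asking whether a list of kinases contains more periodically expressed genes than expected by chance. Below is a minimal sketch of such a test using the hypergeometric distribution. Note that the total number of genes on the microarrays is not stated in this post; the value used here is a hypothetical background picked only so that the expected fraction matches the two-fold figure quoted above, so the resulting p-values need not match the ones I report.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are periodic."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / total


def fold_enrichment(k, n, K, N):
    """Observed fraction of periodic genes divided by the background fraction."""
    return (k / n) / (K / N)


N_BACKGROUND = 9300  # hypothetical: number of genes on the arrays (assumption)
N_CYCLING = 600      # top-600 cycling genes

fold = fold_enrichment(17, 132, N_CYCLING, N_BACKGROUND)
p_modulated = hypergeom_tail(17, 132, N_CYCLING, N_BACKGROUND)
p_non_modulated = hypergeom_tail(3, 54, N_CYCLING, N_BACKGROUND)
print(f"fold enrichment: {fold:.2f}")
print(f"P (modulated kinases): {p_modulated:.2g}")
print(f"P (non-modulated kinases): {p_non_modulated:.2g}")
```

The same function can be reused for any other gene list mapped to the arrays, as long as the background is defined consistently.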

I next examined if the timing of phosphorylation correlates with the timing of expression of the 17 kinases mentioned above. The kinases can be divided into three classes: phosphorylated in S phase, phosphorylated in M phase, and phosphorylated in both S and M phase. Notably, 13 of the 17 kinases fall into the M phase class. Looking at the peak times of expression for these (that is, when in the cell cycle the corresponding mRNAs are most highly expressed) reveals that 8 of the 13 kinases are presumably synthesized in M phase only shortly before they become phosphorylated.

In summary, comparison of the phosphoproteomics data from Daub et al. (2008) and the microarray expression data from Whitfield et al. (2002) supports the view that transcriptional and posttranslational regulation tend to target the same proteins during the mitotic cell cycle. Moreover, it shows that for most of the kinases that are subject to such dual cell-cycle control, both expression and phosphorylation take place during M phase when the cyclin-dependent kinase activity is maximal.

Full disclosure: I currently collaborate with Matthias Mann and members of his group, and we will soon be colleagues at the Novo Nordisk Foundation Center for Protein Research.


Analysis: The budding yeast phosphoproteome

The group of Donald F. Hunt at the University of Virginia has recently published a paper in PNAS that describes a new phosphoproteomics study of budding yeast:

Analysis of phosphorylation sites on proteins from Saccharomyces cerevisiae by electron transfer dissociation (ETD) mass spectrometry

We present a strategy for the analysis of the yeast phosphoproteome that uses endo-Lys C as the proteolytic enzyme, immobilized metal affinity chromatography for phosphopeptide enrichment, a 90-min nanoflow-HPLC/electrospray-ionization MS/MS experiment for phosphopeptide fractionation and detection, gas phase ion/ion chemistry, electron transfer dissociation for peptide fragmentation, and the Open Mass Spectrometry Search Algorithm for phosphoprotein identification and assignment of phosphorylation sites. From a 30-microg (approximately 600 pmol) sample of total yeast protein, we identify 1,252 phosphorylation sites on 629 proteins. Identified phosphoproteins have expression levels that range from <50 to 1,200,000 copies per cell and are encoded by genes involved in a wide variety of cellular processes. We identify a consensus site that likely represents a motif for one or more uncharacterized kinases and show that yeast kinases, themselves, contain a disproportionately large number of phosphorylation sites. Detection of a pHis containing peptide from the yeast protein, Cdc10, suggests an unexpected role for histidine phosphorylation in septin biology. From diverse functional genomics data, we show that phosphoproteins have a higher number of interactions than an average protein and interact with each other more than with a random protein. They are also likely to be conserved across large evolutionary distances.

As is so often the case with experimental papers, no comparison is provided to earlier studies. I thus decided to compare the set of phosphoproteins identified by Hunt and coworkers to the set of Cdc28p substrates identified in two studies by the Morgan lab as well as to the proteome-wide, sequence-based predictions made by NetPhosK:

Venn diagram comparing three sets of phosphoproteins from budding yeast

The Venn diagram clearly shows that each of the three sets contains a considerable number of phosphoproteins that are not present in any of the other sets. This was to be expected since the three methods are fundamentally very different. The dataset from the Hunt lab includes proteins that are phosphorylated by kinases other than Cdc28p; however, it is limited in the sense that low-abundance phosphopeptides are typically missed by MS studies. Conversely, the set from the Morgan lab consists only of Cdc28p substrates, but is likely to have much better coverage of low-abundance phosphoproteins. Finally, the set of Cdc28p substrates from NetPhosK is likely to contain a considerable number of false positives as they are predicted from the protein sequence alone.

As a matter of fact, I find the overlap between the three sets to be surprisingly good. Even if we assume that the dataset from the Morgan lab contains no false positives, the overlap suggests that the new dataset from Hunt and coworkers captures one third of all phosphoproteins in budding yeast; assuming errors in both datasets increases this estimate. It is also noteworthy that NetPhosK misses only 22% of the Cdc28p substrates that were identified by the Morgan lab and supported by the new data from the Hunt lab, although this high coverage is probably obtained at the price of many false positive predictions.
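The two estimates above are simple set calculations, which can be sketched as follows. The sets below are tiny toy examples with placeholder names, not the actual protein lists, and the function name is hypothetical. The idea is that if one set is treated as an error-free reference, the fraction of it recovered by another set estimates that set's coverage.

```python
def coverage(reference, query):
    """Fraction of the reference set that is also present in the query set."""
    return len(reference & query) / len(reference)


# Toy sets with placeholder protein names (not real data).
morgan = {"P1", "P2", "P3", "P4", "P5", "P6"}
hunt = {"P1", "P2", "P7", "P8"}
netphosk = {"P1", "P3", "P4", "P9"}

# Estimated fraction of phosphoproteins captured by the MS study,
# treating the Morgan lab set as a reference without false positives.
ms_coverage = coverage(morgan, hunt)

# Fraction of substrates supported by both experiments that the
# sequence-based predictions miss.
supported = morgan & hunt
missed_by_prediction = 1.0 - coverage(supported, netphosk)

print(f"MS coverage: {ms_coverage:.2f}")
print(f"Missed by prediction: {missed_by_prediction:.2f}")
```

If the reference set does contain false positives, these are unlikely to be recovered by the other method, which is why the coverage estimate is a lower bound.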
