Tag Archives: regulation

Announcement: PTMs In Cell Signaling

It is my great pleasure to announce the 2nd Copenhagen Bioscience conference “PTMs In Cell Signaling”, which will take place in Helsingør, Denmark on December 3-5, 2012.

The conference will feature a truly excellent lineup of speakers: Philippe Bastiaens, Søren Brunak, Ivan Dikic, Gerald Hart, Tim Hunt, Steve Jackson, Doug Lauffenburger, Jiri Lukas, Matthias Mann, Andre Nussenzweig, Brenda Schulman, Henrik Semb, Eric Verdin, Forest White, Michael Yaffe, and Juleen Zierath.

The conference is limited to 220 participants. It is fully sponsored by the Novo Nordisk Foundation, which covers the conference fee, hotel, transport and meals during the conference. Participants cover their own travel expenses.

To find out more, please check the conference web site.

Analysis: On the evolution of protein length and phosphorylation sites

It has been much too long since I last wrote a blog post. Part of the reason has been that I have been busy moving back to Denmark, starting up a research group, and co-founding a company. More on that in other blog posts. The main reason, however, has been a lack of papers that inspired me to do the simple follow-up analyses that I usually blog about.

This has thankfully changed now. Pedro Beltrao and coworkers recently published an interesting paper in PLoS Biology on the evolution of regulation through protein phosphorylation. The paper presents several interesting analyses and comparisons of phosphoproteomics data from three yeast species; the abstract summarizes the findings better than I can:

Evolution of Phosphoregulation: Comparison of Phosphorylation Patterns across Yeast Species
The extent by which different cellular components generate phenotypic diversity is an ongoing debate in evolutionary biology that is yet to be addressed by quantitative comparative studies. We conducted an in vivo mass-spectrometry study of the phosphoproteomes of three yeast species (Saccharomyces cerevisiae, Candida albicans, and Schizosaccharomyces pombe) in order to quantify the evolutionary rate of change of phosphorylation. We estimate that kinase–substrate interactions change, at most, two orders of magnitude more slowly than transcription factor (TF)–promoter interactions. Our computational analysis linking kinases to putative substrates recapitulates known phosphoregulation events and provides putative evolutionary histories for the kinase regulation of protein complexes across 11 yeast species. To validate these trends, we used the E-MAP approach to analyze over 2,000 quantitative genetic interactions in S. cerevisiae and Sc. pombe, which demonstrated that protein kinases, and to a greater extent TFs, show lower than average conservation of genetic interactions. We propose therefore that protein kinases are an important source of phenotypic diversity.

Figure 1a in the paper shows the intriguing observation that, despite rapid evolution of individual phosphorylation sites, the relative number of phosphorylation sites within proteins from different functional classes (Gene Ontology categories) remains remarkably constant between species:

Beltrao et al., PLoS Biology, 2009, Figure 1a

However, it occurred to me that this could potentially be a consequence of longer proteins having more phosphorylation sites, and protein length being conserved through evolution. I thus counted the number of unique phosphorylation sites identified in each protein (thanks to Pedro Beltrao for providing the data) and correlated it with the length of the proteins. In the two plots below, I have pooled the proteins so that each dot corresponds to 100 proteins. The upper and lower panels show the results for S. cerevisiae and S. pombe, respectively:

Number of phosphorylation sites vs. protein length for S. cerevisiae

Number of phosphorylation sites vs. protein length for S. pombe

As should be evident from the plots, the average number of phosphorylation sites in a protein correlates strongly with its length, which is by no means surprising. It is unclear to me why the intercept with the y-axis appears to differ from zero in both plots; suggestions are welcome.
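The pooling behind these plots can be sketched roughly as follows. This is a minimal illustration, assuming that proteins are sorted by length before being grouped into bins of 100 (the post does not say how the pools were formed); the function name `pool_by_length` and the toy data are made up and are not the data from the paper.

```python
from statistics import mean

def pool_by_length(proteins, bin_size=100):
    """Sort (length, n_sites) pairs by length and average each run of bin_size proteins.

    Returns one (mean_length, mean_sites) point per bin. A trailing partial
    bin is dropped so that every point pools the same number of proteins.
    """
    ordered = sorted(proteins)
    points = []
    for start in range(0, len(ordered) - bin_size + 1, bin_size):
        chunk = ordered[start:start + bin_size]
        mean_length = mean(length for length, _ in chunk)
        mean_sites = mean(sites for _, sites in chunk)
        points.append((mean_length, mean_sites))
    return points

# Hypothetical toy input: 300 proteins whose site count grows with length.
toy = [(100 + 5 * i, i // 50) for i in range(300)]
points = pool_by_length(toy, bin_size=100)
for mean_length, mean_sites in points:
    print(f"{mean_length:.1f} aa -> {mean_sites:.1f} sites on average")
```

Each resulting point corresponds to one dot in a plot of this kind; pooling trades resolution for a much less noisy trend.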

The next question was whether the Gene Ontology terms that correspond to proteins with many phosphorylation sites are indeed assigned to proteins that are longer than average. I thus examined the terms “Cell budding”, “Morphogenesis”, and “Signal transduction”.

The average S. cerevisiae protein is 450 aa long. Proteins annotated with “Cell budding”, “Morphogenesis”, and “Signal transduction” are on average 1.6 (739 aa), 2.1 (945 aa), and 1.5 (679 aa) times longer, respectively. By comparison, the corresponding ratios observed for phosphorylation sites are approximately 2.3, 2.6, and 2.4. It would thus appear that differences in protein length between functional classes of proteins account for much, but not all, of the signal that was observed by Beltrao et al. when comparing the number of phosphorylation sites.
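The arithmetic behind that conclusion can be spelled out in a few lines. The sketch below uses only the numbers quoted above and assumes, as a simplification, that the expected number of sites scales proportionally with protein length (the non-zero intercept in the plots suggests this is not exactly true); the variable names are illustrative.

```python
AVERAGE_LENGTH = 450  # aa, average S. cerevisiae protein (from the post)

# GO term -> (mean length in aa, approximate fold enrichment in phosphosites)
terms = {
    "Cell budding": (739, 2.3),
    "Morphogenesis": (945, 2.6),
    "Signal transduction": (679, 2.4),
}

residual = {}
for term, (length, site_ratio) in terms.items():
    length_ratio = length / AVERAGE_LENGTH
    # Fold enrichment in sites that is left once length is divided out.
    residual[term] = site_ratio / length_ratio
    print(f"{term}: length x{length_ratio:.1f}, sites x{site_ratio}, "
          f"residual x{residual[term]:.2f}")
```

Under that assumption the residual enrichment stays above 1 for all three terms (roughly 1.2 to 1.6), which is what "much, but not all" amounts to.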

Edit: Make sure to read Pedro Beltrao’s follow-up blog post, which nicely confirms that whereas protein length does play a role, it is not the full story.


Analysis: Cell-cycle-regulated proteins are more abundant in haploid relative to diploid cells

Two days ago, Matthias Mann’s group published a paper in Nature in which they compare the level of individual proteins in haploid relative to diploid budding yeast cells:

Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast

Mass spectrometry is a powerful technology for the analysis of large numbers of endogenous proteins. However, the analytical challenges associated with comprehensive identification and relative quantification of cellular proteomes have so far appeared to be insurmountable. Here, using advances in computational proteomics, instrument performance and sample preparation strategies, we compare protein levels of essentially all endogenous proteins in haploid yeast cells to their diploid counterparts. Our analysis spans more than four orders of magnitude in protein abundance with no discrimination against membrane or low level regulatory proteins. Stable-isotope labelling by amino acids in cell culture (SILAC) quantification was very accurate across the proteome, as demonstrated by one-to-one ratios of most yeast proteins. Key members of the pheromone pathway were specific to haploid yeast but others were unaltered, suggesting an efficient control mechanism of the mating response. Several retrotransposon-associated proteins were specific to haploid yeast. Gene ontology analysis pinpointed a significant change for cell wall components in agreement with geometrical considerations: diploid cells have twice the volume but not twice the surface area of haploid cells. Transcriptome levels agreed poorly with proteome changes overall. However, after filtering out low confidence microarray measurements, messenger RNA changes and SILAC ratios correlated very well for pheromone pathway components. Systems-wide, precise quantification directly at the protein level opens up new perspectives in post-genomics and systems biology.

Although the paper focuses on the larger amount of cell-wall proteins and proteins involved in pheromone response in haploid cells, the supplementary tables reveal similar biases for many other functional classes, including nucleosomes and cyclin-dependent kinase inhibitors. As many of these proteins are regulated during the cell cycle, I suspected that cell-cycle-regulated proteins might be more abundant in haploid cells relative to diploid cells.

To test this hypothesis, I divided the proteins quantified by the Mann group into two classes: dynamic proteins, which are encoded by genes that are periodically expressed during the cell cycle, and static proteins, which are encoded by genes that are expressed at a constant level (de Lichtenberg et al., 2005). For each class, I plotted the log2-ratios of the protein levels in haploid and diploid cells:

The plot reveals a quite strong shift of dynamic proteins toward higher log-ratios; this difference is highly significant according to the Mann-Whitney U test (P < 10⁻¹²). Proteins encoded by cell-cycle-regulated genes are thus in general more abundant in haploid budding yeast cells than in diploid cells.
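For readers who want to repeat this kind of comparison, here is a minimal pure-Python sketch of a two-sided Mann-Whitney U test using the normal approximation. It is not the code used for the analysis above; the function name and the toy log2-ratios are hypothetical, and in practice a library routine such as `scipy.stats.mannwhitneyu` would be the better choice.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Ties get average ranks; the variance is not tie-corrected, so this is
    only a rough sketch for reasonably large samples with few ties.
    Returns (U statistic for x, two-sided p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p

# Hypothetical log2(haploid/diploid) ratios, not the published data.
dynamic = [0.8, 1.1, 0.5, 1.4, 0.9, 1.2, 0.7, 1.0]
static = [0.0, -0.2, 0.1, 0.3, -0.1, 0.2, -0.3, 0.05]
u, p = mann_whitney_u(dynamic, static)
print(f"U = {u}, two-sided P = {p:.2g}")
```

Because the test works on ranks only, it makes no assumption about the shape of the two log-ratio distributions.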

Full disclosure: I currently collaborate with Matthias Mann and members of his group, and we will soon be colleagues at the Novo Nordisk Foundation Center for Protein Research.


Analysis: Degradation signals correlate with protein half-life

Yesterday I blogged about how the protein half-life data from the O’Shea lab fit well with my earlier analyses of transcriptional regulation during the budding yeast cell cycle and with the just-in-time assembly hypothesis. However, I have now realized that the same data set can be used to test the validity of the sequence-based predictions of protein degradation signals that I relied on for the cell-cycle study.

To this end, I divided the budding yeast proteome into six groups: proteins with a D-box, proteins without a D-box, proteins with a KEN-box, proteins without a KEN-box, proteins with a PEST region, and proteins without a PEST region. For each of these six groups of proteins, I simply plotted the distribution of protein half-lives as a histogram:

The figure shows that for all three degradation signals, proteins with the sequence motif tend to have shorter half-lives than proteins without the motif. These differences are all statistically significant according to the Mann-Whitney U test (D-box, P < 10⁻⁶; KEN-box, P < 0.02; PEST region, P < 10⁻¹⁵). It is noteworthy that the KEN-box motif gives a far weaker correlation with protein half-life than the two other degradation signals, as it was also the only degradation signal that did not correlate with transcriptional cell-cycle regulation in budding yeast (see supplementary information of Jensen et al., 2006).
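Splitting a proteome by the presence of a short sequence motif can be sketched as below. The regular expressions are assumptions: they are commonly quoted minimal consensus patterns for the D-box (RxxL) and the KEN-box, not necessarily the exact definitions used for the analysis, and PEST regions are left out because they are scored rather than pattern-matched. The toy sequences are invented.

```python
import re

# Assumed minimal consensus patterns; the post does not state the exact
# motif definitions, so treat these as illustrative only.
MOTIFS = {
    "D-box": re.compile(r"R..L"),
    "KEN-box": re.compile(r"KEN"),
}

def split_by_motif(sequences, pattern):
    """Return (ids with the motif, ids without it) for a dict of id -> sequence."""
    with_motif = sorted(pid for pid, seq in sequences.items() if pattern.search(seq))
    without = sorted(pid for pid in sequences if pid not in with_motif)
    return with_motif, without

# Hypothetical toy sequences, not real yeast proteins.
toy = {
    "protA": "MSRAALGDKENQ",
    "protB": "MKENTTPV",
    "protC": "MAGGSTPV",
}
groups = {name: split_by_motif(toy, pat) for name, pat in MOTIFS.items()}
for name, (with_motif, without) in groups.items():
    print(name, "with:", with_motif, "without:", without)
```

The half-lives of the two groups for each motif can then be compared with a rank-based test.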

In summary, proteins that contain putative degradation signals have significantly shorter half-lives than proteins that do not contain such signals. The only caveat is that long sequences are more likely to match the sequence motifs, and that O’Shea and colleagues found a negative correlation between sequence length and protein half-life. The correlations described here could thus be a secondary effect; however, it is also possible that the presence of degradation signals in long sequences is the missing explanation for their short half-lives.


Analysis: Cell-cycle-regulated genes encode short-lived proteins

In relation to an entirely different analysis than the one I will describe here, I downloaded the protein half-life data for budding yeast that was published in PNAS by the O’Shea lab about two years ago:

Quantification of protein half-lives in the budding yeast proteome

A complete description of protein metabolism requires knowledge of the rates of protein production and destruction within cells. Using an epitope-tagged strain collection, we measured the half-life of >3,750 proteins in the yeast proteome after inhibition of translation. By integrating our data with previous measurements of protein and mRNA abundance and translation rate, we provide evidence that many proteins partition into one of two regimes for protein metabolism: one optimized for efficient production or a second optimized for regulatory efficiency. Incorporation of protein half-life information into a simple quantitative model for protein production improves our ability to predict steady-state protein abundance values. Analysis of a simple dynamic protein production model reveals a remarkable correlation between transcriptional regulation and protein half-life within some groups of coregulated genes, suggesting that cells coordinate these two processes to achieve uniform effects on protein abundances. Our experimental data and theoretical analysis underscore the importance of an integrative approach to the complex interplay between protein degradation, transcriptional regulation, and other determinants of protein metabolism.

The idea that transcriptional regulation goes hand-in-hand with protein degradation is fully consistent with the just-in-time assembly hypothesis. I thus examined the distributions of protein half-lives for dynamic (i.e. periodically expressed) and static (i.e. not periodically expressed) proteins:

The histogram suggests that dynamic proteins are shifted towards shorter half-lives relative to static proteins. The difference is indeed statistically significant according to the Mann-Whitney U test (P < 10⁻⁴). This result supports the sequence-based observation that dynamic proteins contain more D-box, KEN-box, and PEST degradation signals than static proteins.

I next tested whether the half-life of the dynamic proteins varies during the cell cycle by making a scatter plot of protein half-life as a function of the time of peak expression for the corresponding mRNA:

There appears to be no correlation. Together, these analyses indicate that dynamic proteins have shorter half-lives than static proteins, irrespective of when in the cell cycle they are expressed.


Analysis: A democratic approach to identification of cell-cycle-regulated genes

Over the years, several microarray time-course experiments have been performed to identify the genes that are transcriptionally regulated during the mitotic cell cycle, i.e. the periodically expressed genes. Moreover, bioinformaticians have developed many different computational methods for identifying the periodically expressed genes from microarray time-course data.

Below is a list of the experimental and computational analyses of the budding yeast cell cycle that I am aware of (please notify me if you know of other microarray experiments or computational methods):

  1. Cho et al., Mol. Cell, 1998
  2. Spellman et al., Mol. Biol. Cell, 1998
  3. Zhao et al., Proc. Natl. Acad. Sci. USA, 2001
  4. Langmead et al., Proc. IEEE Comput. Soc. Bioinformatics Conf., 2002
  5. Langmead et al., RECOMB, 2002
  6. Langmead et al., J. Comput. Biol., 2003
  7. de Lichtenberg et al., J. Mol. Biol., 2003
  8. Johansson et al., Bioinformatics, 2003
  9. Wichert et al., Bioinformatics, 2004
  10. Lu et al., Nucleic Acids Res., 2004
  11. Luan and Li, Bioinformatics, 2004
  12. de Lichtenberg et al., Bioinformatics, 2005
  13. de Lichtenberg et al., Yeast, 2005
  14. Willbrand et al., Bioinformatics, 2005
  15. Ahdesmäki et al., BMC Bioinformatics, 2005
  16. Chen, BMC Bioinformatics, 2005
  17. Qiu et al., Conf. Proc. IEEE Eng. Med. Biol. Soc., 2005
  18. Qiu et al., Bioinformatics, 2006
  19. Andersson et al., BMC Bioinformatics, 2006
  20. Gan et al., Int. Conf. Pattern Recog., 2006
  21. Glynn et al., Bioinformatics, 2006
  22. Ahnert et al., Bioinformatics, 2006
  23. Lu et al., Bioinformatics, 2006
  24. Xu et al., LSS Comput. Syst. Bioinformatics Conf., 2006
  25. Pramila et al., Genes Dev., 2006
  26. Liew et al., BMC Bioinformatics, 2007
  27. Lu et al., Genome Biol., 2007
  28. Morton et al., Stat. Appl. Genet. Mol. Biol., 2007
  29. Rowicka et al., Proc. Natl. Acad. Sci. USA, 2007
  30. Gauthier et al., Nucleic Acids Res., 2008
  31. Orlando et al., Nature, 2008

These studies have reported a mixture of ranked and unranked lists of periodically expressed genes. By that I mean that some studies provided a list of genes sorted according to how periodic the expression profiles appear, whereas others simply provide a list of the genes deemed periodic. For the ranked lists, I first checked the publications to see if the authors suggested a cutoff for the number of periodically expressed genes, in which case I followed their recommendations. If the authors suggested multiple lists of varying confidence, I used the highest-confidence list. If no cutoff was proposed, I selected the top-300 genes if the list was based on a single time course and the top-500 genes if the list was based on three or more time courses. It should be noted that both of these cutoffs are on the conservative side since most studies propose 800 or more periodically expressed genes when combining multiple expression time courses.

This meta-analysis resulted in a list of more than 4200 budding yeast genes that are periodically expressed according to at least one of the methods listed above; that is more than two-thirds of all genes encoded by the budding yeast genome!

To investigate further how such a large number of genes can have been proposed to be periodically expressed, I plotted how many of these genes are on how many of the lists of periodically expressed genes:

The histogram reveals that the majority of the over 4200 genes have been proposed by only one or two analyses. It seems reasonable to assume that the genes that have been proposed as periodically expressed by only one or a few methods are less likely to be correct than the ones that many methods agree on. Also, one could expect that taking the consensus of many methods would yield a more reliable answer than using just a single method.

To test these two hypotheses, I compared two different ways of identifying the periodically expressed genes:

  1. Ranking the genes based on a single scoring scheme that combines all the available experimental data (Gauthier et al., Nucleic Acids Res., 2008)
  2. Ranking the genes based on a vote among 30 different methods (not 31; the analysis by Orlando and coworkers was left out of the voting as this dataset is not included in Cyclebase.org)
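The voting scheme, and the histogram of how many lists each gene appears on, can be sketched as follows. This is a minimal illustration with three made-up lists standing in for the 30 published ones; the function name `count_votes` and the alphabetical tie-breaking are my own choices for the example.

```python
from collections import Counter

def count_votes(gene_lists):
    """Count on how many lists each gene appears (one vote per list)."""
    votes = Counter()
    for genes in gene_lists:
        votes.update(set(genes))  # set() guards against duplicates within a list
    return votes

# Hypothetical toy lists standing in for the 30 published gene lists.
lists = [
    ["CLN2", "CLB2", "SIC1"],
    ["CLN2", "CLB2", "ACT1"],
    ["CLN2", "HTA1"],
]
votes = count_votes(lists)

# Rank genes by number of votes (ties broken alphabetically).
ranking = sorted(votes, key=lambda g: (-votes[g], g))

# Histogram: number of lists -> number of genes found on that many lists.
histogram = Counter(votes.values())
print(ranking)
print(sorted(histogram.items()))
```

With many lists and thousands of genes, ties in the vote count are common, so the tie-breaking rule can noticeably affect the ranked list.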

To benchmark the two methods, I compared the ranked lists to a set of target genes for cell-cycle transcription factors identified in genome-wide ChIP-on-chip experiments and plotted the fraction of these that were identified as a function of the number of genes proposed to be periodically expressed:

The plot confirms that genes proposed to be periodically expressed by multiple methods are more likely to be targets of cell-cycle transcription factors, and are hence more likely to truly be subject to transcriptional cell-cycle regulation. However, it also shows that the list obtained by voting among 30 methods is a bit worse than what is obtained by using the single best method.
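A benchmark curve of this kind is straightforward to compute from a ranked gene list and a set of benchmark genes. The sketch below is a minimal illustration with invented gene identifiers; `recovery_curve` is a hypothetical helper, not the code used to produce the plot.

```python
def recovery_curve(ranked_genes, targets):
    """Fraction of target genes recovered among the top-k genes, for k = 1..N."""
    targets = set(targets)
    found = 0
    curve = []
    for gene in ranked_genes:
        if gene in targets:
            found += 1
        curve.append(found / len(targets))
    return curve

# Hypothetical toy rankings and benchmark set.
ranking_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
ranking_b = ["g1", "g4", "g2", "g6", "g3", "g5"]
targets = ["g1", "g2", "g3"]

curve_a = recovery_curve(ranking_a, targets)
curve_b = recovery_curve(ranking_b, targets)
print(curve_a)
print(curve_b)
```

A ranking whose curve lies above another at every cutoff, as `ranking_a` does here, recovers more benchmark genes regardless of how many genes one chooses to call periodic.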

This result may come as a surprise to many since meta-servers that combine multiple prediction methods have in the past proven very successful for many other bioinformatics tasks. I suspect that the approach fails in this case for two reasons: first, many of the analyses included perform considerably worse than the best one, and second, most of the methods make use of only half of the available experimental data. It may thus be possible to obtain better results by selecting only a subset of the methods and rerunning each of them on all the available data. So far, however, dictatorship seems to work better than democracy for identification of periodically expressed genes.


Analysis: Cell-cycle expression of cancer genes

I have long used a data integration approach to obtain a global picture of eukaryotic cell-cycle regulation. The cell cycle is a popular research topic in part because of its importance for cancer research. I thus recently compared microarray expression data on the human cell cycle to genes with mutations that have been causally implicated in various forms of cancer.

From the Cancer Genome Project website, I downloaded a list of 353 human genes that are implicated in cancer. Using the identifier mapping file from STRING, I was able to automatically map 338 of these genes to the set of human genes from Ensembl that I used in earlier cell-cycle studies. 295 of the 338 genes were present on the microarrays used in the cell-cycle expression study by Whitfield et al. (2002). However, only 23 of these are among the 600 periodically expressed genes identified in the reanalysis by Jensen et al. (2006). The many numbers are illustrated in the diagram below:

By random chance, 295*600/12097 ≈ 15 of the 295 genes would be expected to be periodically expressed, and the enrichment is thus only a bit over 1.5-fold. Although this enrichment is statistically significant (P < 3%, Fisher’s exact test), the correlation is clearly not strong enough to allow prediction of novel cancer genes.
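The expected overlap and the enrichment test can be reproduced from the four numbers alone. The sketch below computes the one-sided Fisher's exact test as a hypergeometric upper tail; whether the P-value quoted above was one- or two-sided is not stated, so the one-sided choice is an assumption, and the function name is illustrative.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K are periodic."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total

N = 12097  # background number of genes used in the post
K = 600    # periodically expressed genes
n = 295    # cancer genes present on the microarrays
k = 23     # cancer genes that are periodically expressed

expected = n * K / N
enrichment = k / expected
p_one_sided = hypergeom_upper_tail(k, n, K, N)
print(f"expected = {expected:.1f}, enrichment = {enrichment:.2f}-fold")
print(f"one-sided P = {p_one_sided:.3f}")
```

The exact integer arithmetic via `math.comb` avoids the rounding problems that factorial-based formulas run into for gene counts of this size.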

My next step was to look at the evolutionary conservation of the 23 periodically expressed cancer genes. Only 12 of them belong to an orthologous group; about half of them thus do not appear to have orthologs in budding yeast, fission yeast, or Arabidopsis thaliana. Only three periodically expressed cancer genes have orthologs in all of these organisms. One of these genes is periodically expressed only in human, one in human and fission yeast, and one in all four organisms (a histone subunit).

In summary, it seems that one cannot say much about cancer based on cell-cycle mRNA expression data. This is perhaps not surprising considering that the transcriptional regulation does not seem to vary much between cancer cells and normal cells.
