Archive for the 'Analysis' Category

Analysis: Degradation signals correlate with protein half-life

June 16, 2008

Yesterday I blogged about how the protein half-life data from the O’Shea lab fit well with my earlier analyses of transcriptional regulation during the budding yeast cell cycle and with the just-in-time assembly hypothesis. However, I have now realized that the same data set can also be used to test the validity of the sequence-based predictions of protein degradation signals that I relied on for the cell-cycle study.

To this end, I divided the budding yeast proteome into six groups: proteins with a D-box, proteins without a D-box, proteins with a KEN-box, proteins without a KEN-box, proteins with a PEST region, and proteins without a PEST region. For each of these six groups of proteins, I simply plotted the distribution of protein half-lives as a histogram:

The figure shows that for all three degradation signals, proteins with the sequence motif tend to have shorter half-lives than proteins without the motif. These differences are all statistically significant according to the Mann-Whitney U test (D-box, P < 10^-6; KEN-box, P < 0.02; PEST region, P < 10^-15). It is noteworthy that the KEN-box motif gives a far weaker correlation with protein half-life than the two other degradation signals, as it was also the only degradation signal that did not correlate with transcriptional cell-cycle regulation in budding yeast (see supplementary information of Jensen et al., 2006).
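For readers who want to try this kind of comparison themselves, here is a minimal sketch of a two-sided Mann-Whitney U test using only the Python standard library. It relies on the normal approximation with a tie correction and no continuity correction, so it is only reasonable for fairly large groups; a library routine such as `scipy.stats.mannwhitneyu` would be the natural choice in practice. The half-life values are made up for illustration and are not taken from the O’Shea data set.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, approximate two-sided P value).
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at positions i..j-1 share the mean of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u_x, p_value

# Hypothetical half-lives in minutes (illustration only).
with_motif = [12, 18, 25, 31, 40, 22, 15]
without_motif = [45, 60, 38, 90, 120, 75, 55, 33]
u_stat, p_value = mann_whitney_u(with_motif, without_motif)
print(f"U = {u_stat}, P = {p_value:.4f}")
```

A small U for the motif-containing group means that its half-lives rank below most of those in the other group, which is the direction of the shift described above.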

In summary, proteins that contain putative degradation signals have significantly shorter half-lives than proteins that do not contain such signals. The only caveat is that long sequences are more likely to match the sequence motifs, and that O’Shea and colleagues found a negative correlation between sequence length and protein half-life. The correlations described here could thus be a secondary effect; however, it is also possible that the presence of degradation signals in long sequences is the missing explanation for their short half-lives.

Analysis: Cell-cycle-regulated genes encode short-lived proteins

June 15, 2008

In relation to an entirely different analysis than the one I will describe here, I downloaded the protein half-life data for budding yeast that was published in PNAS by the O’Shea lab about two years ago:

Quantification of protein half-lives in the budding yeast proteome

A complete description of protein metabolism requires knowledge of the rates of protein production and destruction within cells. Using an epitope-tagged strain collection, we measured the half-life of >3,750 proteins in the yeast proteome after inhibition of translation. By integrating our data with previous measurements of protein and mRNA abundance and translation rate, we provide evidence that many proteins partition into one of two regimes for protein metabolism: one optimized for efficient production or a second optimized for regulatory efficiency. Incorporation of protein half-life information into a simple quantitative model for protein production improves our ability to predict steady-state protein abundance values. Analysis of a simple dynamic protein production model reveals a remarkable correlation between transcriptional regulation and protein half-life within some groups of coregulated genes, suggesting that cells coordinate these two processes to achieve uniform effects on protein abundances. Our experimental data and theoretical analysis underscore the importance of an integrative approach to the complex interplay between protein degradation, transcriptional regulation, and other determinants of protein metabolism.

The idea that transcriptional regulation goes hand-in-hand with protein degradation is fully consistent with the just-in-time assembly hypothesis. I thus examined the distributions of protein half-lives for dynamic (i.e. periodically expressed) and static (i.e. not periodically expressed) proteins:

The histogram suggests that dynamic proteins are shifted towards shorter half-lives relative to static proteins. The difference is indeed statistically significant according to the Mann-Whitney U test (P < 10^-4). This result supports the sequence-based observation that dynamic proteins contain more D-box, KEN-box, and PEST degradation signals than static proteins.

I next tested whether the half-life of the dynamic proteins varies during the cell cycle by making a scatter plot of protein half-life as a function of the time of peak expression for the corresponding mRNA:

There appears to be no correlation. Together, these analyses indicate that dynamic proteins have shorter half-lives than static proteins, irrespective of when in the cell cycle they are expressed.

Analysis: A democratic approach to identification of cell-cycle-regulated genes

May 22, 2008

Over the years, several microarray time-course experiments have been performed to identify the genes that are transcriptionally regulated during the mitotic cell cycle, i.e. the periodically expressed genes. Moreover, bioinformaticians have developed many different computational methods for identifying the periodically expressed genes from microarray time-course data.

Below is a list of the experimental and computational analyses of the budding yeast cell cycle that I am aware of (please notify me if you know of other microarray experiments or computational methods):

  1. Cho et al., Mol. Cell, 1998
  2. Spellman et al., Mol. Biol. Cell, 1998
  3. Zhao et al., Proc. Natl. Acad. Sci. USA, 2001
  4. Langmead et al., Proc. IEEE Comput. Soc. Bioinformatics Conf., 2002
  5. Langmead et al., RECOMB, 2002
  6. Langmead et al., J. Comput. Biol., 2003
  7. de Lichtenberg et al., J. Mol. Biol., 2003
  8. Johansson et al., Bioinformatics, 2003
  9. Wichert et al., Bioinformatics, 2004
  10. Lu et al., Nucleic Acids Res., 2004
  11. Luan and Li, Bioinformatics, 2004
  12. de Lichtenberg et al., Bioinformatics, 2005
  13. de Lichtenberg et al., Yeast, 2005
  14. Willbrand et al., Bioinformatics, 2005
  15. Ahdesmäki et al., BMC Bioinformatics, 2005
  16. Chen, BMC Bioinformatics, 2005
  17. Qiu et al., Conf. Proc. IEEE Eng. Med. Biol. Soc., 2005
  18. Qiu et al., Bioinformatics, 2006
  19. Andersson et al., BMC Bioinformatics, 2006
  20. Gan et al., Int. Conf. Pattern Recog., 2006
  21. Glynn et al., Bioinformatics, 2006
  22. Ahnert et al., Bioinformatics, 2006
  23. Lu et al., Bioinformatics, 2006
  24. Xu et al., LSS Comput. Syst. Bioinformatics Conf., 2006
  25. Pramila et al., Genes Dev., 2006
  26. Liew et al., BMC Bioinformatics, 2007
  27. Lu et al., Genome Biol., 2007
  28. Morton et al., Stat. Appl. Genet. Mol. Biol., 2007
  29. Rowicka et al., Proc. Natl. Acad. Sci. USA, 2007
  30. Gauthier et al., Nucleic Acids Res., 2008
  31. Orlando et al., Nature, 2008

These studies have reported a mixture of ranked and unranked lists of periodically expressed genes. By that I mean that some studies provided a list of genes sorted according to how periodic the expression profiles appear, whereas others simply provide a list of the genes deemed periodic. For the ranked lists, I first checked the publications to see if the authors suggested a cutoff for the number of periodically expressed genes, in which case I followed their recommendations. If the authors suggested multiple lists of varying confidence, I used the highest-confidence list. If no cutoff was proposed, I selected the top-300 genes if the list was based on a single time course and the top-500 genes if the list was based on three or more time courses. It should be noted that both of these cutoffs are on the conservative side since most studies propose 800 or more periodically expressed genes when combining multiple expression time courses.
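The cutoff rules above can be written down as a small function. This is only a sketch of the rules as stated in the text; the function and parameter names are my own. Note that the text does not say what to do with a ranked list based on exactly two time courses, so the sketch refuses to guess in that case.

```python
def select_periodic_genes(ranked_genes, n_time_courses, author_cutoff=None):
    """Truncate a ranked gene list according to the rules described above.

    ranked_genes   -- genes sorted from most to least periodic
    n_time_courses -- number of time courses the ranking is based on
    author_cutoff  -- cutoff recommended by the authors, if any
    """
    if author_cutoff is not None:
        cutoff = author_cutoff  # follow the authors' recommendation
    elif n_time_courses == 1:
        cutoff = 300            # single time course
    elif n_time_courses >= 3:
        cutoff = 500            # three or more time courses
    else:
        # Not covered by the rules in the text; left unspecified here.
        raise ValueError("no cutoff rule for this number of time courses")
    return ranked_genes[:cutoff]

# Hypothetical ranked list of 1000 placeholder gene identifiers.
ranked = [f"gene{i:04d}" for i in range(1000)]
print(len(select_periodic_genes(ranked, 1)))                     # single time course
print(len(select_periodic_genes(ranked, 3, author_cutoff=800)))  # author cutoff wins
```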

This meta-analysis resulted in a list of more than 4200 budding yeast genes that are periodically expressed according to at least one of the methods listed above; that is more than two-thirds of all genes encoded by the budding yeast genome!

To investigate further how such a large number of genes can have been proposed to be periodically expressed, I plotted how many of these genes are on how many of the lists of periodically expressed genes:

The histogram reveals that the majority of the over 4200 genes have been proposed by only one or two analyses. It seems reasonable to assume that the genes that have been proposed as periodically expressed by only one or a few methods are less likely to be correct than the ones that many methods agree on. Also, one could expect that taking the consensus of many methods would yield a more reliable answer than using just a single method.
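Counting how many lists each gene appears on, and turning those counts into a ranking by votes, takes only a few lines. The sketch below uses placeholder gene names and three toy lists rather than the real lists; ties in the vote count are broken alphabetically, which is an arbitrary choice of mine.

```python
from collections import Counter

def count_votes(gene_lists):
    """Number of lists on which each gene appears (one vote per list)."""
    votes = Counter()
    for genes in gene_lists:
        votes.update(set(genes))  # set() guards against duplicates in a list
    return votes

def support_histogram(votes):
    """Map 'number of lists' -> 'number of genes with that many votes'."""
    return Counter(votes.values())

def rank_by_votes(votes):
    """Genes sorted by decreasing vote count, ties broken alphabetically."""
    return sorted(votes, key=lambda gene: (-votes[gene], gene))

# Toy lists with placeholder gene names (illustration only).
toy_lists = [["geneA", "geneB", "geneC"], ["geneA", "geneB"], ["geneA", "geneD"]]
votes = count_votes(toy_lists)
print(dict(support_histogram(votes)))
print(rank_by_votes(votes))
```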

To test these two hypotheses, I compared two different ways of identifying the periodically expressed genes:

  1. Ranking the genes based on a single scoring scheme that combines all the available experimental data (Gauthier et al., Nucleic Acids Res., 2008)
  2. Ranking the genes based on a vote among 30 different methods (not 31; the analysis by Orlando and coworkers was left out of the voting as this dataset is not included in Cyclebase.org)

To benchmark the two methods, I compared the ranked lists to a set of target genes for cell-cycle transcription factors identified in genome-wide ChIP-on-chip experiments and plotted the fraction of these that were identified as a function of the number of genes proposed to be periodically expressed:

The plot confirms that genes proposed to be periodically expressed by multiple methods are more likely to be targets of cell-cycle transcription factors, and are hence more likely to truly be subject to transcriptional cell-cycle regulation. However, it also shows that the list obtained by voting among 30 methods is a bit worse than what is obtained by using the single best method.
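The benchmark curve itself is simple to compute: walk down a ranked list and record, after each gene, the fraction of the benchmark set recovered so far. Below is a minimal sketch with placeholder gene names; the function name is my own.

```python
def recovery_curve(ranked_genes, benchmark):
    """Fraction of the benchmark set found among the top-k genes, for k = 1..n."""
    benchmark = set(benchmark)
    if not benchmark:
        raise ValueError("benchmark set must not be empty")
    found = 0
    curve = []
    for gene in ranked_genes:
        if gene in benchmark:
            found += 1
        curve.append(found / len(benchmark))
    return curve

# Toy ranking and benchmark set (illustration only).
ranking = ["geneA", "geneX", "geneB", "geneY"]
targets = {"geneA", "geneB", "geneC"}
print(recovery_curve(ranking, targets))
```

Plotting two such curves against the list length gives the kind of comparison described above: the better method is the one whose curve lies higher for the same number of proposed genes.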

This result may come as a surprise to many since meta-servers that combine multiple prediction methods have in the past proven very successful for many other bioinformatics tasks. I suspect that the approach fails in this case for two reasons: first, many of the analyses included perform considerably worse than the best one, and second, most of the methods make use of only half of the available experimental data. It may thus be possible to obtain better results by selecting only a subset of the methods and rerunning each of them on all the available data. So far, however, dictatorship seems to work better than democracy for identification of periodically expressed genes.

Analysis: Cell-cycle expression of cancer genes

April 15, 2008

I have long used a data integration approach to obtain a global picture of eukaryotic cell-cycle regulation. The cell cycle is a popular research topic in part because of its importance for cancer research. I thus recently compared microarray expression data on the human cell cycle to genes with mutations that have been causally implicated in various forms of cancer.

From the Cancer Genome Project website, I downloaded a list of 353 human genes that are implicated in cancer. Using the identifier mapping file from STRING, I was able to automatically map 338 of these genes to the set of human genes from Ensembl that I used in earlier cell-cycle studies. 295 of the 338 genes were present on the microarrays used in the cell-cycle expression study by Whitfield et al. (2002). However, only 23 of these are among the 600 periodically expressed genes identified in the reanalysis by Jensen et al. (2006). The many numbers are illustrated in the diagram below:

By random chance, 295*600/12097 = 15 of the 295 genes would be expected to be periodically expressed, and the enrichment is thus only a bit over 1.5-fold. Although this enrichment is statistically significant (P < 0.03, Fisher’s exact test), the correlation is clearly not strong enough to allow prediction of novel cancer genes.
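The arithmetic above can be reproduced with a few lines of Python. The sketch also computes the upper tail of the hypergeometric distribution, which corresponds to a one-sided Fisher's exact test on the 2x2 table; whether the P value quoted above was one- or two-sided is not stated, so the number printed here need not match it exactly.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` genes without replacement from a
    population of which `successes` are periodically expressed."""
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total

n_genes_on_array = 12097   # genes on the microarrays
n_periodic = 600           # periodically expressed genes
n_cancer = 295             # cancer genes present on the microarrays
n_overlap = 23             # periodically expressed cancer genes

expected = n_cancer * n_periodic / n_genes_on_array
fold = n_overlap / expected
p_value = hypergeom_upper_tail(n_overlap, n_genes_on_array, n_periodic, n_cancer)
print(f"expected = {expected:.1f}, fold enrichment = {fold:.2f}, P = {p_value:.3f}")
```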

My next step was to look at the evolutionary conservation of the 23 periodically expressed cancer genes. Only 12 of them belong to an orthologous group. About half of them thus do not appear to have orthologs in budding yeast, fission yeast, or Arabidopsis thaliana. Only three periodically expressed cancer genes have orthologs in all of these organisms. One of these genes is periodically expressed only in human, one in human and fission yeast, and one in all four organisms (a histone subunit).

In summary, it seems that one cannot say much about cancer based on cell-cycle mRNA expression data. This is perhaps not surprising considering that the transcriptional regulation does not seem to vary much between cancer cells and normal cells.

Analysis: Cancer or not, cell-cycle expression stays the same

April 13, 2008

The groups of Ziv Bar-Joseph and Itamar Simon recently published a paper in PNAS on a new microarray study of the cell cycle of primary human fibroblasts:

Genome-wide transcriptional analysis of the human cell cycle identifies genes differentially regulated in normal and cancer cells

Characterization of the transcriptional regulatory network of the normal cell cycle is essential for understanding the perturbations that lead to cancer. However, the complete set of cycling genes in primary cells has not yet been identified. Here, we report the results of genome-wide expression profiling experiments on synchronized primary human foreskin fibroblasts across the cell cycle. Using a combined experimental and computational approach to deconvolve measured expression values into ‘‘single-cell’’ expression profiles, we were able to overcome the limitations inherent in synchronizing nontransformed mammalian cells. This allowed us to identify 480 periodically expressed genes in primary human foreskin fibroblasts. Analysis of the reconstructed primary cell profiles and comparison with published expression datasets from synchronized transformed cells reveals a large number of genes that cycle exclusively in primary cells. This conclusion was supported by both bioinformatic analysis and experiments performed on other cell types. We suggest that this approach will help pinpoint genetic elements contributing to normal cell growth and cellular transformation.

In contrast to the earlier study by Whitfield et al. (2002), which was performed on HeLa cells, Bar-Joseph et al. worked on non-transformed fibroblasts. The dataset thus offers a first global view of the differences between the cell cycle of normal human cells and that of cancer cells.

To compare their list of cell-cycle-regulated human genes to the one that I have used so far, I mapped their 480 genes to Ensembl using the mapping file from the STRING database. This resulted in a list of 410 genes; that is, 70 genes could not be mapped by the automatic procedure. Whereas this is far from a perfect mapping, it is sufficient to judge the quality of the list.

The plots below show the fraction of a benchmark set that is identified as a function of the number of genes that are proposed to be periodically expressed during the cell cycle. In each plot, I compare the results for the list of 410 genes obtained from the new study by Bar-Joseph et al., the analysis by Whitfield et al., and the reanalysis of the latter dataset by Jensen et al. (2006) (available from Cyclebase.org). To make the comparison as fair as possible, I only considered the subset of genes that were present in both microarray designs. The first plot uses as benchmark a set of 63 genes that have been identified as periodically expressed in targeted small-scale studies:

Three sets of cell-cycle-regulated human genes compared to benchmark set B1

I also benchmarked the three gene lists against a second benchmark set, which consists of predicted target genes of E2F cell-cycle transcription factors:

Three sets of cell-cycle-regulated human genes compared to benchmark set B2

Both benchmarks suggest that the three lists are of very comparable quality, but that the list by Whitfield and coworkers is much more inclusive than the one from Bar-Joseph and coworkers. In other words, the former list has better sensitivity whereas the latter has better specificity. This is consistent with the results presented by Bar-Joseph et al., who conclude that their list is more reliable than the previously published list. However, this is probably not due to better quality of the raw expression data, since reanalysis of the data by Whitfield et al. yielded a list with almost identical sensitivity and specificity (that is, the red curve is very close to the blue cross in both plots).

Although the two lists of periodically expressed genes are of comparable quality, they may still contain very different sets of genes. I therefore decided to compare the lists of genes that are periodically expressed in the time course on primary fibroblasts and in each of the four time courses on HeLa cells. To make this comparison as easy as possible, I selected the top-364 cycling genes from each of the four HeLa time courses based on the reanalysis by Jensen et al. (2006). The ten Venn diagrams below show all pairwise comparisons of the five lists of 364 genes each:

The average overlap between the list by Bar-Joseph et al. and an experiment from Whitfield et al. is 114 genes. By comparison, the average overlap between the top-364 lists from two individual experiments from Whitfield et al. is 123 genes. Although the overlap may seem low, I thus believe that it is due to the poor reproducibility between microarray time courses rather than due to genuine differences between primary fibroblasts and HeLa cells as suggested by Bar-Joseph and colleagues.
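The comparison behind these numbers boils down to computing all pairwise overlaps (five lists give ten pairs) and then averaging separately over the pairs that cross studies and the pairs within the HeLa study. The sketch below shows the bookkeeping with tiny toy lists; the list names and gene names are placeholders of mine.

```python
from itertools import combinations

def pairwise_overlaps(gene_lists):
    """Map each unordered pair of list names to the size of their overlap."""
    return {
        (a, b): len(set(gene_lists[a]) & set(gene_lists[b]))
        for a, b in combinations(sorted(gene_lists), 2)
    }

def mean_overlap(overlaps, keep):
    """Average overlap over the pairs for which keep(a, b) is true."""
    values = [n for (a, b), n in overlaps.items() if keep(a, b)]
    return sum(values) / len(values)

# Toy lists (illustration only); the real lists hold 364 genes each.
toy = {
    "fibroblast": ["a", "b", "c"],
    "hela_1": ["a", "b", "d"],
    "hela_2": ["a", "c", "d"],
    "hela_3": ["b", "c", "d"],
    "hela_4": ["a", "b", "c"],
}
overlaps = pairwise_overlaps(toy)
between = mean_overlap(overlaps, lambda a, b: "fibroblast" in (a, b))
within = mean_overlap(overlaps, lambda a, b: "fibroblast" not in (a, b))
print(len(overlaps), between, within)
```

If the between-study average is about as large as the within-study average, as with the 114 versus 123 genes reported above, the differences between lists are more plausibly explained by experimental noise than by biology.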

Although cancer cells have to circumvent the regulatory mechanisms that would normally prevent cell proliferation, the cell cycle itself appears to function the same way as in normal cells. In other words, the difference does not lie in the “engine” but in the “brakes”, which have been sabotaged in cancer cells.

Analysis: The budding yeast phosphoproteome

March 23, 2008

The group of Donald F. Hunt at University of Virginia has recently published a paper in PNAS that describes a new phosphoproteomics study of budding yeast:

Analysis of phosphorylation sites on proteins from Saccharomyces cerevisiae by electron transfer dissociation (ETD) mass spectrometry

We present a strategy for the analysis of the yeast phosphoproteome that uses endo-Lys C as the proteolytic enzyme, immobilized metal affinity chromatography for phosphopeptide enrichment, a 90-min nanoflow-HPLC/electrospray-ionization MS/MS experiment for phosphopeptide fractionation and detection, gas phase ion/ion chemistry, electron transfer dissociation for peptide fragmentation, and the Open Mass Spectrometry Search Algorithm for phosphoprotein identification and assignment of phosphorylation sites. From a 30-microg (approximately 600 pmol) sample of total yeast protein, we identify 1,252 phosphorylation sites on 629 proteins. Identified phosphoproteins have expression levels that range from <50 to 1,200,000 copies per cell and are encoded by genes involved in a wide variety of cellular processes. We identify a consensus site that likely represents a motif for one or more uncharacterized kinases and show that yeast kinases, themselves, contain a disproportionately large number of phosphorylation sites. Detection of a pHis containing peptide from the yeast protein, Cdc10, suggests an unexpected role for histidine phosphorylation in septin biology. From diverse functional genomics data, we show that phosphoproteins have a higher number of interactions than an average protein and interact with each other more than with a random protein. They are also likely to be conserved across large evolutionary distances.

As is so often the case with experimental papers, no comparison is provided to earlier studies. I thus decided to compare the set of phosphoproteins identified by Hunt and coworkers to the set of Cdc28p substrates identified in two studies by the Morgan lab as well as to the proteome-wide, sequence-based predictions made by NetPhosK:

Venn diagram comparing three sets of phosphoproteins from budding yeast

The Venn diagram obviously shows that each of the three sets contains a considerable number of phosphoproteins that are not present in any of the other sets. This was to be expected since the three methods are fundamentally very different. The dataset from the Hunt lab includes proteins that are phosphorylated by other kinases than Cdc28p; however, it is limited in the sense that low-abundance phosphopeptides are typically missed by MS studies. Conversely, the set from the Morgan lab consists only of Cdc28p substrates, but is likely to have much better coverage of low-abundance phosphoproteins. Finally, the set of Cdc28 substrates from NetPhosK is likely to contain a considerable number of false positives as they are predicted from the protein sequence alone.

As a matter of fact, I find the overlap between the three sets to be surprisingly good. Even if we assume that the dataset from the Morgan lab contains no false positives, the overlap suggests that the new dataset from Hunt and coworkers captures one third of all phosphoproteins in budding yeast; assuming errors in both datasets increases this estimate. It is also noteworthy that NetPhosK misses only 22% of the Cdc28p substrates that were identified by the Morgan lab and supported by the new data from the Hunt lab, although this high coverage is probably obtained at the price of many false positive predictions.

Analysis: The transcriptional response to growth rate is unrelated to cell-cycle regulation

March 10, 2008

David Botstein’s group at Princeton recently published a paper in Molecular Biology of the Cell with the title “Coordination of Growth Rate, Cell Cycle, Stress Response, and Metabolic Activity in Yeast”. As described in their abstract, they found several interesting correlations between the transcriptional responses to changes in growth rate and the regulation in response to stress and during the metabolic cycle:

We studied the relationship between growth rate and genome-wide gene expression, cell cycle progression, and glucose metabolism in 36 steady-state continuous cultures limited by one of six different nutrients (glucose, ammonium, sulfate, phosphate, uracil, or leucine). The expression of more than one quarter of all yeast genes is linearly correlated with growth rate, independent of the limiting nutrient. The subset of negatively growth-correlated genes is most enriched for peroxisomal functions, whereas positively correlated genes mainly encode ribosomal functions. Many (not all) genes associated with stress response are strongly correlated with growth rate, as are genes that are periodically expressed under conditions of metabolic cycling. We confirmed a linear relationship between growth rate and the fraction of the cell population in the G0/G1 cell cycle phase, independent of limiting nutrient. Cultures limited by auxotrophic requirements wasted excess glucose, whereas those limited on phosphate, sulfate, or ammonia did not; this phenomenon (reminiscent of the “Warburg effect” in cancer cells) was confirmed in batch cultures. Using an aggregate of gene expression values, we predict (in both continuous and batch cultures) an “instantaneous growth rate”. This concept is useful in interpreting the system-level connections among growth rate, metabolism, stress, and the cell cycle.

Because of my interest in the cell cycle, their results regarding growth rate and cell-cycle regulation caught my attention. In Figure 6 of their paper, Brauer et al. show the slope distribution for the genes belonging to each of the phase-specific clusters defined by Spellman et al. (1998). The only trend they observe is that genes expressed at the M/G1 transition tend to be down-regulated at high growth rates.

I decided to redo the cell-cycle part of their analysis in a slightly different manner, hoping that I would be able to get a stronger signal than they did. Rather than using the 800 periodically expressed genes proposed by Spellman et al. (1998), I thus made use of the list of 600 periodically expressed genes from de Lichtenberg et al. (2005). Like Brauer et al., I found no difference in growth-rate response between cell-cycle-regulated genes and other genes. To analyze the phase-specific expression, I chose to plot the peak time distributions for genes that are up- and down-regulated in response to increasing growth rate:

Peak-time distribution for genes that are up- or down-regulated in response to increasing growth rate

In agreement with Brauer et al., genes that are down-regulated at high growth rates appear to have a striking preference for being expressed at the M/G1 transition. However, manual inspection of these genes revealed that more than half of them belong to the Y’ family of DNA helicases, which are encoded by the sub-telomeric regions (striped blue bars). The trend observed by Brauer et al. is thus presumably not due to slower growing cells spending more time in M/G1 phase as suggested by the authors. Instead, it is likely an artifact of the many Y’ helicase genes found in the sub-telomeric regions of budding yeast, which are so highly homologous that they can cross-hybridize on microarrays and hence all appear to be periodically expressed with identical peak times.

After correcting for this, the down-regulated genes show a weak preference for being expressed during M phase whereas the up-regulated genes tend to be expressed in late G1 and S phase. However, the peak-time distributions of up- and down-regulated genes do not differ significantly from that of all cell-cycle-regulated genes (Kolmogorov-Smirnov test).

In summary, my reanalysis suggests that there is no correlation between the transcriptional response to changes in growth rate and transcriptional cell-cycle regulation. It also reiterates the importance of manually inspecting the results from statistical analyses - they may be highly significant for all the wrong reasons.

Analysis: Cell-cycle phenotypes and regulation, part 2

February 28, 2008

I have previously blogged about the relationship between cell-cycle phenotypes and regulation in human as well as budding yeast. I was thus excited to see the new RNAi study on cell-cycle phenotypes by Rines and coworkers that was published in Genome Biology two days ago. The title of their paper is “Whole genome functional analysis identifies novel components required for mitotic spindle integrity in human cells”, and the abstract reads as follows:

Background

The mitotic spindle is a complex mechanical apparatus required for accurate segregation of sister chromosomes during mitosis. We designed a genetic screen using automated microscopy to discover factors essential for mitotic progression. Using a RNAi library of 49,164 double-stranded (ds)RNAs targeting 23,835 human genes, we performed a loss-of-function screen looking for siRNAs that arrest cells in metaphase.

Results

Here we report the identification of genes that when suppressed result in structural defects in the mitotic spindle leading to bent, twisted, monopolar or multipolar spindles and cause a cell cycle arrest. We further described a novel analysis methodology for large-scale RNAi datasets which relies upon supervised clustering of these genes based on gene ontology (GO), protein families, tissue expression and protein-protein interactions.

Conclusions

This approach was utilized to functionally classify the identified genes in discrete mitotic processes. We confirmed the identity for a subset of these genes and examined more closely their mechanical role in spindle architecture.

The screen identified a set of 226 genes that when suppressed lead to spindle-related cell-cycle phenotypes. Using the name-mapping files from STRING, I was able to map 175 of them to the set of genes used in my other cell-cycle analyses. The results presented below are all based on this set of 175 genes.

To my surprise, Rines and coworkers did not compare their results to the earlier phenotypic screen published by Mukherji et al. in PNAS. Since I had already mapped this dataset onto the same gene set, it was easy to make a comparison of the new phenotype data from Rines et al. and the eight phenotypic categories defined by Mukherji et al.:

Category  Description              Overlap  Significance
1         G1 small nuclear area    2/116    n.s.
2         G1                       2/117    n.s.
3         S                        1/61     n.s.
4         S + G2/M                 4/59     P < 0.002; FDR < 1%
5         G2/M large nucleus       5/200    P < 0.019; FDR < 5%
6         G2/M                     4/259    n.s.
7         G2/M + endoduplication   1/52     n.s.
8         Cytokinesis              3/36     P < 0.003; FDR < 1%

The statistical significance of the overlap was assessed using Fisher’s exact test and the false discovery rate (FDR) was calculated using the Benjamini-Hochberg method. As can be seen, the agreement between the two studies is very poor. Nonetheless, it is reassuring that the largest overlap (>8%) is observed for category 8, since spindle defects should be expected to result in problems during cytokinesis.
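For reference, the Benjamini-Hochberg adjustment is short enough to write out by hand. The sketch below is a plain implementation of the standard step-up procedure; the input P values are hypothetical, since only upper bounds are quoted in the table above.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P values for four tests (illustration only).
example = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(example))
```

A test is then called significant at a given FDR if its adjusted P value falls below that threshold.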

I also looked into the transcriptional and post-translational regulation of the 175 genes. The cell-cycle microarray study by Whitfield and coworkers covered 124 of the genes, 15 of which are periodically expressed (P < 0.002; Fisher’s exact test). Plotting the distribution of peak times for these genes confirms the observation by Rines et al. that the genes tend to be expressed around the G2/M transition and during M phase:

Peak time distributions for human genes identified by Rines et al. and Mukherji et al.

As should be expected, the peak-time distribution for the genes identified by Rines et al. is in agreement with the corresponding distributions for categories 4, 5, and 8 from Mukherji and coworkers.

Comparison with a set of 985 phosphoproteins identified in low-throughput studies (obtained from Phospho.ELM) shows that the protein products encoded by the 175 genes are preferentially phosphorylated (P < 0.001; Fisher’s exact test). This result is confirmed by comparisons with large mass-spectrometry studies (P < 0.03; Fisher’s exact test) and CDK substrates predicted by NetPhosK (P < 0.05; Fisher’s exact test).

Finally, I analyzed the protein products encoded by the 175 genes for degradation signals. 22 of them contain a strong D-box motif (P < 0.03; Fisher’s exact test) and 28 contain a KEN-box motif (P < 0.002; Fisher’s exact test). By contrast, the gene products identified by Rines et al. display no overrepresentation of PEST degradation signals. This makes sense since proteins with D-box and/or KEN-box motifs are polyubiquitinated by the anaphase-promoting complex (APC) during late M phase, which targets them for degradation by the proteasome.

In summary, Rines and coworkers have identified a set of genes that show weak but significant overlap with some of the phenotypic categories defined by Mukherji et al., with periodically expressed genes identified based on microarray data from Whitfield et al., with known and predicted phosphoproteins, and with predicted degradation signals. All of the results are consistent with the majority of the 175 genes functioning during G2/M and early M phase.

Analysis: Evolution of transcription-factor binding and cell-cycle-regulated transcription

February 22, 2008

Together with collaborators in Søren Brunak’s group, I have earlier published a comparative study on eukaryotic cell-cycle regulation. In the supplement and earlier papers, we presented benchmarks that documented the sensitivity with which periodically expressed genes can be identified based on microarray expression data. We thereby showed that the poor evolutionary conservation of transcriptional cell-cycle regulation is not an artifact of individual gene lists being unreliable.

However, there is a more direct test that we did not think of at the time, namely to check if the changes in periodic transcription agree with the binding of cell-cycle transcription factors in each organism. The first step is to select two organisms (organism 1 and organism 2) and extract two sets of genes: 1) cycling genes from organism 1 with non-cycling orthologs in organism 2 and 2) non-cycling genes from organism 1 with cycling orthologs in organism 2. Next, Fisher’s exact test is used to determine if targets of cell-cycle transcription factors are overrepresented in the first set relative to the second. This procedure is equivalent to the test for coevolution between transcriptional and post-translational regulation (see Jensen et al. (2006) for details).
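The procedure can be sketched in code as follows. This is an illustration of the two steps described above, not the original implementation: it assumes one-to-one ortholog pairs, uses a one-sided version of Fisher's exact test, and all gene names and data structures are placeholders of mine.

```python
from math import comb

def coevolution_table(ortholog_pairs, cycling_1, cycling_2, targets_1):
    """Build the 2x2 table (a, b, c, d) for organism 1 versus organism 2.

    a, b -- targets / non-targets among genes cycling only in organism 1
    c, d -- targets / non-targets among genes cycling only in organism 2
    """
    a = b = c = d = 0
    for gene_1, gene_2 in ortholog_pairs:
        cycles_1 = gene_1 in cycling_1
        cycles_2 = gene_2 in cycling_2
        if cycles_1 and not cycles_2:
            if gene_1 in targets_1:
                a += 1
            else:
                b += 1
        elif cycles_2 and not cycles_1:
            if gene_1 in targets_1:
                c += 1
            else:
                d += 1
    return a, b, c, d

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test: P(X >= a) for the table [[a, b], [c, d]]."""
    row_1, col_1, total = a + b, a + c, a + b + c + d
    denominator = comb(total, col_1)
    upper = min(row_1, col_1)
    tail = sum(
        comb(row_1, x) * comb(total - row_1, col_1 - x)
        for x in range(a, upper + 1)
    )
    return tail / denominator

# Toy data (illustration only).
pairs = [("a1", "a2"), ("b1", "b2"), ("c1", "c2"), ("d1", "d2"), ("e1", "e2")]
table = coevolution_table(pairs, {"a1", "b1", "e1"}, {"c2", "d2", "e2"}, {"a1", "c1"})
print(table, fisher_exact_greater(*table))
```

Genes that cycle in both organisms or in neither (such as the pair `e1`/`e2` in the toy data) carry no information about changes in regulation and are therefore left out of the table.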

I used the procedure to perform all pairwise tests for Homo sapiens, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Arabidopsis thaliana. For each choice of organism 1, I used the same set of cell-cycle transcription-factor targets that was used for the original benchmarks. The table below summarizes the results of the statistical tests; the rows specify organism 1 and the columns specify organism 2:

                 H. sapiens     S. cerevisiae  S. pombe       A. thaliana
H. sapiens       -              P < 10^-5      P < 10^-9      P < 10^-6
S. cerevisiae    P < 10^-8      -              P < 10^-7      P < 0.01
S. pombe         P < 10^-4      n.s.           -              P < 0.01
A. thaliana      P < 0.09       n.s.           P < 10^-4      -

For most of the pairwise organism comparisons, the expected coevolution of transcription-factor binding and cell-cycle-regulated transcription is supported by the statistical test. Benjamini-Hochberg correction for multiple testing was thus not performed, as it would change the P values only marginally (by a factor of 4/3 to be exact). Apart from the S. pombe vs. S. cerevisiae comparison, the weak correlations all involve A. thaliana, for which only very limited microarray expression data are available.

This analysis shows that the differences in cell-cycle-regulated transcription (as measured by microarrays) are consistent with the available data on transcription-factor binding. This provides direct evidence that the poor conservation of cell-cycle regulation observed between eukaryotes is due to genuine, biological differences.

Analysis: Cell-cycle phenotypes and regulation

February 14, 2008

In 2006 the Schultz lab at the Scripps Research Institute published a paper in PNAS called “Genome-wide functional analysis of human cell-cycle regulators”. The abstract reads:

Human cells have evolved complex signaling networks to coordinate the cell cycle. A detailed understanding of the global regulation of this fundamental process requires comprehensive identification of the genes and pathways involved in the various stages of cell-cycle progression. To this end, we report a genome-wide analysis of the human cell cycle, cell size, and proliferation by targeting >95% of the protein-coding genes in the human genome using small interfering RNAs (siRNAs). Analysis of >2 million images, acquired by quantitative fluorescence microscopy, showed that depletion of 1,152 genes strongly affected cell-cycle progression. These genes clustered into eight distinct phenotypic categories based on phase of arrest, nuclear area, and nuclear morphology. Phase-specific networks were built by interrogating knowledge-based and physical interaction databases with identified genes. Genome-wide analysis of cell-cycle regulators revealed a number of kinase, phosphatase, and proteolytic proteins and also suggests that processes thought to regulate G1-S phase progression like receptor-mediated signaling, nutrient status, and translation also play important roles in the regulation of G2/M phase transition. Moreover, 15 genes that are integral to TNF/NF-κB signaling were found to regulate G2/M, a previously unanticipated role for this pathway. These analyses provide systems-level insight into both known and novel genes as well as pathways that regulate cell-cycle progression, a number of which may provide new therapeutic approaches for the treatment of cancer.

I recently wrote a commentary about how phenotypes in yeast agree remarkably well with the just-in-time assembly hypothesis for cell-cycle regulation of protein complexes. I thus decided to also compare the dataset on cell-cycle phenotypes for human genes with the cell-cycle microarray expression data published in 2002 by Whitfield and coworkers.

Using the mapping files from the STRING database, I was able to automatically map 741 of the 1,152 genes with cell-cycle phenotypes to the set of 12,097 genes for which we have cell-cycle microarray expression data. Of the 741 genes, 55 are among the set of 600 periodically expressed genes identified in a reanalysis of the data from Whitfield and coworkers. This is just shy of 50% more than what would be expected by random chance (P < 0.001; Fisher’s exact test).
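The arithmetic behind the "just shy of 50%" statement can be checked directly from the four counts given above. The sketch below computes the expected overlap and the fold enrichment, and then an upper-tail hypergeometric probability as one common one-sided form of Fisher’s exact test. That choice of test variant is an assumption on my part, so the resulting P value need not match the one quoted above exactly; the function and variable names are likewise only illustrative.

```python
from math import comb

def hypergeom_upper_tail(overlap, sample, successes, population):
    """P(X >= overlap) when drawing `sample` genes without replacement
    from `population` genes, of which `successes` are periodic."""
    upper = min(sample, successes)
    tail = sum(comb(successes, k) * comb(population - successes, sample - k)
               for k in range(overlap, upper + 1))
    return tail / comb(population, sample)

mapped = 741         # phenotype genes mapped to the expression data
periodic = 600       # periodically expressed genes
background = 12097   # genes with cell-cycle expression data
observed = 55        # phenotype genes that are periodically expressed

expected = mapped * periodic / background
fold = observed / expected
p_value = hypergeom_upper_tail(observed, mapped, periodic, background)

print(f"expected overlap: {expected:.2f}")
print(f"fold enrichment:  {fold:.2f}")
print(f"one-sided P:      {p_value:.2g}")
```

The expected overlap comes out at roughly 36.75 genes, so the 55 observed genes correspond to a fold enrichment of about 1.5.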

The authors divided the cell-cycle mutants into eight classes. Repeating the above analysis for each of these categories separately revealed that genes with phenotypes related to S-phase and cytokinesis were significantly overrepresented among the 600 periodically expressed genes (FDR < 0.05; Fisher’s exact test and Benjamini-Hochberg correction for multiple testing). The other categories did not yield statistically significant results.
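For readers unfamiliar with the Benjamini-Hochberg correction, the following is a minimal sketch of how raw P values from several categories are converted to FDR-adjusted values. The eight P values in the example are invented purely for illustration and are not the values obtained in this analysis; the function name is also just a placeholder.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values for eight phenotypic categories
raw = [0.001, 0.004, 0.03, 0.2, 0.5, 0.6, 0.7, 0.9]
fdr = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr)
print(significant)
```

Each P value is multiplied by the number of tests divided by its rank, so the smallest P values are penalized the most and the largest one is left unchanged.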

To look at the temporal regulation of transcription in more detail, I plotted the distribution of peak times (the point in the cell cycle when a gene is maximally expressed) for the periodically expressed genes from each of the eight phenotypic categories:

Peak time distributions for human genes with cell-cycle-related phenotypes

For the periodically expressed genes that display a cell-cycle phenotype in the screen by Schultz and coworkers, the observed phenotypes agree with the time of peak expression. In particular, the genes with cytokinesis-related phenotypes are all expressed shortly before the time of cell division (cytokinesis). Most of the periodically expressed genes with phenotypes related to S phase are similarly expressed during S phase (roughly 50-70% into the cell cycle), and genes with phenotypes related to the G2/M transition also tend to be expressed during the appropriate phase of the cell cycle.

In summary, these results support the view that cell-cycle-regulated genes are expressed shortly before their time of action, despite the fact that regulation also takes place at the protein level. They also confirm that many genes with cell-cycle function are not subject to transcriptional cell-cycle regulation.