Over the years, several microarray time-course experiments have been performed to identify the genes that are transcriptionally regulated during the mitotic cell cycle, i.e. the periodically expressed genes. Moreover, bioinformaticians have developed many different computational methods for identifying the periodically expressed genes from microarray time-course data.
Below is a list of the experimental and computational analyses of the budding yeast cell cycle that I am aware of (please notify me if you know of other microarray experiments or computational methods):
- Cho et al., Mol. Cell, 1998
- Spellman et al., Mol. Biol. Cell, 1998
- Zhao et al., Proc. Natl. Acad. Sci. USA, 2001
- Langmead et al., Proc. IEEE Comput. Soc. Bioinformatics Conf., 2002
- Langmead et al., RECOMB, 2002
- Langmead et al., J. Comput. Biol., 2003
- de Lichtenberg et al., J. Mol. Biol., 2003
- Johansson et al., Bioinformatics, 2003
- Wichert et al., Bioinformatics, 2004
- Lu et al., Nucleic Acids Res., 2004
- Luan and Li, Bioinformatics, 2004
- de Lichtenberg et al., Bioinformatics, 2005
- de Lichtenberg et al., Yeast, 2005
- Willbrand et al., Bioinformatics, 2005
- Ahdesmäki et al., BMC Bioinformatics, 2005
- Chen, BMC Bioinformatics, 2005
- Qiu et al., Conf. Proc. IEEE Eng. Med. Biol. Soc., 2005
- Qiu et al., Bioinformatics, 2006
- Andersson et al., BMC Bioinformatics, 2006
- Gan et al., Int. Conf. Pattern Recog., 2006
- Glynn et al., Bioinformatics, 2006
- Ahnert et al., Bioinformatics, 2006
- Lu et al., Bioinformatics, 2006
- Xu et al., LSS Comput. Syst. Bioinformatics Conf., 2006
- Pramila et al., Genes Dev., 2006
- Liew et al., BMC Bioinformatics, 2007
- Lu et al., Genome Biol., 2007
- Morton et al., Stat. Appl. Genet. Mol. Biol., 2007
- Rowicka et al., Proc. Natl. Acad. Sci. USA, 2007
- Gauthier et al., Nucleic Acids Res., 2008
- Orlando et al., Nature, 2008
These studies have reported a mixture of ranked and unranked lists of periodically expressed genes. By that I mean that some studies provide a list of genes sorted according to how periodic the expression profiles appear, whereas others simply provide a list of the genes deemed periodic. For the ranked lists, I first checked the publications to see if the authors suggested a cutoff for the number of periodically expressed genes, in which case I followed their recommendation. If the authors suggested multiple lists of varying confidence, I used the highest-confidence list. If no cutoff was proposed, I selected the top 300 genes if the list was based on a single time course and the top 500 genes if the list was based on three or more time courses. Both of these cutoffs are on the conservative side, since most studies propose 800 or more periodically expressed genes when combining multiple expression time courses.
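The cutoff rule for ranked lists can be written down as a short Python sketch. The function name `select_periodic_genes` and its parameters are hypothetical and chosen only for illustration. The rule as stated does not say what to do with a list based on exactly two time courses, so the sketch raises an error in that case rather than guessing. Picking the highest-confidence list when the authors offer several is assumed to have happened before the call.

```python
def select_periodic_genes(ranked_genes, n_time_courses, author_cutoff=None):
    """Truncate a ranked gene list (most periodic first) to the genes kept.

    author_cutoff is the number of genes recommended in the publication,
    or None if the authors proposed no cutoff.
    """
    if author_cutoff is not None:
        cutoff = author_cutoff      # follow the authors' recommendation
    elif n_time_courses == 1:
        cutoff = 300                # list based on a single time course
    elif n_time_courses >= 3:
        cutoff = 500                # list based on three or more time courses
    else:
        # The stated rule does not cover this case, so the sketch does not guess.
        raise ValueError("no default cutoff defined for this number of time courses")
    return ranked_genes[:cutoff]
```

Unranked lists need no such step; they are used as published.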
This meta-analysis resulted in a list of more than 4200 budding yeast genes that are periodically expressed according to at least one of the methods listed above; that is more than two-thirds of all genes encoded by the budding yeast genome!
To investigate further how such a large number of genes can have been proposed to be periodically expressed, I counted for each gene the number of lists on which it appears and plotted the distribution of these counts:
The histogram reveals that the majority of the over 4200 genes have been proposed by only one or two analyses. It seems reasonable to assume that the genes that have been proposed as periodically expressed by only one or a few methods are less likely to be correct than the ones that many methods agree on. Also, one could expect that taking the consensus of many methods would yield a more reliable answer than using just a single method.
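A histogram of this kind can be computed in a few lines of Python. This is a minimal sketch, assuming each analysis has already been reduced to a plain list of gene identifiers; the function name and the toy gene names are placeholders, not data from any of the studies.

```python
from collections import Counter

def list_membership_histogram(gene_lists):
    """Return {k: number of genes that appear on exactly k lists}."""
    # set() ensures that a gene is counted at most once per list.
    n_lists_per_gene = Counter(
        gene for genes in gene_lists for gene in set(genes)
    )
    histogram = Counter(n_lists_per_gene.values())
    return dict(sorted(histogram.items()))

# Toy input: three made-up lists of placeholder gene names.
toy_lists = [
    ["geneA", "geneB", "geneC"],
    ["geneA", "geneB"],
    ["geneA", "geneD"],
]
print(list_membership_histogram(toy_lists))  # {1: 2, 2: 1, 3: 1}
```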
To test these two hypotheses, I compared two different ways of identifying the periodically expressed genes:
- Ranking the genes based on a single scoring scheme that combines all the available experimental data (Gauthier et al., Nucleic Acids Res., 2008)
- Ranking the genes based on a vote among 30 different methods (not 31; the analysis by Orlando and coworkers was left out of the voting as this dataset is not included in Cyclebase.org)
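The voting scheme in the second item can be sketched as follows, with one vote per method for every gene on that method's list. The function name `rank_by_votes` is hypothetical, and the alphabetical tie-break is an assumption added only to make the output deterministic; how ties were actually handled is not stated here.

```python
from collections import Counter

def rank_by_votes(gene_lists):
    """Rank genes by the number of lists (votes) they appear on, most votes first."""
    votes = Counter(gene for genes in gene_lists for gene in set(genes))
    # Assumption: genes with equal votes are ordered alphabetically.
    return sorted(votes, key=lambda gene: (-votes[gene], gene))

# Toy input with placeholder gene names.
toy_lists = [
    ["geneA", "geneB", "geneC"],
    ["geneB", "geneA"],
    ["geneB", "geneD"],
]
print(rank_by_votes(toy_lists))  # ['geneB', 'geneA', 'geneC', 'geneD']
```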
To benchmark the two methods, I compared the ranked lists to a set of target genes for cell-cycle transcription factors identified in genome-wide ChIP-on-chip experiments and plotted the fraction of these that were identified as a function of the number of genes proposed to be periodically expressed:
The plot confirms that genes proposed to be periodically expressed by multiple methods are more likely to be targets of cell-cycle transcription factors, and are hence more likely to truly be subject to transcriptional cell-cycle regulation. However, it also shows that the list obtained by voting among 30 methods is a bit worse than what is obtained by using the single best method.
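A benchmark curve of this kind can be computed as in the sketch below. It assumes a ranked gene list (best candidates first) and a set of benchmark target genes; the function name and toy identifiers are placeholders, and no real benchmark data are shown.

```python
def recovery_curve(ranked_genes, target_genes):
    """For each N, return the fraction of targets found among the top N genes."""
    targets = set(target_genes)
    if not targets:
        raise ValueError("target set must not be empty")
    found = 0
    curve = []
    for gene in ranked_genes:
        if gene in targets:
            found += 1
        curve.append(found / len(targets))
    return curve  # curve[N - 1] is the fraction recovered in the top N

# Toy input: "geneX" is a target that the ranking never proposes.
ranked = ["geneA", "geneB", "geneC", "geneD"]
targets = {"geneA", "geneC", "geneX", "geneY"}
print(recovery_curve(ranked, targets))  # [0.25, 0.25, 0.5, 0.5]
```

Computing one such curve per ranking and plotting them on the same axes lets two rankings be compared at every list size rather than at a single cutoff.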
This result may come as a surprise to many since meta-servers that combine multiple prediction methods have in the past proven very successful for many other bioinformatics tasks. I suspect that the approach fails in this case for two reasons: first, many of the analyses included perform considerably worse than the best one, and second, most of the methods make use of only half of the available experimental data. It may thus be possible to obtain better results by selecting only a subset of the methods and rerunning each of them on all the available data. So far, however, dictatorship seems to work better than democracy for identification of periodically expressed genes.