A while back, I found a paper in PLoS Genetics from Jason Lieb’s lab entitled “Cell Cycle-Specified Fluctuation of Nucleosome Occupancy at Gene Promoters”. There is little point in me rephrasing their work, so here is the original abstract:
The packaging of DNA into nucleosomes influences the accessibility of underlying regulatory information. Nucleosome occupancy and positioning are best characterized in the budding yeast Saccharomyces cerevisiae, albeit in asynchronous cell populations or on individual promoters such as PHO5 and GAL1–10. Using FAIRE (formaldehyde-assisted isolation of regulatory elements) and whole-genome microarrays, we examined changes in nucleosome occupancy throughout the mitotic cell cycle in synchronized populations of S. cerevisiae. Perhaps surprisingly, nucleosome occupancy did not exhibit large, global variation between cell cycle phases. However, nucleosome occupancy at the promoters of cell cycle–regulated genes was reduced specifically at the cell cycle phase in which that gene exhibited peak expression, with the notable exception of S-phase genes. We present data that establish FAIRE as a high-throughput method for assaying nucleosome occupancy. For the first time in any system, nucleosome occupancy was mapped genome-wide throughout the cell cycle. Fluctuation of nucleosome occupancy at promoters of most cell cycle–regulated genes provides independent evidence that periodic expression of these genes is controlled mainly at the level of transcription. The promoters of G2/M genes are distinguished from other cell cycle promoters by an unusually low baseline nucleosome occupancy throughout the cell cycle. This observation, coupled with the maintenance throughout the cell cycle of the stereotypic nucleosome occupancy states between coding and noncoding loci, suggests that the largest component of variation in nucleosome occupancy is “hard wired,” perhaps at the level of DNA sequence.
Although the authors clearly and very reasonably focus on nucleosome occupancy at the promoter regions, their microarray experiments also include probes for the open reading frames (ORFs). I thus decided to take a look at whether nucleosome occupancy within the ORFs correlates with the transcriptional regulation of genes.
I first downloaded the complete dataset from the Gene Expression Omnibus (GEO) database and used a previously described algorithm to identify genes with periodic variation in nucleosome occupancy. The algorithm provides two p-values that tell if a profile varies significantly across time points (“regulation”) and if this variation correlates significantly with a cosine wave with the period of interest (“periodicity”). Finally, it combines the two p-values into a single score (“combined”).
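To make the three scores concrete, here is a minimal sketch of how such scores could be computed. It is not the published algorithm: it assumes simple permutation tests, a Fourier-style periodicity measure, and a plain product of the two p-values as the combined score. All function names and parameters are hypothetical.

```python
import math
import random

def fourier_score(profile, times, period):
    """Magnitude of the profile's component at the period of interest."""
    omega = 2 * math.pi / period
    c = sum(x * math.cos(omega * t) for x, t in zip(profile, times))
    s = sum(x * math.sin(omega * t) for x, t in zip(profile, times))
    return math.hypot(c, s)

def spread(values):
    """Standard deviation of a profile across its time points."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

def periodicity_p(profile, times, period, n_perm=1000, seed=0):
    """Share of time-shuffled profiles that look at least as periodic."""
    rng = random.Random(seed)
    observed = fourier_score(profile, times, period)
    shuffled = list(profile)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fourier_score(shuffled, times, period) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def regulation_p(profile, background, n_perm=1000, seed=0):
    """Share of random background profiles that vary at least as much.

    `background` is a flat list of measurements pooled over all genes
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    observed = spread(profile)
    hits = 0
    for _ in range(n_perm):
        fake = [rng.choice(background) for _ in profile]
        if spread(fake) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def combined_score(p_regulation, p_periodicity):
    """Assumed combination rule: plain product, lower means better."""
    return p_regulation * p_periodicity
```

Genes would then be ranked by ascending score. The product is only the simplest way to reward profiles that do well on both tests at once; the actual method may combine the p-values differently.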
The genes can be ranked according to any one of these three scores. To test which score makes the most biological sense, I benchmarked each of the ranked lists against a list of 113 genes that are known from small-scale experiments to be transcriptionally regulated during the budding yeast cell cycle (the B1 list from de Lichtenberg et al. (2005)). Below is a plot showing the fraction of the benchmark identified as a function of the number of genes identified as having periodic nucleosome occupancy:
As should be the case, the curve for the combined score lies above the curves for “regulation” and “periodicity” alone, and all three scores enrich for known cell-cycle-regulated genes relative to random expectation (“random”). A clear break in the “combined” curve is observed at rank 300, after which there is essentially no enrichment for known cell-cycle-regulated genes. I thus decided to base my analysis on the top-300 genes according to the combined score.
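The benchmark curves described above can be computed along the following lines. This is a minimal sketch; the function names are hypothetical, and the inputs are assumed to be a list of gene identifiers sorted from best to worst score plus a collection of benchmark gene identifiers.

```python
def benchmark_curve(ranked_genes, benchmark):
    """Fraction of the benchmark recovered within the top-k genes, for each k."""
    benchmark = set(benchmark)
    found = 0
    curve = []
    for gene in ranked_genes:
        if gene in benchmark:
            found += 1
        curve.append(found / len(benchmark))
    return curve

def random_curve(n_genes):
    """Expected curve for a random ranking: the top k of n genes
    recover a fraction k / n of the benchmark on average."""
    return [k / n_genes for k in range(1, n_genes + 1)]
```

For example, `benchmark_curve(["a", "b", "c", "d"], {"a", "c"})` gives `[0.5, 0.5, 1.0, 1.0]`. A score enriches for known cell-cycle-regulated genes wherever its curve lies above the random one, and a break like the one at rank 300 shows up as the point where the curve starts running parallel to it.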
Comparing this list to the top-600 periodically expressed genes from de Lichtenberg et al. (2005) revealed an overlap of just 45 genes. This is approximately 50% more than expected by random chance and the difference is statistically significant (P < 0.001; Fisher’s exact test).
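The overlap test can be sketched as follows. The one-sided Fisher's exact test for an overlap between two gene lists is equivalent to the upper tail of a hypergeometric distribution, which is what this code computes using only the standard library. The total number of genes is not stated above, so the value of 6000 in the usage example is purely an assumption, and the resulting p-value depends on it.

```python
from math import comb

def overlap_stats(n_total, n_list_a, n_list_b, overlap):
    """Expected overlap of two gene lists and the one-sided p-value
    for observing at least `overlap` shared genes by chance."""
    expected = n_list_a * n_list_b / n_total
    denominator = comb(n_total, n_list_a)
    # Sum the hypergeometric probabilities for overlap, overlap + 1, ...
    tail = sum(
        comb(n_list_b, k) * comb(n_total - n_list_b, n_list_a - k)
        for k in range(overlap, min(n_list_a, n_list_b) + 1)
    )
    return expected, tail / denominator

# ASSUMPTION: roughly 6000 genes in total; not taken from the text above.
expected, p_value = overlap_stats(6000, 300, 600, 45)
print(expected, 45 / expected, p_value)
```

Under that assumption the expected overlap is 30 genes, so the observed 45 is 1.5 times the expectation, in line with the "approximately 50% more" figure. With a different background size both the expectation and the p-value change.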
The next logical thing to do was to check the timing of expression during the cell cycle for the genes with and without periodic nucleosome occupancy. The temporal expression profile of each periodically expressed gene is summarized in a single number, the peak time, which tells when in the cell cycle the gene is maximally expressed. The unit of the peak times is “percent of a cell cycle”; 0% corresponds to the time of cell division and 40% corresponds to S phase (in budding yeast). I plotted the peak-time distributions as histograms:
If the two distributions look similar to you, it is for a good reason: according to the Kolmogorov-Smirnov test there is no significant difference. Not a very exciting result, but perhaps also not too surprising.
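For reference, a two-sample Kolmogorov-Smirnov comparison of two lists of peak times can be sketched in plain Python as below. The p-value uses the standard asymptotic approximation, so it is only a rough guide for small samples; `scipy.stats.ks_2samp` is a ready-made alternative. The function name is hypothetical. Note that this treats peak times as ordinary numbers and ignores the fact that 0% and 100% of the cell cycle are the same point.

```python
import math
from bisect import bisect_right

def ks_two_sample(sample_a, sample_b):
    """Return the KS statistic D and an asymptotic p-value."""
    a, b = sorted(sample_a), sorted(sample_b)
    # D is the largest gap between the two empirical distribution functions.
    d = max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )
    effective_n = len(a) * len(b) / (len(a) + len(b))
    lam = (math.sqrt(effective_n) + 0.12 + 0.11 / math.sqrt(effective_n)) * d
    if lam < 0.3:
        # The series below converges poorly here; the p-value is effectively 1.
        return d, 1.0
    p = 2 * sum(
        (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
        for j in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)
```

A large p-value, as reported above, means the two peak-time distributions cannot be told apart by this test.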
If the periodic nucleosome occupancy of cell-cycle-regulated genes has biological importance, then one should expect that the time of peak expression of a gene corresponds to the time of minimal nucleosome occupancy. I thus made a scatter plot of the two for the set of 45 genes:
There seems to be no correlation. A possible explanation is that the overlap of 45 genes is only 50% more than expected by random chance: about 30 of the 45 genes would be expected to overlap by chance alone, so only one in three genes, that is 15 genes, can be expected to contribute to the signal (if there is any). Even if the correlation were perfect for these genes, the overall correlation would be difficult to detect.
At this point I decided to drop the project. In summary, the benchmark results and the comparison with microarray expression data show that there is a statistically significant correlation between periodic nucleosome occupancy within an ORF and periodic expression of the corresponding gene. However, the signal is quite weak and it is thus difficult to get much further.
You are most welcome to post good ideas as comments. Alternatively, I will be happy to provide you with all the files if you want to take over the project.