Monthly Archives: February 2008

Commentary: Neither buried nor treasure

This post might be considered off topic since it is about a paper that is unfortunately neither buried nor treasure. I would rather describe it as “organic fertilizer that has come into contact with a rotary air-circulation device”.

The paper that I will dissect is “Mitochondria, the missing link between body and soul: Proteomic prospective evidence” by Mohamad Warda and Jin Han. This review is at the time of writing published in electronic form by the journal Proteomics (ISI impact factor 5.735). It was aptly described as “A baffling failure of peer review” by PZ Myers on his blog Pharyngula, which led to a flash mob of researchers (including me) quickly identifying several flaws, any one of which should in my view be sufficient to cause the journal to retract the paper:

  • Warda and Han twice suggest that mitochondria provide a link between body and soul, but they never provide any argument for this.
  • They claim to present data that disprove the accepted endosymbiotic theory for the origin of mitochondria, but in reality they present no such evidence.
  • They promise to replace this theory by “a more realistic alternative”. The alternative turns out to be “a mighty creator”, or in other words Intelligent Design.
  • To support these truly remarkable claims, the authors misrepresent the results of cited references. Some of these references are even completely unrelated to the topic at hand.
  • Entire sections or paragraphs of the paper are plagiarized from other researchers’ papers and from the website of another group. Not only is this material not presented as quotations, the sources are not even cited.
  • Finally, numerous sentences have been copied verbatim from the cited sources. This may be partly excused by the authors borrowing better English, but I nonetheless consider it an unacceptable practice especially in reviews.

Note how I use block quotations below to show which parts are not my own words. That is what Warda and Han should have done to avoid their biggest problem: being accused of plagiarism. Let us start by looking at the first sentence of the abstract:

Mitochondria are the gatekeepers of the life and death of most cells that regulate signaling, metabolism, and energy production needed for cellular function.

This sentence is identical to the first sentence on the webpage of a competing group, namely the Mitochondrial Research & Innovation Group at University of Rochester Medical Center. The rest of the paragraph from the webpage can be found later in the paper by Warda and Han:

Recent scientific studies show that mitochondrial dysfunction is more commonplace than previously thought and that substantial mitochondrial involvement is present in many acute and chronic diseases. Mitochondrial dysfunction is now implicated in a range of human diseases, including aging, diabetes, atherosclerosis, heart failure, myocardial infarction, stroke and other ischemic-reperfusion injuries, neurodegenerative diseases including Alzhiemer’s and Parkinson’s diseases; cancer, HIV; sepsis and trauma with multiorgan dysfunction or failure. Some rare mitochondria diseases (e.g., MELAS, Kearns-Sayre) are associated with large deletions in the mitochondrial genome. More recently, the so-called OXPHOS diseases that reflect a limited capacity to produce the energy needed to respond to normal stress conditions.

However, most of the plagiarized material is not from webpages; it is from peer-reviewed papers by other researchers. For example, the two paragraphs below appear to originate from the paper “Peroxisome Proliferator-Activated Receptor gamma Coactivator-1 (PGC-1) Regulatory Cascade in Cardiac Physiology and Disease” published by Brian N. Finck and Daniel P. Kelly in the journal Circulation:

Emerging evidence supports the notion that derangements in mitochondrial energy metabolism contribute to cardiac dysfunction [186]. For example, human mitochondrial DNA mutations resulting in global impairment in mitochondrial respiratory function cause hypertrophic or dilated cardiomyopathy and cardiac conduction defects [187, 188]. Mutations in nuclear genes encoding mitochondrial fatty acid oxidation enzymes may also manifest as cardiomyopathy [189, 190]. Interestingly, cardiomyopathies resulting from inborn errors in mitochondrial fatty acid oxidation enzymes are often provoked by physiological or pathophysiological conditions that increase dependence on fat oxidation for myocardial ATP production such as prolonged exercise or fasting associated with infectious illness [190,191].

A causal relationship between mitochondrial dysfunction and cardiomyopathy is also evidenced by several genetically engineered mouse models. Targeted deletion of the adenine nucleotide translocator 1, which transports mitochondrially generated ATP to the cytosol, leads to mitochondrial dysfunction and cardiomyopathy [192]. Mice with cardiac-specific deletion of the transcription factor of activated mitochondria, which controls transcription and replication of the mitochondrial genome, also exhibit marked impairments in mitochondrial metabolism, severe cardiomyopathy, and premature mortality [193]. Cardiomyopathy and/or conduction defects are also observed in several mouse models with targeted deletion of specific fatty acid oxidation enzymes [194, 195].

I discovered these plagiarized sections myself, but they only scratch the surface and pale in comparison to the amount of copied material identified by others. I should make clear that I do not blame the reviewers for not discovering this; their job is to check the scientific quality of the material presented, not to detect fraudulent or plagiarized material.

The editor and the reviewers are not off the hook, though. Interspersed between the sensible review material, much of which has been copied from elsewhere, there are a few sections that are “a mélange of truths, half-truths, quarter-truths, falsehoods, non sequiturs, and syntactically correct sentences that have no meaning whatsoever” (to use the words of Alan Sokal).

The first of these is the following sentence from the abstract:

These data are presented with other novel proteomics evidence to disprove the endosymbiotic hypothesis of mitochondrial evolution that is replaced in this work by a more realistic alternative.

Clearly the editors and the reviewers should have examined the evidence for such an exceptional claim. The “evidence” that supposedly disproves the serial endosymbiotic theory (SET) of mitochondrial evolution is presented in section 3.4, which after explaining the theory makes the following baffling statement:

The proof of SET was based on the parallel connection between plant mitochondrial and phage T4 genome replication [107, 108] …

This is simply not true. First, the scientific method can never prove a theory. Second, the similarity between the replication of mitochondrial and phage T4 genomes is completely irrelevant with respect to the endosymbiotic origin of mitochondria. Third, reference 108 is about chloroplasts and not mitochondria.

Next, the authors try to convince the reader that people are still debating the validity of the endosymbiotic theory:

Therefore, the debates concerning the mitochondrial endosymbiotic hypothesis recently terminated with many questions still left unanswered [111].

Reference 111 is a paper in Cellular Immunology by Gray and coworkers with the title “Modulation of CD8+ T cell avidity by increasing the turnover of viral antigen during infection”. It has nothing to do with the evolution of mitochondria.

The authors go on to recite the well known facts that the vast majority of mitochondrial proteins are encoded by nuclear genes and that most bacterial genes do not match within the mitochondrial DNA. They then give a long description of how tightly integrated mitochondria are with the rest of the cell and use the tired old argument of irreducible complexity to dismiss the endosymbiotic theory. Needless to say, the authors have what they consider to be a more realistic explanation:

Alternatively, instead of sinking in a swamp of endless debates about the evolution of mitochondria, it is better to come up with a unified assumption that all living cells undergo a certain degree of convergence or divergence to or from each other to meet their survival in specific habitats. Proteomics data greatly assist this realistic assumption that connects all kinds of life. More logically, the points that show proteomics overlapping between different forms of life are more likely to be interpreted as a reflection of a single common fingerprint initiated by a mighty creator than relying on a single cell that is, in a doubtful way, surprisingly originating all other kinds of life.

In other words: “God did it”. If I read this correctly, the authors here reject not only the endosymbiotic origin of mitochondria but also the common ancestry of eukaryotes. And as if this were not enough, the authors end the paper with the following bold conclusion that also explains the mysterious title of the paper:

We realize so far that mitochondria could be the link between the body and this preserved wisdom of the soul devoted to guaranteeing life.

I am speechless. As anyone who knows me can attest, that very rarely happens.

Edit: The paper has now been retracted, but there are still many open questions as to how it got accepted in the first place.


Commentary: Does just-in-time assembly of protein complexes explain phenotypes?

At the beginning of this year, Ben Lehner’s lab published a beautiful study in BMC Systems Biology with the title “A simple principle concerning the robustness of protein complex activity to changes in gene expression”. The abstract reads:


The functions of a eukaryotic cell are largely performed by multi-subunit protein complexes that act as molecular machines or information processing modules in cellular networks. An important problem in systems biology is to understand how, in general, these molecular machines respond to perturbations.


In yeast, genes that inhibit growth when their expression is reduced are strongly enriched amongst the subunits of multi-subunit protein complexes. This applies to both the core and peripheral subunits of protein complexes, and the subunits of each complex normally have the same loss-of-function phenotypes. In contrast, genes that inhibit growth when their expression is increased are not enriched amongst the core or peripheral subunits of protein complexes, and the behaviour of one subunit of a complex is not predictive for the other subunits with respect to over-expression phenotypes.


We propose the principle that the overall activity of a protein complex is in general robust to an increase, but not to a decrease in the expression of its subunits. This means that whereas phenotypes resulting from a decrease in gene expression can be predicted because they cluster on networks of protein complexes, over-expression phenotypes cannot be predicted in this way. We discuss the implications of these findings for understanding how cells are regulated, how they evolve, and how genetic perturbations connect to disease in humans.

It struck me that these observations can all be explained by the just-in-time assembly model for temporal regulation of protein complex assembly, which I developed together with members of Søren Brunak’s group. For a long explanation and discussion of the model see our paper “Evolution of Cell Cycle Control: Same Molecular Machines, Different Regulation”. For the short version see the figure below, which shows how cell-cycle regulation of just a single subunit is sufficient to control when during the cell cycle a complex is active:

The just-in-time assembly hypothesis

What will happen if you knock down the expression of one subunit of a complex? The maximal number of complete complexes that can be assembled will be reduced, irrespective of whether the subunit is dynamic or static. Whether this results in a given phenotype depends on the function of the complex. However, the effect should in principle be the same for different subunits of the same complex, which is exactly what Lehner and coworkers observed.

What if you instead overexpress one subunit of a complex? For a static subunit it should not really matter; the maximal number of complete complexes that can be assembled is unchanged. On the other hand, overexpression of a dynamic subunit may cause the complex to become constitutively active, which could have disastrous consequences for the cell. Overexpression of dynamic and static subunits of the same complex should thus give rise to different phenotypic effects. This would explain the observation by Lehner and coworkers that subunits of the same complex often have different overexpression phenotypes.
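The reasoning in the two paragraphs above can be illustrated with a deliberately crude toy model. It is only a sketch under stated assumptions, not the model from the paper: the complex has one static and one dynamic subunit, the number of complete complexes at any time is limited by the scarcer subunit, and overexpression adds a constant amount of protein at all time points. All copy numbers are hypothetical.

```python
def complex_profile(static_level, dynamic_profile):
    """Complete complexes at each time point, limited by the scarcer subunit."""
    return [min(static_level, d) for d in dynamic_profile]

# Hypothetical copy numbers: the dynamic subunit is only present in a
# short window of the cell cycle, the static subunit is always present.
STATIC = 100
DYNAMIC = [0, 0, 100, 100, 0, 0]

wild_type = complex_profile(STATIC, DYNAMIC)

# Knock-down (assumed here to halve the protein level) of either subunit.
knockdown_static = complex_profile(STATIC // 2, DYNAMIC)
knockdown_dynamic = complex_profile(STATIC, [d // 2 for d in DYNAMIC])

# Overexpression (assumed to add a constant 100 copies at all times).
overexpressed_static = complex_profile(STATIC + 100, DYNAMIC)
overexpressed_dynamic = complex_profile(STATIC, [d + 100 for d in DYNAMIC])

if __name__ == "__main__":
    for name, profile in [
        ("wild type", wild_type),
        ("knock-down, static", knockdown_static),
        ("knock-down, dynamic", knockdown_dynamic),
        ("overexpression, static", overexpressed_static),
        ("overexpression, dynamic", overexpressed_dynamic),
    ]:
        print(f"{name:26s} {profile}")
```

In this toy setting the two knock-downs give identical profiles, overexpressing the static subunit changes nothing, and overexpressing the dynamic subunit makes the complex present at every time point.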

If this hypothesis is true, genes that lead to phenotypic effects when overexpressed should preferentially encode dynamic proteins, i.e. many of the genes should be periodically expressed. In fact, this correlation between overexpression phenotype and cell-cycle regulation was already described by the Hughes, Boone and Andrews labs who originally published the dataset on overexpression phenotypes (for details see their paper in Molecular Cell):

Genes expressed periodically during the cell cycle (de Lichtenberg et al., 2005) were more likely to show an overexpression phenotype (p = 0.017), and in particular, this tended to cause abnormal morphology [p < 10^-13] or cell cycle arrest [p < 10^-14] (Table S3). When the analysis is limited to genes known to function in the mitotic cell cycle, we still find that overexpression of periodically expressed genes is more likely to cause cell cycle arrest (p = 0.008) or abnormal morphology (p = 0.006) than constitutively expressed cell cycle genes (Table S3), indicating that unscheduled expression of genes that are usually expressed periodically often leads to toxicity.

The results of the two papers thus suggest that the just-in-time assembly hypothesis can explain the qualitative differences between knock-down and overexpression phenotypes.


Analysis: Periodic nucleosome occupancy during the yeast cell cycle

A while back, I found a paper in PLoS Genetics from Jason Lieb’s lab entitled “Cell Cycle-Specified Fluctuation of Nucleosome Occupancy at Gene Promoters”. There is little point in me rephrasing their work, so here is the original abstract:

The packaging of DNA into nucleosomes influences the accessibility of underlying regulatory information. Nucleosome occupancy and positioning are best characterized in the budding yeast Saccharomyces cerevisiae, albeit in asynchronous cell populations or on individual promoters such as PHO5 and GAL1–10. Using FAIRE (formaldehyde-assisted isolation of regulatory elements) and whole-genome microarrays, we examined changes in nucleosome occupancy throughout the mitotic cell cycle in synchronized populations of S. cerevisiae. Perhaps surprisingly, nucleosome occupancy did not exhibit large, global variation between cell cycle phases. However, nucleosome occupancy at the promoters of cell cycle–regulated genes was reduced specifically at the cell cycle phase in which that gene exhibited peak expression, with the notable exception of S-phase genes. We present data that establish FAIRE as a high-throughput method for assaying nucleosome occupancy. For the first time in any system, nucleosome occupancy was mapped genome-wide throughout the cell cycle. Fluctuation of nucleosome occupancy at promoters of most cell cycle–regulated genes provides independent evidence that periodic expression of these genes is controlled mainly at the level of transcription. The promoters of G2/M genes are distinguished from other cell cycle promoters by an unusually low baseline nucleosome occupancy throughout the cell cycle. This observation, coupled with the maintenance throughout the cell cycle of the stereotypic nucleosome occupancy states between coding and noncoding loci, suggests that the largest component of variation in nucleosome occupancy is “hard wired,” perhaps at the level of DNA sequence.

Although the authors clearly and very reasonably focus on nucleosome assembly at the promoter regions, their microarray experiments also include probes for the open reading frames (ORFs). I thus decided to take a look at whether nucleosome occupancy within the ORFs correlates with the transcriptional regulation of genes.

I first downloaded the complete dataset from the Gene Expression Omnibus (GEO) database and used a previously described algorithm to identify genes with periodic variation in nucleosome occupancy. The algorithm provides two p-values that tell if a profile varies significantly across time points (“regulation”) and if this variation correlates significantly with a cosine wave with the period of interest (“periodicity”). Finally, it combines the two p-values into a single score (“combined”).
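To make the three scores concrete, here is a minimal permutation-based sketch. It is not the previously described algorithm itself, whose exact null models and combination rule are not given in this post. The assumptions are: "regulation" compares the spread of a profile with profiles assembled from random background genes, "periodicity" compares the amplitude of the Fourier component at the period of interest (equivalent to the best fit of a phase-shifted cosine) with the same profile in shuffled time order, and "combined" is simply the product of the two p-values. All function names are hypothetical.

```python
import math
import random
import statistics

def periodic_amplitude(profile, times, period):
    """Amplitude of the Fourier component of the profile at the given period."""
    mean = statistics.fmean(profile)
    re = sum((x - mean) * math.cos(2 * math.pi * t / period)
             for x, t in zip(profile, times))
    im = sum((x - mean) * math.sin(2 * math.pi * t / period)
             for x, t in zip(profile, times))
    return math.hypot(re, im)

def empirical_p(observed, null_scores):
    """Fraction of null scores >= observed, with a +1 pseudocount."""
    hits = sum(score >= observed for score in null_scores)
    return (hits + 1) / (len(null_scores) + 1)

def score_profile(profile, background, times, period, n_perm=1000, seed=0):
    """Return assumed 'regulation', 'periodicity' and 'combined' scores."""
    rng = random.Random(seed)
    n = len(profile)

    # Regulation: is the spread larger than for profiles built by taking
    # the value of a randomly chosen background gene at each time point?
    observed_sd = statistics.pstdev(profile)
    null_sd = [
        statistics.pstdev([rng.choice(background)[i] for i in range(n)])
        for _ in range(n_perm)
    ]
    p_regulation = empirical_p(observed_sd, null_sd)

    # Periodicity: is the periodic amplitude larger than for the same
    # values in shuffled time order?
    observed_amp = periodic_amplitude(profile, times, period)
    null_amp = []
    for _ in range(n_perm):
        shuffled = list(profile)
        rng.shuffle(shuffled)
        null_amp.append(periodic_amplitude(shuffled, times, period))
    p_periodicity = empirical_p(observed_amp, null_amp)

    return {
        "regulation": p_regulation,
        "periodicity": p_periodicity,
        # Assumption: plain product; lower means more convincing.
        "combined": p_regulation * p_periodicity,
    }
```

Genes would then be ranked by ascending score. The design point is that a profile must be both variable and periodic to get a low combined score.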

The genes can be ranked according to any one of these three scores. To test which score makes most biological sense, I benchmarked each of the ranked lists against a list of 113 genes that are known from small-scale experiments to be transcriptionally regulated during the budding yeast cell cycle (the B1 list from de Lichtenberg et al. (2005)). Below is a plot showing the fraction of the benchmark identified as a function of the number of genes identified as having periodic nucleosome occupancy:

Benchmark of periodic nucleosome occupancy vs. known cell-cycle-regulated genes

As should be the case, the curve for the combined score lies above the curves for “regulation” and “periodicity” alone, and all three scores enrich for known cell-cycle-regulated genes relative to random expectation (“random”). A clear break in the “combined” curve is observed at rank 300, after which there is essentially no enrichment for known cell-cycle-regulated genes. I thus decided to base my analysis on the top-300 genes according to the combined score.
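A benchmark curve of this kind is simple to compute. The sketch below is illustrative only; the function names are hypothetical, and the random expectation assumes that the ranked list covers all genes under consideration.

```python
def benchmark_curve(ranked_genes, benchmark):
    """Fraction of benchmark genes found within the top k, for k = 1..n."""
    benchmark = set(benchmark)
    found = 0
    curve = []
    for gene in ranked_genes:
        if gene in benchmark:
            found += 1
        curve.append(found / len(benchmark))
    return curve

def random_curve(n_genes):
    """Expected fraction for a random ranking: k out of n genes selected."""
    return [k / n_genes for k in range(1, n_genes + 1)]

if __name__ == "__main__":
    # Toy example with made-up gene names.
    ranked = ["A", "B", "C", "D"]
    print(benchmark_curve(ranked, {"A", "C"}))
    print(random_curve(len(ranked)))
```

A score is informative as long as its curve rises faster than the random curve; a rank beyond which the two run in parallel is a natural cutoff.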

Comparing this list to the top-600 periodically expressed genes from de Lichtenberg et al. (2005) revealed an overlap of just 45 genes. This is approximately 50% more than expected by random chance and the difference is statistically significant (P < 0.001; Fisher’s exact test).
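For readers who want to redo this kind of overlap test: a one-sided Fisher’s exact test on the 2×2 overlap table is equivalent to the upper tail of a hypergeometric distribution, which needs nothing beyond the standard library. The background size is not stated in this post, so the 6,000 genes used in the example are purely an assumption (roughly the size of the budding yeast genome), and the resulting numbers should not be read as a reproduction of the analysis.

```python
from math import comb

def expected_overlap(n_total, n_a, n_b):
    """Expected overlap of two random gene sets drawn from n_total genes."""
    return n_a * n_b / n_total

def overlap_p_value(n_total, n_a, n_b, overlap):
    """One-sided P(overlap >= observed) under the hypergeometric null."""
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(n_total, n_b)

if __name__ == "__main__":
    N_TOTAL = 6000  # assumed background, not taken from the post
    print("expected overlap:", expected_overlap(N_TOTAL, 600, 300))
    print("one-sided p-value:", overlap_p_value(N_TOTAL, 600, 300, 45))
```

With that assumed background the expected overlap is 30 genes, which is consistent with 45 being about 50% more than chance.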

The next logical thing to do was to check the timing of expression during the cell cycle for the genes with and without periodic nucleosome occupancy. The temporal expression profile of each periodically expressed gene is summarized in a single number, the peak time, which tells when in the cell cycle the gene is maximally expressed. The unit of the peak times is “percent of a cell cycle”; 0% corresponds to the time of cell division and 40% corresponds to S phase (in budding yeast). I plotted the peak-time distributions as histograms:

Peak-time distribution of periodically expressed genes with and without periodic nucleosome occupancy

If the two distributions look similar to you, it is for a good reason: according to the Kolmogorov-Smirnov test there is no significant difference. Not a very exciting result, but perhaps also not too surprising.
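For reference, a two-sample Kolmogorov-Smirnov comparison can be sketched in a few lines. This is a generic illustration, not the code behind the result above: the p-value uses the standard asymptotic series with a small-sample correction, and the function names are hypothetical. Note that peak times are circular (0% and 100% are the same point), so the statistic depends on where the cycle is cut; a rotation-invariant alternative such as Kuiper’s test may be worth considering.

```python
import math

def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both CDFs past every value equal to x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_p_value(d, n1, n2, terms=100):
    """Asymptotic two-sided p-value for the two-sample KS statistic."""
    n_eff = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is ~1 anyway.
        return 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(max(p, 0.0), 1.0)
```

Usage would be along the lines of `ks_p_value(ks_statistic(with_occupancy, without_occupancy), len(with_occupancy), len(without_occupancy))`, where the two lists hold peak times in percent of a cell cycle.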

If the periodic nucleosome occupancy of cell-cycle-regulated genes has a biological importance, then one should expect that the time of peak expression of a gene corresponds to the time of minimal nucleosome occupancy. I thus made a scatter plot of the two for the set of 45 genes:

Time of maximal expression vs. time of minimal nucleosome occupancy

There seems to be no correlation. A possible explanation is that the overlap of 45 genes is only 50% more than expected by random chance; in other words only one in three genes, that is 15 genes, can be expected to contribute to the signal (if there is any). Even if the correlation were perfect for these genes, the overall correlation would be difficult to detect.
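Two small calculations sit behind this paragraph and may be useful to anyone picking up the project. The sketch below is illustrative, with hypothetical function names. The first helper spells out the "one in three" arithmetic. The second computes the difference between two time points on the cell cycle, which has to wrap around because 95% and 5% are only 10% apart.

```python
def excess_genes(observed, fold_over_random):
    """Split an observed overlap into its expected and excess parts."""
    expected = observed / fold_over_random
    return expected, observed - expected

def circular_difference(a, b, period=100.0):
    """Signed shortest difference a - b on a cycle, in [-period/2, period/2)."""
    return (a - b + period / 2) % period - period / 2

if __name__ == "__main__":
    # 45 genes observed, 50% more than random expectation.
    expected, excess = excess_genes(45, 1.5)
    print(expected, excess, excess / 45)

    # Peak expression at 95% vs. minimal occupancy at 5% of the cycle.
    print(circular_difference(95, 5))
```

If peak expression really coincided with minimal occupancy, the circular differences for the contributing genes would cluster around zero, whereas unrelated genes would spread over the whole range.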

At this point I decided to drop the project. In summary, the benchmark results and the comparison with microarray expression data show that there is a statistically significant correlation between periodic nucleosome occupancy within an ORF and periodic expression of the corresponding gene. However, the signal is quite weak and it is thus difficult to get much further.

You are most welcome to post good ideas as comments. Alternatively, I will be happy to provide you with all the files if you want to take over the project.


Editorial: Why “Buried Treasure”?

I’m a computational biologist who tends to work on way too many projects at any one time. Many of these result in observations that I find interesting – but not interesting enough to bother writing up a manuscript and sending it to a peer-reviewed journal. In other cases the results were simply negative and no conclusions could be drawn. My disk may thus contain “buried treasure”.

My primary goal with this blog is to make my never-to-be-published observations openly available. As I don’t plan on continuing these projects, anyone is welcome to pick up a project and continue where I left off. I also plan on reporting negative results on this blog so that others can hopefully avoid repeating analyses that would lead them nowhere.

Please note that this blog is an experiment. Over the next few months I will try to write up a number of posts on various projects. After a test period, I will make up my mind about whether to continue or not. This depends on how many people read and comment on my posts vs. how long it takes me to write the posts. As with all other experiments, time will tell if it becomes a success.