Category Archives: Commentary

Commentary: On large protein complexes and the essentiality of hubs

In 2001, Jeong and coworkers published a paper in Nature in which they showed that the central proteins in interaction networks, that is, the proteins with the highest connectivity, are enriched for essential proteins. This publication has been highly influential, as evidenced by the numerous subsequent publications on the importance of “hub” proteins. Several hypotheses have been published that try to explain why hubs are essential, for example that certain protein interactions are essential and that a protein with many interactions is thus more likely to be involved in at least one essential interaction (He and Zhang, 2006).
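The essential-interaction argument can be made concrete with a deliberately simplified sketch. It assumes that every interaction is essential independently and with the same probability; that assumption, the function name, and the value of ALPHA are illustrative and not taken from the cited paper.

```python
def prob_in_essential_interaction(degree: int, alpha: float) -> float:
    """Probability that a protein with `degree` interactions takes part in
    at least one essential interaction, ASSUMING each interaction is
    essential independently with probability `alpha`."""
    if degree < 0 or not 0.0 <= alpha <= 1.0:
        raise ValueError("degree must be >= 0 and alpha must lie in [0, 1]")
    # Complement of "none of the interactions is essential".
    return 1.0 - (1.0 - alpha) ** degree


ALPHA = 0.05  # hypothetical per-interaction probability, for illustration only

if __name__ == "__main__":
    for k in (1, 5, 20, 50):
        print(k, round(prob_in_essential_interaction(k, ALPHA), 3))
```

Under this assumption the probability rises monotonically with the number of interactions, which is all the hypothesis needs in order to predict that hubs are more often essential.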

Yesterday, Zotenko and coworkers published a paper in PLoS Computational Biology in which they take a closer look at the cause of this phenomenon:

Why Do Hubs in the Yeast Protein Interaction Network Tend To Be Essential: Reexamining the Connection between the Network Topology and Essentiality.

The centrality-lethality rule, which notes that high-degree nodes in a protein interaction network tend to correspond to proteins that are essential, suggests that the topological prominence of a protein in a protein interaction network may be a good predictor of its biological importance. Even though the correlation between degree and essentiality was confirmed by many independent studies, the reason for this correlation remains illusive. Several hypotheses about putative connections between essentiality of hubs and the topology of protein-protein interaction networks have been proposed, but as we demonstrate, these explanations are not supported by the properties of protein interaction networks. To identify the main topological determinant of essentiality and to provide a biological explanation for the connection between the network topology and essentiality, we performed a rigorous analysis of six variants of the genomewide protein interaction network for Saccharomyces cerevisiae obtained using different techniques. We demonstrated that the majority of hubs are essential due to their involvement in Essential Complex Biological Modules, a group of densely connected proteins with shared biological function that are enriched in essential proteins. Moreover, we rejected two previously proposed explanations for the centrality-lethality rule, one relating the essentiality of hubs to their role in the overall network connectivity and another relying on the recently published essential protein interactions model.

What Zotenko et al. show is, in other words, that essential hubs tend to be highly connected with each other and hence form large “Essential Complex Biological Modules”. Table 7 in their paper lists the Gene Ontology terms associated with these modules; among the recurring themes are “rRNA metabolic process”, “mRNA metabolic process”, “RNA splicing”, “ribosome biogenesis and assembly”, and “proteolysis”. These Gene Ontology terms obviously correspond to well-known protein complexes, namely the RNA polymerases, the spliceosome, the ribosome, and the proteasome. The analysis of Zotenko et al. thus suggests that the much-debated correlation between centrality and essentiality is simply a consequence of the fact that many of the large protein complexes in a eukaryotic cell are essential, which is hardly surprising considering that they have been conserved through more than two billion years of evolution (Brocks et al., 1999).

Edit: For more views on the results of Zotenko et al. see the discussion on FriendFeed.


Commentary: Open access equals bulk publishing?

This week Nature published a News piece by Declan Butler with the rather provocative title “PLoS stays afloat with bulk publishing”. Unsurprisingly, this caused a backlash from open-access advocates in general and science bloggers in particular. Jonathan Eisen posted the ironic response “Only Nature could turn the success of PLoS One into a model of failure”. For an overview of the many other responses from the blogosphere see the summary by Coturnix and the long debate on FriendFeed.

The core of the criticism by Declan Butler was directed against the business model of the Public Library of Science (PLoS), in particular that a large part of their total income is produced by “bulk publishing” in the “database” PLoS ONE with only “light” peer review. There is no point in denying that PLoS ONE is a major source of income for PLoS, that it publishes many papers, and that it is not a top-tier journal. Still, it is in my view an unnecessary provocation to refer to a journal from a competitor as a “database” and between the lines suggest that they do not perform proper peer review.

I have nothing against Nature Publishing Group (NPG) – they are in my view one of the more progressive publishers with initiatives such as Connotea and Nature Network. However, I find the criticism by Declan Butler somewhat unfair, especially considering that NPG also has a considerable number of lower impact journals in their portfolio in addition to their lineup of Nature journals. To illustrate this point, I looked up the impact factors for all the PLoS and NPG journals that I could find (6 and 68, respectively) and plotted the distributions:

The average impact factors of the two publishers are remarkably similar: 9.19 for PLoS and 9.39 for NPG. The underlying distributions are very different, however. Notably, the high average impact factor of NPG’s journals is due to a fairly small number of journals with impact factors over 20, which are sufficient to offset the large number of journals with impact factors below 5. Consequently, the median impact factors are 9.03 for PLoS and only 4.88 for NPG.
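How a handful of very high values can pull the mean far above the median is easy to see in a small sketch. The two lists below are hypothetical impact factors chosen only to mimic the two shapes of distribution; they are not the values behind the plot.

```python
from statistics import mean, median

# Hypothetical impact factors, NOT the data used in this post.
few_similar_journals = [4.0, 8.5, 9.0, 9.5, 12.0, 13.0]
many_low_few_high = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 25.0, 30.0]


def summarize(impact_factors):
    """Return (mean, median) for a list of impact factors."""
    return mean(impact_factors), median(impact_factors)


if __name__ == "__main__":
    for name, values in (("few similar", few_similar_journals),
                         ("many low, few high", many_low_few_high)):
        avg, med = summarize(values)
        print(f"{name}: mean={avg:.2f} median={med:.2f}")
```

In the skewed list the two top journals lift the mean to roughly twice the median, whereas the two statistics almost coincide for the more homogeneous list.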

I want to be the first to point out the caveats of this analysis. First, the analysis above did not take into account that each journal does not publish the same number of papers. However, weighting the journals by number of papers when calculating average impact factors shifts the balance in favor of PLoS (9.79 for PLoS vs. 9.46 for NPG). Second, the journal PLoS ONE does not have an impact factor yet and was thus not included in my analysis. Third, the criticism by Declan Butler was mainly targeting the fact that much of PLoS’ revenue is due to PLoS ONE. However, until NPG chooses to make available detailed financial reports like PLoS does, it is impossible to tell how much of their revenue comes from lower-impact journals.
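The paper-weighted average mentioned in the first caveat can be sketched as follows. The journal tuples are made-up placeholders, since the per-journal paper counts are not listed in this post.

```python
def weighted_mean_impact(journals):
    """Average impact factor weighted by the number of papers per journal.

    `journals` is an iterable of (impact_factor, n_papers) pairs.
    """
    journals = list(journals)
    total_papers = sum(n for _, n in journals)
    if total_papers == 0:
        raise ValueError("at least one journal must have papers")
    return sum(impact * n for impact, n in journals) / total_papers


# Hypothetical publisher: one small high-impact journal, one large low-impact one.
example = [(10.0, 100), (5.0, 300)]

if __name__ == "__main__":
    unweighted = sum(impact for impact, _ in example) / len(example)
    print("unweighted:", unweighted)
    print("weighted:  ", weighted_mean_impact(example))
```

Weighting moves the average toward whichever journals publish the most papers, so it can shift in either direction depending on the portfolio.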

That being said, the business models of PLoS and NPG do not look all that different based on bibliographic metrics alone.

Full disclosure: I am an associate editor of PLoS Computational Biology.


Commentary: Summarizing papers as word clouds

For use in presentations on literature mining, I did a back-of-the-envelope calculation of how much time I would be able to spend on each new biomedical paper that is published. Assuming that all papers were indexed in PubMed (which they are not) and that I could read papers 24 hours per day all year round (which I cannot), the result is that I could allocate approximately 50 seconds per paper. This nicely illustrates the point that no one can keep up with the complete biomedical literature.
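The arithmetic behind this estimate is a single division. The sketch below also inverts it: the post does not state the number of papers per year that was used, so the roughly 630,000 figure is back-calculated from the 50 seconds, not quoted from a source.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # reading around the clock, no leap day


def seconds_per_paper(papers_per_year: float) -> float:
    """Reading time available per paper if every second of the year is used."""
    return SECONDS_PER_YEAR / papers_per_year


def papers_per_year_for(seconds: float) -> float:
    """Inverse: the yearly paper count implied by a given time per paper."""
    return SECONDS_PER_YEAR / seconds


if __name__ == "__main__":
    implied = papers_per_year_for(50)
    print(f"50 s per paper implies about {implied:,.0f} papers per year")
```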

When I discovered Wordle, which can turn any text into a beautiful word cloud, I thus wondered if this visualization method would be useful for summarizing a complete paper as a single figure. To test this, I extracted the complete text of three papers that I coauthored in the NAR database issue 2008. Submitting these to Wordle resulted in the three figures below (click for larger versions):

All in all, I think that Wordle does a pretty good job at capturing the essence of each paper: the first cloud shows that STITCH is a database of interactions between proteins and chemicals, the second cloud shows that NetworKIN is a database of predictions related to kinases and phosphorylation, and the third cloud shows that the third paper describes a database of experiments on gene expression during the cell cycle. However, a paper describing a database might be easier to summarize than a typical research paper.

As a final test, I therefore submitted the complete text from my paper “Evolution of Cell Cycle Control – Same molecular machines, different regulation”, which describes the somewhat complex concept of just-in-time assembly, to Wordle (click for larger version):

The result is rather less impressive than for the papers from the NAR database issue. Although the word cloud does contain a good selection of words, it fails to convey the main message. I think a large part of the problem is the splitting of multiwords; for example, “cell cycle” becomes two separate terms “cell” and “cycle”. Another problem is that words from different sections of the paper are mixed, which blurs the messages. These two issues could be solved by 1) detecting multiwords and considering them as single tokens, and 2) sorting the terms according to where in the paper they are mainly used.
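The first of these two fixes can be sketched very simply. The approach below, which joins adjacent word pairs that recur at least `min_count` times, is my own minimal stand-in for proper multiword detection and has nothing to do with how Wordle works internally; the example text and threshold are arbitrary.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def merge_multiwords(tokens, min_count=2):
    """Greedily join adjacent word pairs that occur at least `min_count` times."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    frequent = {pair for pair, n in pair_counts.items() if n >= min_count}
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in frequent:
            merged.append(tokens[i] + " " + tokens[i + 1])  # one token
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


text = "The cell cycle is regulated. Cell cycle control differs between species."
counts = Counter(merge_multiwords(tokenize(text)))

if __name__ == "__main__":
    print(counts.most_common(3))
```

On real text one would also need to ignore pairs made up of stop words, which will otherwise dominate the frequent pairs.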


Commentary: Does size matter?

I recently took a look at colonization of titles and found that the fraction of papers with colons in their titles is increasing steadily. Intuitively, one would thus expect that the average length of the titles has also increased. The plot below shows that this is indeed the case (note that the y-axis does not begin at zero):

The average title length has increased from 8.5 words in 1950 to 12.5 words in 2008. Strangely, the increase is almost perfectly linear except for a fluctuation in the early 60s – I have no idea why this is the case.

But is the title length of a paper important? I personally expected that papers with short, catchy titles would be cited more than papers with longer, more complex titles. Lacking citation information for individual publications, I thus calculated average title length for publications from each journal and correlated it with the ISI impact factor of the corresponding journal:

No correlation is observed between the impact factor of a journal and the average title length of the papers published therein. So we can conclude that – at least for titles of scientific papers – size does not matter.
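For readers who want to repeat this kind of comparison, here is a minimal sketch. It assumes a Pearson correlation (the post does not depend on that particular measure) and uses made-up journals, titles, and impact factors throughout.

```python
from math import sqrt


def average_title_length(titles):
    """Mean number of whitespace-separated words per title."""
    return sum(len(title.split()) for title in titles) / len(titles)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical journals: name -> (impact factor, titles published).
journals = {
    "Journal A": (12.0, ["Short title", "A somewhat longer title here"]),
    "Journal B": (3.5, ["One more example of a title", "Brief"]),
    "Journal C": (7.1, ["Title: with a colon and several words", "Tiny one"]),
}

if __name__ == "__main__":
    impact = [imp for imp, _ in journals.values()]
    length = [average_title_length(titles) for _, titles in journals.values()]
    print(round(pearson(impact, length), 3))
```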


Commentary: Colonization of titles

You have probably noticed that a high fraction of scientific papers have colons in their titles. Several people have written humorous commentaries on this. Although these authors clearly see the use of colons as a growing trend, they did not present hard evidence for the increase in the usage of colons in the titles of scientific publications.

Out of curiosity, I thus wrote a small script to count the fraction of papers in Medline that have colons in their titles for each of the past 25 years. The result is shown in the plot below (note that the y-axis does not start at zero):

The conclusion is very clear: the fraction of titles with colons has increased linearly from 15% to 24% over the past 20 years. One could object that this effect may be explained by the increase in apologies (which often have a title “Retraction: …”) or by the NAR special issues on databases and web servers (which contain hundreds of papers with titles such as “YADB: yet another database”). However, these add up to less than 2% of the papers with colonized titles and are thus insufficient to explain the observed increase of 9 percentage points.
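The counting itself is trivial once the titles are available. The sketch below is not the script I used; it assumes the records have already been reduced to (year, title) pairs, and the example records are invented.

```python
from collections import defaultdict


def colon_fraction_by_year(records):
    """Fraction of titles containing a colon, per publication year.

    `records` is an iterable of (year, title) pairs.
    """
    totals = defaultdict(int)
    with_colon = defaultdict(int)
    for year, title in records:
        totals[year] += 1
        if ":" in title:
            with_colon[year] += 1
    return {year: with_colon[year] / totals[year] for year in sorted(totals)}


# Invented example records, for illustration only.
records = [
    (1988, "A study of something"),
    (1988, "YADB: yet another database"),
    (1988, "Another plain title"),
    (1988, "Plain again"),
    (2008, "Retraction: an earlier paper"),
    (2008, "Methods: a review"),
    (2008, "No colon here"),
    (2008, "Still no colon"),
]

if __name__ == "__main__":
    for year, fraction in colon_fraction_by_year(records).items():
        print(year, f"{fraction:.0%}")
```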


Commentary: Viewing the cell cycle in a new light

Atsushi Miyawaki’s lab from RIKEN has recently published a Cell paper that describes a novel approach for monitoring cell-cycle progression in individual cells:

Visualizing spatiotemporal dynamics of multicellular cell-cycle progression

The cell-cycle transition from G1 to S phase has been difficult to visualize. We have harnessed antiphase oscillating proteins that mark cell-cycle transitions in order to develop genetically encoded fluorescent probes for this purpose. These probes effectively label individual G1 phase nuclei red and those in S/G2/M phases green. We were able to generate cultured cells and transgenic mice constitutively expressing the cell-cycle probes, in which every cell nucleus exhibits either red or green fluorescence. We performed time-lapse imaging to explore the spatiotemporal patterns of cell-cycle dynamics during the epithelial-mesenchymal transition of cultured cells, the migration and differentiation of neural progenitors in brain slices, and the development of tumors across blood vessels in live mice. These mice and cell lines will serve as model systems permitting unprecedented spatial and temporal resolution to help us better understand how the cell cycle is coordinated with various biological events.

The clever idea was to fuse a red- and a green-emitting fluorescent protein to Cdt1 and Geminin, respectively. Cdt1 is ubiquitinated by SCF(Skp2) at the onset of S phase, which causes it to be rapidly degraded by the proteasome, whereas Geminin is targeted for proteolytic degradation by APC(Cdh1) in late M phase. By fluorescent labeling of two proteins, Miyawaki and colleagues managed to make mouse cells that become increasingly red during G1 phase, yellow around the G1/S transition, and increasingly green through S, G2, and M phase. It is thus possible to monitor the cell-cycle states of individual cells with a microscope.

The movie below follows a few HeLa cells for 3-4 cell cycles:

The authors also show how their construct can be used for imaging the cell-cycle state of the cells in a slice of a mouse brain or a mouse embryo. I expect that this will become an indispensable tool for unraveling the links between cell-cycle control and developmental processes.

For more details, I strongly recommend that you read Jake Young’s post at Pure Pedantry.


Commentary: Much ado about alignments

There seems to be a new trend in computational biology: worrying about sequence alignments. Over the past couple of months, two high-profile papers have appeared that highlight flaws related to sequence alignment methods.

The first paper appeared in Science Magazine in January this year. Wong and coworkers describe how uncertainties in multiple alignments can lead to errors in different phylogenetic trees:

Alignment Uncertainty and Genomic Analysis

The statistical methods applied to the analysis of genomic data do not account for uncertainty in the sequence alignment. Indeed, the alignment is treated as an observation, and all of the subsequent inferences depend on the alignment being correct. This may not have been too problematic for many phylogenetic studies, in which the gene is carefully chosen for, among other things, ease of alignment. However, in a comparative genomics study, the same statistical methods are applied repeatedly on thousands of genes, many of which will be difficult to align. Using genomic data from seven yeast species, we show that uncertainty in the alignment can lead to several problems, including different alignment methods resulting in different conclusions.

The second paper appeared in Nature Biotechnology. Styczynski and coworkers discovered that the most commonly used substitution matrix, BLOSUM62, was calculated wrongly:

BLOSUM62 miscalculations improve search performance

The BLOSUM family of substitution matrices, and particularly BLOSUM62, is the de facto standard in protein database searches and sequence alignments. In the course of analyzing the evolution of the Blocks database, we noticed errors in the software source code used to create the initial BLOSUM family of matrices (available online). The result of these errors is that the BLOSUM matrices — BLOSUM62, BLOSUM50, etc. — are quite different from the matrices that should have been calculated using the algorithm described by Henikoff and Henikoff. Obviously, minor errors in research, and particularly in software source code, are quite common. This case is noteworthy for three reasons: first, the BLOSUM matrices are ubiquitous in computational biology; second, these errors have gone unnoticed for 15 years; and third, the ‘incorrect’ matrices perform better than the ‘intended’ matrices.

Upon casual reading of these publications, one could get the idea that over a decade of work based on alignments, sequence similarity searches, and molecular evolution is wrong. Fortunately, this does not appear to be the case.

Starting with the second paper, I applaud the authors for discovering a mistake in such an established method, and I agree with them that it is remarkable that it has not been noticed before. However, I do not think that it is surprising that the ‘incorrect’ matrices work very well. Although they were not calculated as intended, the BLOSUM matrices have become the de facto standard precisely because they work as well as they do.

Regarding the first paper, I think it is fair to say that anyone working on multiple alignments and phylogeny is well aware that uncertain alignments can lead to wrong phylogenetic trees. This is why almost everyone uses programs like Gblocks to remove the ambiguous parts of their alignments before moving on to constructing phylogenetic trees. Unfortunately, Wong et al. instead constructed two sets of trees for each of the six multiple alignment methods: one based on the complete alignments, and one in which they excluded all gapped sites from the phylogenetic analysis. The latter is not equivalent to using a blocked alignment, since not all ambiguously aligned sites contain gaps, and since not all sites with gaps are ambiguously aligned.

Wong and coworkers subsequently compared the trees that they obtained using the six different alignment programs and found disagreements for almost half of all yeast proteins. This number may sound shockingly high, but I find it to be misleading in several ways. First, “disagreement” was defined as at least one of the six trees disagreeing with the others – much of the disagreement could thus be due to a single poorly performing alignment program. This definition also implies that the results can only get worse by adding more alignment methods to the comparison. Second, the comparison was not limited to the trees that are supported by bootstrap analysis – much of the disagreement is thus due to trees that we already know should not be trusted.

In my view, it would be more fair to make the comparison along the following lines:

  • Align the sequences as done by Wong et al.
  • Remove ambiguously aligned sites with Gblocks
  • Construct phylogenetic trees based on the blocked alignments
  • Calculate the bootstrap support for each tree
  • Discard trees with poor bootstrap support
  • Calculate the agreement on tree topology for each pair of alignment methods

This procedure will ensure that trees are not distorted by the unreliable parts of the alignments, that comparisons are not based on trees we know are unreliable, that the results are not skewed by a single poorly performing alignment method, and that the numbers remain comparable if more alignment methods are added. I have already downloaded all the alignments and run them through Gblocks; please let me know if you would like to continue the analysis from that step, and I will arrange a way to transfer the files.
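The last two steps of the procedure, discarding poorly supported trees and computing agreement per pair of methods, could be sketched as follows. Everything here is a placeholder: the method names, the 0.7 support cutoff, and above all the representation of a topology as a canonical string. A real analysis would compare tree objects with a phylogenetics library instead of comparing strings.

```python
from itertools import combinations


def pairwise_agreement(trees, min_support=0.7):
    """Fraction of genes with identical topology for each pair of methods.

    `trees` maps method -> gene -> (topology, bootstrap_support), where
    `topology` is assumed to be a canonical string. A gene is skipped for a
    pair if either of the two trees falls below `min_support`.
    """
    result = {}
    for m1, m2 in combinations(sorted(trees), 2):
        agree = compared = 0
        for gene in trees[m1].keys() & trees[m2].keys():
            topo1, support1 = trees[m1][gene]
            topo2, support2 = trees[m2][gene]
            if support1 < min_support or support2 < min_support:
                continue  # do not compare trees we already distrust
            compared += 1
            agree += topo1 == topo2
        result[(m1, m2)] = agree / compared if compared else None
    return result


# Hypothetical input for three alignment methods and three genes.
trees = {
    "method_a": {"gene1": ("((A,B),C)", 0.90), "gene2": ("((A,C),B)", 0.95),
                 "gene3": ("((A,B),C)", 0.40)},
    "method_b": {"gene1": ("((A,B),C)", 0.85), "gene2": ("((A,B),C)", 0.90),
                 "gene3": ("((B,C),A)", 0.90)},
    "method_c": {"gene1": ("((A,B),C)", 0.80), "gene2": ("((A,C),B)", 0.75),
                 "gene3": ("((B,C),A)", 0.30)},
}

if __name__ == "__main__":
    for pair, fraction in pairwise_agreement(trees).items():
        print(pair, fraction)
```

Because each number describes one pair of methods, adding a seventh method adds new pairs but leaves the existing values untouched, unlike the "at least one tree disagrees" definition.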

Time might prove me wrong, but I expect that such an analysis will show that alignment uncertainty is not a major factor that needs to be taken into account when constructing phylogenetic trees.


Commentary: We apologize

Attila Chordash over at “PIMM – Partial immortalization” discovered that Proteomics have now changed the abstract of the infamous paper by Warda and Han to be an apology to their readership:

Proteomics apologizes

While I am pleased to see this public apology from the publisher, the retraction is still only based on “a substantial overlap of the content of this article with previously published articles in other journals”. That is a euphemism for “the authors copied four entire pages of text from sources that were not cited”. However, I am concerned that this apology – like the press release from Proteomics – ignores the central question: how did the manuscript make it through peer review?

I was a bit surprised to see an apology being published via PubMed, but a quick search revealed that Proteomics is far from the only journal to apologize to their readers in this way. In fact, a systematic count shows that the number of abstracts mentioning the words “apologise(s)” or “apologize(s)” has increased exponentially over the past decade (note the logarithmic scale):

Exponential increase in the number of apologies

The number shown for 2008 is an extrapolation based on the first six weeks; if the apologies keep coming at the current rate, there will be 32 by the end of the year. The line shows an exponential fit of the data points from 1999 to 2007. The doubling time for the number of apologies is just 3 years whereas the number of papers doubles only every 22 years. If these trends continue, there will be more apologies than papers published from the year 2067 and onwards. I apologize for the extrapolation.
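The extrapolation boils down to finding where two exponential curves cross. The sketch below uses the doubling times quoted above (3 and 22 years), but the starting counts are placeholders; the fitted intercepts are not given in this post, so the output should not be expected to reproduce the year 2067.

```python
from math import log2


def years_until_crossing(small_now, small_doubling, large_now, large_doubling):
    """Years until a fast-doubling series overtakes a slow-doubling one.

    Solves small_now * 2**(t / small_doubling)
         = large_now * 2**(t / large_doubling) for t.
    """
    if small_doubling >= large_doubling:
        raise ValueError("the smaller series never catches up")
    if small_now >= large_now:
        return 0.0
    rate_gap = 1.0 / small_doubling - 1.0 / large_doubling
    return log2(large_now / small_now) / rate_gap


if __name__ == "__main__":
    # Placeholder counts for the starting year, NOT values from the fit.
    apologies_now, papers_now = 30, 1_000_000
    t = years_until_crossing(apologies_now, 3, papers_now, 22)
    print(f"crossing after about {t:.0f} years")
```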


Commentary: Neither buried nor treasure

This post might be considered off topic since it is about a paper that is unfortunately neither buried nor treasure. I would rather describe it as “organic fertilizer that has come into contact with a rotary air-circulation device”.

The paper that I will dissect is “Mitochondria, the missing link between body and soul: Proteomic prospective evidence” by Mohamad Warda and Jin Han. This review is at the time of writing published in electronic form by the journal Proteomics (ISI impact factor 5.735). It was aptly described as “A baffling failure of peer review” by PZ Myers on his blog Pharyngula, which led to a flash mob of researchers (including me) quickly identifying several flaws, any one of which should in my view be sufficient to cause the journal to retract the paper:

  • Warda and Han twice suggest that mitochondria provide a link between body and soul, but they never provide any argument for this.
  • They claim to present data that disproves the accepted endosymbiotic theory for the origin of mitochondria. But in reality they present no such evidence.
  • They promise to replace this theory by “a more realistic alternative”. The alternative turns out to be “a mighty creator”, or in other words Intelligent Design.
  • To support these truly remarkable claims, the authors misrepresent the results of cited references. Some of these references are even completely unrelated to the topic at hand.
  • Entire sections or paragraphs of the paper are plagiarized from other researchers’ papers and from the website of another group. Not only is this material not presented as quotations, the sources are not even cited.
  • Finally, numerous sentences have been copied verbatim from the cited sources. This may be partly excused by the authors borrowing better English, but I nonetheless consider it an unacceptable practice especially in reviews.

Note how I use block quotations below to show which parts are not my own words. That is what Warda and Han should have done to avoid their biggest problem: being accused of plagiarism. Let us start by looking at the first sentence of the abstract:

Mitochondria are the gatekeepers of the life and death of most cells that regulate signaling, metabolism, and energy production needed for cellular function.

This sentence is identical to the first sentence on the webpage of a competing group, namely the Mitochondrial Research & Innovation Group at University of Rochester Medical Center. The rest of the paragraph from the webpage can be found later in the paper by Warda and Han:

Recent scientific studies show that mitochondrial dysfunction is more commonplace than previously thought and that substantial mitochondrial involvement is present in many acute and chronic diseases. Mitochondrial dysfunction is now implicated in a range of human diseases, including aging, diabetes, atherosclerosis, heart failure, myocardial infarction, stroke and other ischemic-reperfusion injuries, neurodegenerative diseases including Alzhiemer’s and Parkinson’s diseases; cancer, HIV; sepsis and trauma with multiorgan dysfunction or failure. Some rare mitochondria diseases (e.g., MELAS, Kearns-Sayre) are associated with large deletions in the mitochondrial genome. More recently, the so-called OXPHOS diseases that reflect a limited capacity to produce the energy needed to respond to normal stress conditions.

However, most of the plagiarized material is not from webpages; it is from peer-reviewed papers of other researchers. For example, the two paragraphs below appear to originate from the paper “Peroxisome Proliferator-Activated Receptor gamma Coactivator-1 (PGC-1) Regulatory Cascade in Cardiac Physiology and Disease” published by Brian N. Finck and Daniel P. Kelly in the journal Circulation:

Emerging evidence supports the notion that derangements in mitochondrial energy metabolism contribute to cardiac dysfunction [186]. For example, human mitochondrial DNA mutations resulting in global impairment in mitochondrial respiratory function cause hypertrophic or dilated cardiomyopathy and cardiac conduction defects [187, 188]. Mutations in nuclear genes encoding mitochondrial fatty acid oxidation enzymes may also manifest as cardiomyopathy [189, 190]. Interestingly, cardiomyopathies resulting from inborn errors in mitochondrial fatty acid oxidation enzymes are often provoked by physiological or pathophysiological conditions that increase dependence on fat oxidation for myocardial ATP production such as prolonged exercise or fasting associated with infectious illness [190,191].

A causal relationship between mitochondrial dysfunction and cardiomyopathy is also evidenced by several genetically engineered mouse models. Targeted deletion of the adenine nucleotide translocator 1, which transports mitochondrially generated ATP to the cytosol, leads to mitochondrial dysfunction and cardiomyopathy [192]. Mice with cardiac-specific deletion of the transcription factor of activated mitochondria, which controls transcription and replication of the mitochondrial genome, also exhibit marked impairments in mitochondrial metabolism, severe cardiomyopathy, and premature mortality [193]. Cardiomyopathy and/or conduction defects are also observed in several mouse models with targeted deletion of specific fatty acid oxidation enzymes [194, 195].

I discovered these plagiarized sections myself, but they only scratch the surface and pale in comparison to the amount of copied material identified by others. I should make clear that I do not blame the reviewers for not discovering this; their job is to check the scientific quality of the material presented, not to detect fraudulent or plagiarized material.

The editor and the reviewers are not off the hook, though. Interspersed between the sensible review material, much of which has been copied from elsewhere, there are a few sections that are “a mélange of truths, half-truths, quarter-truths, falsehoods, non sequiturs, and syntactically correct sentences that have no meaning whatsoever” (to use the words of Alan Sokal).

The first of these is the following sentence from the abstract:

These data are presented with other novel proteomics evidence to disprove the endosymbiotic hypothesis of mitochondrial evolution that is replaced in this work by a more realistic alternative.

Clearly the editors and the reviewers should have examined the evidence for such an exceptional claim. The “evidence” that supposedly disproves the serial endosymbiotic theory (SET) of mitochondrial evolution is presented in section 3.4, which after explaining the theory makes the following baffling statement:

The proof of SET was based on the parallel connection between plant mitochondrial and phage T4 genome replication [107, 108] …

This is simply not true. First, the scientific method can never prove a theory. Second, the similarity between the replication of mitochondrial and phage T4 genomes is completely irrelevant with respect to the endosymbiotic origin of mitochondria. Third, reference 108 is about chloroplasts and not mitochondria.

Next, the authors try to convince the reader that people are still debating the validity of the endosymbiotic theory:

Therefore, the debates concerning the mitochondrial endosymbiotic hypothesis recently terminated with many questions still left unanswered [111].

Reference 111 is a paper in Cellular Immunology by Gray and coworkers with the title “Modulation of CD8+ T cell avidity by increasing the turnover of viral antigen during infection”. It has nothing to do with the evolution of mitochondria.

The authors go on to recite the well known facts that the vast majority of mitochondrial proteins are encoded by nuclear genes and that most bacterial genes do not match within the mitochondrial DNA. They then give a long description of how tightly integrated mitochondria are with the rest of the cell and use the tired old argument of irreducible complexity to dismiss the endosymbiotic theory. Needless to say, the authors have what they consider to be a more realistic explanation:

Alternatively, instead of sinking in a swamp of endless debates about the evolution of mitochondria, it is better to come up with a unified assumption that all living cells undergo a certain degree of convergence or divergence to or from each other to meet their survival in specific habitats. Proteomics data greatly assist this realistic assumption that connects all kinds of life. More logically, the points that show proteomics overlapping between different forms of life are more likely to be interpreted as a reflection of a single common fingerprint initiated by a mighty creator than relying on a single cell that is, in a doubtful way, surprisingly originating all other kinds of life.

In other words: “God did it”. If I can read correctly, the authors here reject not only the endosymbiotic origin of mitochondria but also the common ancestry of eukaryotes. And as if this was not enough, the authors end the paper with the following bold conclusion that also explains the mysterious title of the paper:

We realize so far that mitochondria could be the link between the body and this preserved wisdom of the soul devoted to guaranteeing life.

I am speechless. As anyone who knows me can attest, that very rarely happens.

Edit: The paper has now been retracted, but there are still many open questions as to how it got accepted in the first place.


Commentary: Does just-in-time assembly of protein complexes explain phenotypes?

At the beginning of this year, Ben Lehner’s lab published a beautiful study in BMC Systems Biology with the title “A simple principle concerning the robustness of protein complex activity to changes in gene expression”. The abstract reads:


The functions of a eukaryotic cell are largely performed by multi-subunit protein complexes that act as molecular machines or information processing modules in cellular networks. An important problem in systems biology is to understand how, in general, these molecular machines respond to perturbations.


In yeast, genes that inhibit growth when their expression is reduced are strongly enriched amongst the subunits of multi-subunit protein complexes. This applies to both the core and peripheral subunits of protein complexes, and the subunits of each complex normally have the same loss-of-function phenotypes. In contrast, genes that inhibit growth when their expression is increased are not enriched amongst the core or peripheral subunits of protein complexes, and the behaviour of one subunit of a complex is not predictive for the other subunits with respect to over-expression phenotypes.


We propose the principle that the overall activity of a protein complex is in general robust to an increase, but not to a decrease in the expression of its subunits. This means that whereas phenotypes resulting from a decrease in gene expression can be predicted because they cluster on networks of protein complexes, over-expression phenotypes cannot be predicted in this way. We discuss the implications of these findings for understanding how cells are regulated, how they evolve, and how genetic perturbations connect to disease in humans.

It struck me that these observations can all be explained by the just-in-time assembly model for temporal regulation of protein complex assembly, which I developed together with members of Søren Brunak’s group. For a long explanation and discussion of the model see our paper “Evolution of Cell Cycle Control: Same Molecular Machines, Different Regulation”. For the short version see the figure below, which shows how cell-cycle regulation of just a single subunit is sufficient to control when during the cell cycle a complex is active (click to enlarge):

The just-in-time assembly hypothesis

What will happen if you knock down the expression of one subunit of a complex? The maximal number of complete complexes that can be assembled will be reduced, irrespective of whether the subunit is dynamic or static. Whether this results in a given phenotype depends on the function of the complex. However, the effect should in principle be the same for different subunits of the same complex, which is exactly what Lehner and coworkers observed.

What if you instead overexpress one subunit of a complex? For a static subunit it should not really matter; the maximal number of complete complexes that can be assembled is unchanged. On the other hand, overexpression of a dynamic subunit may cause the complex to become constitutively active, which could have disastrous consequences for the cell. Overexpression of dynamic and static subunits of the same complex should thus give rise to different phenotypic effects. This would explain the observation by Lehner and coworkers that subunits of the same complex often have different overexpression phenotypes.
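The two thought experiments above can be written down as a toy model. It assumes a complex with one copy of each subunit, so that the number of complete complexes equals the level of the scarcest subunit; the subunit names, levels, and the four-point cell-cycle profile are all invented for illustration.

```python
def complex_activity(static_levels, dynamic_profile):
    """Complete complexes at each time point of the cell cycle.

    `static_levels` maps constitutively expressed subunits to their levels;
    `dynamic_profile` lists the level of the single periodically expressed
    subunit over time. Assumes 1:1 stoichiometry for all subunits.
    """
    static_limit = min(static_levels.values())
    return [min(static_limit, level) for level in dynamic_profile]


# Invented wild-type situation: the dynamic subunit peaks at one time point.
static = {"S1": 100, "S2": 100}
dynamic = [0, 0, 100, 0]

wild_type = complex_activity(static, dynamic)
knockdown_static = complex_activity({"S1": 50, "S2": 100}, dynamic)
knockdown_dynamic = complex_activity(static, [d // 2 for d in dynamic])
overexpress_static = complex_activity({"S1": 200, "S2": 100}, dynamic)
overexpress_dynamic = complex_activity(static, [200, 200, 200, 200])

if __name__ == "__main__":
    print("wild type:          ", wild_type)
    print("knockdown static:   ", knockdown_static)
    print("knockdown dynamic:  ", knockdown_dynamic)
    print("overexpress static: ", overexpress_static)
    print("overexpress dynamic:", overexpress_dynamic)
```

In this toy model both knockdowns reduce the peak in the same way, overexpressing a static subunit changes nothing, and only overexpression of the dynamic subunit makes the complex present at every time point.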

If this hypothesis is true, genes that lead to phenotypic effects when overexpressed should preferentially encode dynamic proteins, i.e. many of the genes should be periodically expressed. In fact, this correlation between overexpression phenotype and cell-cycle regulation was already described by the Hughes, Boone and Andrews labs who originally published the dataset on overexpression phenotypes (for details see their paper in Molecular Cell):

Genes expressed periodically during the cell cycle (de Lichtenberg et al., 2005) were more likely to show an overexpression phenotype (p = 0.017), and in particular, this tended to cause abnormal morphology [p < 10^-13] or cell cycle arrest [p < 10^-14] (Table S3). When the analysis is limited to genes known to function in the mitotic cell cycle, we still find that overexpression of periodically expressed genes is more likely to cause cell cycle arrest (p = 0.008) or abnormal morphology (p = 0.006) than constitutively expressed cell cycle genes (Table S3), indicating that unscheduled expression of genes that are usually expressed periodically often leads to toxicity.

The results of the two papers thus point in the direction that the just-in-time assembly hypothesis can explain the qualitative differences between knock-down and overexpression phenotypes.
