Tag Archives: expression

Resource: The TISSUES database on tissue expression of genes and proteins

As mentioned in the last entry, 2015 has been a year of publishing web resources for my group. The COMPARTMENTS and DISEASES databases have yet another sister resource, namely TISSUES.

This web resource allows users to easily obtain a color-coded schematic of the tissue expression of a protein of interest, providing an at-a-glance overview of evidence from database annotations, from proteomics and transcriptomics studies as well as from automatic text mining of the scientific literature:


Whereas the resource integrates all of the above-mentioned types of evidence, the focus in this work was primarily on combining data from systematic tissue expression atlases, produced using a variety of different high-throughput assays. This required extensive work on mapping, scoring, and benchmarking the different datasets to put them on a common confidence scale. The scientific results and details of all those analyses can be found in the article “Comprehensive comparison of large-scale tissue expression datasets”.

Analysis: Three-dimensional DNA structure

A few months ago Bill Noble’s lab at the University of Washington published a letter in Nature on a three-dimensional model of the complete nuclear genome of budding yeast:

A three-dimensional model of the yeast genome

Layered on top of information conveyed by DNA sequence and chromatin are higher order structures that encompass portions of chromosomes, entire chromosomes, and even whole genomes. Interphase chromosomes are not positioned randomly within the nucleus, but instead adopt preferred conformations. Disparate DNA elements co-localize into functionally defined aggregates or ‘factories’ for transcription and DNA replication. In budding yeast, Drosophila and many other eukaryotes, chromosomes adopt a Rabl configuration, with arms extending from centromeres adjacent to the spindle pole body to telomeres that abut the nuclear envelope. Nonetheless, the topologies and spatial relationships of chromosomes remain poorly understood. Here we developed a method to globally capture intra- and inter-chromosomal interactions, and applied it to generate a map at kilobase resolution of the haploid genome of Saccharomyces cerevisiae. The map recapitulates known features of genome organization, thereby validating the method, and identifies new features. Extensive regional and higher order folding of individual chromosomes is observed. Chromosome XII exhibits a striking conformation that implicates the nucleolus as a formidable barrier to interaction between DNA sequences at either end. Inter-chromosomal contacts are anchored by centromeres and include interactions among transfer RNA genes, among origins of early DNA replication and among sites where chromosomal breakpoints occur. Finally, we constructed a three-dimensional model of the yeast genome. Our findings provide a glimpse of the interface between the form and function of a eukaryotic genome.

Having previously worked with predicted 3D structure of DNA, such as intrinsic curvature, I was intrigued by the availability of a 3D structure of a complete eukaryotic genome. Based on past analyses of 1D distances in DNA, I expected that the 3D distance between two genes in the genome would correlate with expression, protein interactions, and metabolic pathways.

To test if 3D neighborhood correlates with function and/or regulation, I collected three large sets of protein pairs, namely pairs of co-expressed genes from the STRING database (Pearson correlation coefficient >0.7), interacting protein pairs from the BioGRID database, and pairs of genes assigned to the same pathway by the KEGG database. I subsequently mapped these onto the set of 3D neighbors listed in the supplementary information of the paper, including only 3D neighbors on different chromosomes (in order to eliminate correlations caused by 1D rather than 3D distance). I also mapped the three sets of gene pairs onto a shuffled version of the 3D neighbors, in order to estimate the overlaps that can be expected at random. The results are summarized in the table below:

Gene pair set              3D neighbors    Shuffled neighbors
Coexpressed (STRING)                 58                    61
Interacting (BioGRID)              2151                  2122
Same pathway (KEGG)                 357                   344

To make a long story short, the numbers show that 3D genomic neighbors appear to be no more likely to be coexpressed, to interact, or to be involved in the same pathway than random pairs. It could be that the way I performed the analysis is too simplistic or that the data are too noisy to show a signal. However, it is also possible that the 3D structural organization of the genome simply doesn’t have much impact on gene regulation and function.
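The comparison against shuffled neighbors can be sketched roughly as follows. This is a minimal illustration, not the script used for the table above: the gene names are hypothetical, and the shuffling scheme (permuting the second gene of each neighbor pair) is an assumption, since the exact procedure is not described here. In particular, this simple shuffle does not enforce that the two genes of a shuffled pair lie on different chromosomes.

```python
import random


def count_overlap(neighbor_pairs, functional_pairs):
    """Count neighbor pairs that also occur in a set of functional pairs.

    Pairs are treated as unordered, so (A, B) matches (B, A).
    """
    functional = {frozenset(pair) for pair in functional_pairs}
    return sum(1 for pair in neighbor_pairs if frozenset(pair) in functional)


def shuffle_neighbors(neighbor_pairs, rng):
    """Build a random control by permuting the second gene of each pair.

    The number of pairs and the genes involved stay the same; only the
    pairing changes. (Assumed scheme, chosen for simplicity.)
    """
    firsts = [a for a, _ in neighbor_pairs]
    seconds = [b for _, b in neighbor_pairs]
    rng.shuffle(seconds)
    return list(zip(firsts, seconds))


# Toy data with hypothetical gene names.
neighbors = [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]
coexpressed = [("B", "A"), ("C", "X"), ("E", "F")]

observed = count_overlap(neighbors, coexpressed)
shuffled = shuffle_neighbors(neighbors, random.Random(0))
control = count_overlap(shuffled, coexpressed)
print("observed overlap:", observed)
print("shuffled overlap:", control)
```

Repeating the shuffle many times and collecting the control counts would give a distribution against which the observed overlap can be judged, rather than a single random number.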

Resource: STRING v8.1

After months of hard work from the entire STRING team (thanks, everyone), I am pleased to be able to say that STRING v8.1 has now been put into production. Here is a screenshot of the start page:

STRING 8.1 start page

This is a minor release of STRING, which means that the imported databases of microarray expression data, protein interactions, genetic interactions, and pathways as well as text-mining evidence have all been updated. We have also fixed a bug that affected the minority of bacteria that have multiple chromosomes.

Another notable feature of STRING v8.1 is the new interactive network viewer that is implemented in Adobe Flash:

STRING 8.1 network viewer

For further details please see the post on the official STRING/STITCH blog.


Analysis: Transcriptional and posttranslational regulation of cell-cycle kinases

Daub and coworkers from Matthias Mann’s group recently published a paper in Molecular Cell, describing a phosphoproteomics study of kinases during S and M phase of the mitotic cell cycle:

Kinase-selective enrichment enables quantitative phosphoproteomics of the kinome across the cell cycle.

Protein kinases are pivotal regulators of cell signaling that modulate each other’s functions and activities through site-specific phosphorylation events. These key regulatory modifications have not been studied comprehensively, because low cellular abundance of kinases has resulted in their underrepresentation in previous phosphoproteome studies. Here, we combine kinase-selective affinity purification with quantitative mass spectrometry to analyze the cell-cycle regulation of protein kinases. This proteomics approach enabled us to quantify 219 protein kinases from S and M phase-arrested human cancer cells. We identified more than 1000 phosphorylation sites on protein kinases. Intriguingly, half of all kinase phosphopeptides were upregulated in mitosis. Our data reveal numerous unknown M phase-induced phosphorylation sites on kinases with established mitotic functions. We also find potential phosphorylation networks involving many protein kinases not previously implicated in mitotic progression. These results provide a vastly extended knowledge base for functional studies on kinases and their regulation through site-specific phosphorylation.

In the study, they identified phosphorylation sites for 219 protein kinases, of which 159 showed differential phosphorylation (at least two-fold induction for at least one site) in S and/or M phase.

My collaborators at CBS and I have previously shown that transcriptional and posttranslational regulation (for example, phosphorylation by cyclin-dependent kinases) tend to target the same proteins (de Lichtenberg et al., 2005; Jensen et al., 2006). One should thus expect that the differentially regulated kinases have a tendency to be encoded by periodically expressed genes.

To test this hypothesis, I compared the phosphoproteomics data of Daub et al. to the cell-cycle microarray expression study by Whitfield et al. (2002). I was able to map 132 of the 159 kinases to the microarrays and found that 17 of them are encoded by the top-600 cycling genes. This corresponds to a significant (P < 0.001) two-fold overrepresentation of transcriptional cell-cycle regulation among the genes encoding kinases that are differentially phosphorylated during S and/or M phase.

One could imagine that this trend is not specific to kinases that are differentially phosphorylated during the cell cycle, but that it instead applies to kinases in general. To test this, I also mapped the 60 non-modulated kinases found by Daub et al. to the microarrays (Whitfield et al., 2002). Of the 54 kinases that could be mapped, only 3 are encoded by periodically expressed genes, which is almost exactly what is expected by random chance.

I next examined if the timing of phosphorylation correlates with the timing of expression of the 17 kinases mentioned above. The kinases can be divided into three classes: phosphorylated in S phase, phosphorylated in M phase, and phosphorylated in both S and M phase. Notably, 13 of the 17 kinases fall into the M phase class. Looking at the peak times of expression for these (that is, when in the cell cycle the corresponding mRNAs are most highly expressed) reveals that 8 of the 13 kinases are presumably synthesized in M phase only shortly before they become phosphorylated.

In summary, comparison of the phosphoproteomics data from Daub et al. (2008) and the microarray expression data from Whitfield et al. (2002) supports the view that transcriptional and posttranslational regulation tend to target the same proteins during the mitotic cell cycle. Moreover, it shows that for most of the kinases that are subject to such dual cell-cycle control, both expression and phosphorylation take place during M phase when the cyclin-dependent kinase activity is maximal.

Full disclosure: I currently collaborate with Matthias Mann and members of his group, and we will soon be colleagues at the Novo Nordisk Foundation Center for Protein Research.


Analysis: Cell-cycle expression of cancer genes

I have long used a data integration approach to obtain a global picture of eukaryotic cell-cycle regulation. The cell cycle is a popular research topic in part because of its importance for cancer research. I thus recently compared microarray expression data on the human cell cycle to genes with mutations that have been causally implicated in various forms of cancer.

From the Cancer Genome Project website, I downloaded a list of 353 human genes that are implicated in cancer. Using the identifier mapping file from STRING, I was able to automatically map 338 of these genes to the set of human genes from Ensembl that I used in earlier cell-cycle studies. 295 of the 338 genes were present on the microarrays used in the cell-cycle expression study by Whitfield et al. (2002). However, only 23 of these are among the 600 periodically expressed genes identified in the reanalysis by Jensen et al. (2006). The many numbers are illustrated in the diagram below:

By random chance, 295*600/12097 = 15 of the 295 genes would be expected to be periodically expressed, and the enrichment is thus only a bit over 1.5-fold. Although this enrichment is statistically significant (P < 0.03, Fisher’s exact test), the correlation is clearly not strong enough to allow prediction of novel cancer genes.
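The arithmetic behind these numbers can be reproduced with a few lines of Python. This is only a sketch using the counts quoted above (12097 genes on the microarrays, 600 periodically expressed, 295 cancer genes on the arrays, 23 in the overlap). The p-value computed here is a one-sided hypergeometric tail, which is one common form of Fisher's exact test; that choice is an assumption, and the exact variant used for the figure quoted above may differ.

```python
from math import comb


def expected_overlap(n_sample, n_marked, n_total):
    """Overlap expected by chance between a sample and a marked subset."""
    return n_sample * n_marked / n_total


def hypergeom_tail(k, n_sample, n_marked, n_total):
    """P(X >= k) for the overlap X when n_sample genes are drawn without
    replacement from n_total genes, n_marked of which are marked."""
    upper = min(n_sample, n_marked)
    favorable = sum(
        comb(n_marked, i) * comb(n_total - n_marked, n_sample - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(n_total, n_sample)


# Counts taken from the text above.
N_ARRAY = 12097   # genes on the microarrays
N_PERIODIC = 600  # periodically expressed genes
N_CANCER = 295    # cancer genes present on the arrays
N_BOTH = 23       # cancer genes that are periodically expressed

expected = expected_overlap(N_CANCER, N_PERIODIC, N_ARRAY)
fold = N_BOTH / expected
p_value = hypergeom_tail(N_BOTH, N_CANCER, N_PERIODIC, N_ARRAY)

print(f"expected by chance: {expected:.1f}")
print(f"fold enrichment:    {fold:.2f}")
print(f"one-sided p-value:  {p_value:.3g}")
```

Exact integer binomial coefficients are used instead of a statistics library to keep the example free of dependencies; for larger problems a library routine would be the more practical choice.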

My next step was to look at the evolutionary conservation of the 23 periodically expressed cancer genes. Only 12 of them belong to an orthologous group; roughly half of them thus do not appear to have orthologs in budding yeast, fission yeast, or Arabidopsis thaliana. Only three periodically expressed cancer genes have orthologs in all of these organisms. One of these genes is periodically expressed only in human, one in human and fission yeast, and one in all four organisms (a histone subunit).

In summary, it seems that one cannot say much about cancer based on cell-cycle mRNA expression data. This is perhaps not surprising considering that the transcriptional regulation does not seem to vary much between cancer cells and normal cells.


Analysis: Cancer or not, cell-cycle expression stays the same

The groups of Ziv Bar-Joseph and Itamar Simon recently published a paper in PNAS on a new microarray study of the cell cycle of primary human fibroblasts:

Genome-wide transcriptional analysis of the human cell cycle identifies genes differentially regulated in normal and cancer cells

Characterization of the transcriptional regulatory network of the normal cell cycle is essential for understanding the perturbations that lead to cancer. However, the complete set of cycling genes in primary cells has not yet been identified. Here, we report the results of genome-wide expression profiling experiments on synchronized primary human foreskin fibroblasts across the cell cycle. Using a combined experimental and computational approach to deconvolve measured expression values into ‘‘single-cell’’ expression profiles, we were able to overcome the limitations inherent in synchronizing nontransformed mammalian cells. This allowed us to identify 480 periodically expressed genes in primary human foreskin fibroblasts. Analysis of the reconstructed primary cell profiles and comparison with published expression datasets from synchronized transformed cells reveals a large number of genes that cycle exclusively in primary cells. This conclusion was supported by both bioinformatic analysis and experiments performed on other cell types. We suggest that this approach will help pinpoint genetic elements contributing to normal cell growth and cellular transformation.

In contrast to the earlier study by Whitfield et al. (2002), which was performed on HeLa cells, Ziv Bar-Joseph et al. worked on non-transformed fibroblasts. The dataset thus offers a first global view of the differences between the cell cycle of normal human cells and that of cancer cells.

To compare their list of cell-cycle-regulated human genes to the one that I have used so far, I mapped their 480 genes to Ensembl using the mapping file from the STRING database. This resulted in a list of 410 genes; that is, 70 genes could not be mapped by the automatic procedure. Whereas this is far from a perfect mapping, it is sufficient to judge the quality of the list.

The plots below show the fraction of a benchmark set that is identified as a function of the number of genes that are proposed to be periodically expressed during the cell cycle. In each plot, I compare the results for the list of 410 genes obtained from the new study by Bar-Joseph et al., the analysis by Whitfield et al., and the reanalysis of the latter dataset by Jensen et al. (2006) (available from Cyclebase.org). To make the comparison as fair as possible, I only considered the subset of genes that were present in both microarray designs. The first plot uses as benchmark a set of 63 genes that have been identified as periodically expressed in targeted small-scale studies:

Three sets of cell-cycle-regulated human genes compared to benchmark set B1

I also benchmarked the three gene lists against a second benchmark set, which consists of predicted target genes of E2F cell-cycle transcription factors:

Three sets of cell-cycle-regulated human genes compared to benchmark set B2

Both benchmarks suggest that the three lists are of very comparable quality, but that the list by Whitfield and coworkers is much more inclusive than the one from Bar-Joseph and coworkers. In other words, the former list has better sensitivity whereas the latter has better specificity. This is consistent with the results presented by Bar-Joseph et al., who conclude that their list is more reliable than the previously published list. However, this is probably not due to better quality of the raw expression data, since reanalysis of the data by Whitfield et al. yielded a list with almost identical sensitivity and specificity (that is, the red curve is very close to the blue cross in both plots).
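For readers who want to make this kind of benchmark plot themselves, the underlying calculation is simple. The sketch below is an illustration with hypothetical gene names, not the code behind the plots: a ranked list of genes gives a curve (fraction of the benchmark recovered among the top n genes), whereas a fixed, unranked gene list gives a single point.

```python
def recall_curve(ranked_genes, benchmark):
    """Fraction of benchmark genes found among the top-n ranked genes,
    for n = 1 .. len(ranked_genes)."""
    benchmark = set(benchmark)
    found = 0
    curve = []
    for gene in ranked_genes:
        if gene in benchmark:
            found += 1
        curve.append(found / len(benchmark))
    return curve


def recall_point(gene_list, benchmark):
    """(list size, fraction of benchmark recovered) for an unranked list."""
    benchmark = set(benchmark)
    recovered = len(benchmark & set(gene_list))
    return len(gene_list), recovered / len(benchmark)


# Toy example with hypothetical gene names.
ranked = ["g1", "g2", "g3", "g4", "g5"]   # best-scoring gene first
benchmark_set = {"g1", "g4", "g9"}        # g9 is never proposed
fixed_list = ["g4", "g2"]                 # a published list without ranks

curve = recall_curve(ranked, benchmark_set)
point = recall_point(fixed_list, benchmark_set)
print(curve)
print(point)
```

Plotting the curve against n, with the points for the fixed lists overlaid, shows directly whether a list buys its sensitivity with a larger number of proposed genes.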

Although the two lists of periodically expressed genes are of comparable quality, they may still contain very different sets of genes. I therefore decided to compare the list of genes that are periodically expressed in the time course on primary fibroblasts and in each of the four time courses on HeLa cells. To make this comparison as easy as possible, I selected the top-364 cycling genes from each of the four HeLa time courses based on the reanalysis by Jensen et al. (2006). The ten Venn diagrams below show all pairwise comparisons of the five lists of 364 genes each:

The average overlap between the list by Bar-Joseph et al. and an experiment from Whitfield et al. is 114 genes. By comparison, the average overlap between the top-364 lists from two individual experiments from Whitfield et al. is 123 genes. Although the overlap may seem low, I thus believe that it is due to the poor reproducibility between microarray time courses rather than due to genuine differences between primary fibroblasts and HeLa cells as suggested by Bar-Joseph and colleagues.
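The pairwise comparison is easy to sketch in code. The example below uses tiny hypothetical gene sets and list names (the real lists contain 364 genes each); it computes all pairwise overlaps and then the two averages discussed above, namely the overlap between the primary-cell list and each HeLa list versus the overlap among the HeLa lists themselves.

```python
from itertools import combinations


def pairwise_overlaps(gene_lists):
    """Map each pair of list names to the number of shared genes."""
    sets = {name: set(genes) for name, genes in gene_lists.items()}
    return {(a, b): len(sets[a] & sets[b]) for a, b in combinations(sets, 2)}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Toy data; names and genes are hypothetical.
lists = {
    "primary": ["a", "b", "c"],
    "hela1": ["b", "c", "d"],
    "hela2": ["c", "d", "e"],
}

overlaps = pairwise_overlaps(lists)

# Average overlap of the primary-cell list with each HeLa list.
primary_vs_hela = mean(n for pair, n in overlaps.items() if "primary" in pair)
# Average overlap between pairs of HeLa lists.
hela_vs_hela = mean(n for pair, n in overlaps.items() if "primary" not in pair)

print(overlaps)
print(primary_vs_hela, hela_vs_hela)
```

With five lists, `combinations` yields the ten pairs shown in the Venn diagrams: four involving the primary-cell list and six among the HeLa time courses.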

Although cancer cells have to circumvent the regulatory mechanisms that would normally prevent cell proliferation, the cell cycle itself appears to function the same way as in normal cells. In other words, the difference does not lie in the “engine” but in the “brakes”, which have been sabotaged in cancer cells.
