Analysis of phosphorylation sites on proteins from Saccharomyces cerevisiae by electron transfer dissociation (ETD) mass spectrometry
We present a strategy for the analysis of the yeast phosphoproteome that uses endo-Lys C as the proteolytic enzyme, immobilized metal affinity chromatography for phosphopeptide enrichment, a 90-min nanoflow-HPLC/electrospray-ionization MS/MS experiment for phosphopeptide fractionation and detection, gas phase ion/ion chemistry, electron transfer dissociation for peptide fragmentation, and the Open Mass Spectrometry Search Algorithm for phosphoprotein identification and assignment of phosphorylation sites. From a 30-µg (approximately 600 pmol) sample of total yeast protein, we identify 1,252 phosphorylation sites on 629 proteins. Identified phosphoproteins have expression levels that range from <50 to 1,200,000 copies per cell and are encoded by genes involved in a wide variety of cellular processes. We identify a consensus site that likely represents a motif for one or more uncharacterized kinases and show that yeast kinases, themselves, contain a disproportionately large number of phosphorylation sites. Detection of a pHis containing peptide from the yeast protein, Cdc10, suggests an unexpected role for histidine phosphorylation in septin biology. From diverse functional genomics data, we show that phosphoproteins have a higher number of interactions than an average protein and interact with each other more than with a random protein. They are also likely to be conserved across large evolutionary distances.
As is so often the case with experimental papers, the authors provide no comparison with earlier studies. I thus decided to compare the set of phosphoproteins identified by Hunt and coworkers with the set of Cdc28p substrates identified in two studies by the Morgan lab, as well as with the proteome-wide, sequence-based predictions made by NetPhosK, and to summarize the overlaps in a Venn diagram.
The Venn diagram shows that each of the three sets contains a considerable number of phosphoproteins that are not present in either of the other sets. This was to be expected, since the three methods are fundamentally different. The dataset from the Hunt lab includes proteins that are phosphorylated by kinases other than Cdc28p; however, it is limited in the sense that low-abundance phosphopeptides are typically missed by MS studies. Conversely, the set from the Morgan lab consists only of Cdc28p substrates, but is likely to have much better coverage of low-abundance phosphoproteins. Finally, the set of Cdc28p substrates from NetPhosK is likely to contain a considerable number of false positives, because they are predicted from the protein sequence alone.
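For readers who want to redo this kind of comparison, the seven regions of a three-set Venn diagram can be counted with plain set operations. The following is only a minimal sketch: the function name `venn_regions` and the identifiers `P1` to `P7` are hypothetical placeholders, not real yeast proteins and not the data behind the diagram discussed here.

```python
def venn_regions(hunt, morgan, netphosk):
    """Return the sizes of the seven regions of a three-set Venn diagram."""
    return {
        "hunt_only": len(hunt - morgan - netphosk),
        "morgan_only": len(morgan - hunt - netphosk),
        "netphosk_only": len(netphosk - hunt - morgan),
        "hunt_morgan": len((hunt & morgan) - netphosk),
        "hunt_netphosk": len((hunt & netphosk) - morgan),
        "morgan_netphosk": len((morgan & netphosk) - hunt),
        "all_three": len(hunt & morgan & netphosk),
    }


# Placeholder identifiers for illustration only (assumed, not real data).
hunt = {"P1", "P2", "P3", "P4"}
morgan = {"P3", "P4", "P5"}
netphosk = {"P4", "P5", "P6", "P7"}

regions = venn_regions(hunt, morgan, netphosk)
for name, size in regions.items():
    print(f"{name}: {size}")
```

Because the seven regions are disjoint, their sizes should always sum to the size of the union of the three sets, which is a handy sanity check when the inputs are real protein lists.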
As a matter of fact, I find the overlap between the three sets to be surprisingly good. Even if we assume that the dataset from the Morgan lab contains no false positives, the fraction of it that is recovered by Hunt and coworkers suggests that the new dataset captures one third of all phosphoproteins in budding yeast; allowing for errors in both datasets increases this estimate, since false positives are unlikely to end up in the overlap. It is also noteworthy that NetPhosK misses only 22% of the Cdc28p substrates that were identified by the Morgan lab and supported by the new data from the Hunt lab, although this high coverage is probably obtained at the price of many false-positive predictions.
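The reasoning behind the coverage estimate can be written out as a small calculation. This is a sketch under stated assumptions: the reference set is treated as a representative sample of true phosphoproteins, its false positives are assumed never to appear in the overlap, and the counts used below (300 and 100) are hypothetical numbers chosen to give one third, not the counts from the diagram.

```python
def coverage_estimate(n_reference, n_overlap, reference_fp_rate=0.0):
    """Estimate the fraction of true phosphoproteins captured by a new dataset.

    n_reference: size of the reference set (here, the Morgan lab substrates)
    n_overlap: reference proteins that the new dataset also contains
    reference_fp_rate: assumed fraction of false positives in the reference set
    """
    # False positives in the reference set shrink the number of true
    # phosphoproteins that the new dataset could have recovered.
    true_reference = n_reference * (1.0 - reference_fp_rate)
    return n_overlap / true_reference


# Hypothetical counts for illustration only.
no_errors = coverage_estimate(300, 100)
with_errors = coverage_estimate(300, 100, reference_fp_rate=0.2)
print(f"no false positives assumed: {no_errors:.2f}")
print(f"20% false positives assumed: {with_errors:.2f}")
```

The second call gives a larger value than the first, which is the sense in which assuming errors in the reference set raises the estimated coverage.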