2015 has been an exceptionally busy year in my group in terms of publishing databases and other web resources; so busy that I have failed to write blog posts describing several of them.
One of them is the DISEASES database, which is described in detail in an article with the informative, if not very inventive title “DISEASES: Text mining and data integration of disease–gene associations”.
The DISEASES database can be viewed as a sister resource to the subcellular localization database COMPARTMENTS, which you can read more about in this blog post. Indeed, the two resources share much of their infrastructure, including the web framework, the backend database, and the text-mining pipeline.
The big difference between the two resources is the scope: whereas COMPARTMENTS links proteins to their subcellular localizations, DISEASES links them to the diseases in which they are implicated. To this end we make use of the Disease Ontology, which turned out to be very well suited for text-mining purposes due to its many synonyms for terms. Text mining is the most important source of associations but is complemented by manually curated associations from Genetics Home Reference and UniProtKB as well as GWAS results imported from DistiLD.
To facilitate usage in large-scale analysis and integration into other databases, all data in DISEASES are available for download. Indeed, the text-mined associations from DISEASES are already included in both GeneCards and Pharos.