Job: Postdoc position on biomedical text mining

A postdoc position is available in the Cellular Network Biology group at the Novo Nordisk Foundation Center for Protein Research (CPR). The group focuses on network-based analysis of proteins and their posttranslational modifications in the context of cellular signaling and disease. The postdoc will develop advanced methods for protein-centric text mining of large corpora, including combining efficient dictionary-based named-entity recognition with conditional random fields and classification based on word embeddings. In collaboration with other group members, the resulting improved methods will be integrated into STRING and related databases/tools, and the software will be made available under open licenses.

Candidates must hold a PhD degree (or equivalent) within a relevant discipline and have strong programming experience. The successful candidate will have strong qualifications or experience within several of the following areas:

  • Text mining
  • Bioinformatics / computational biology
  • Machine learning
  • Statistical data mining
  • Biomedical ontologies
  • Programming in Python and/or C++

For further details please see the official job advert.


Announcement: Protein Signaling conference

This year I am once again involved in organizing an exclusive conference on protein signaling. There is no registration fee and accommodation is also free; all you have to pay yourself is your travel expenses.



This year we are fortunate to once again have an amazing lineup of invited speakers: Albert Heck, Anne-Claude Gavin, Bernd Bodenmiller, Brenda Schulman, Daniel Durocher, Gianni Cesareni, Giulio Superti-Furga, Ileana Cristea, Ivan Dikic, James Ferrell, Jason Chin, Jiri Lukas, Julio Saez-Rodriguez, Marc Kirschner, Matthias Mann, Nevan Krogan, Niels Mailand, Oskar Fernandez-Capetillo, Ray Deshaies, Ronald Hay, Steve Jackson, Søren Brunak, Titia Sixma, and Wade Harper.

Please note that although the poster says July 1, the application deadline is in fact June 20, which is only four days from now. To apply, please see the conference website.

Tip: Getting data into Impactstory but not your ORCID profile

I use Impactstory to track altmetrics for my publications. I believe they did the right thing by not asking me to maintain yet another online profile and instead building upon existing infrastructure. I also use figshare to publish open datasets and wanted to get Impactstory to track these too.

As is often the case, configuring the complex software infrastructure to do what I wanted was nontrivial. Because Impactstory does not maintain a separate user profile, the only way to get data into Impactstory is to add it to your ORCID profile. This took me much longer to figure out than it should have due to misleading, outdated documentation. In fact, I would not have succeeded without help from Martin Fenner, who taught me about the missing piece of the puzzle, which is DataCite. Once you know what to do, it is reasonably simple: from your ORCID profile, you choose to add and link works from DataCite, where a simple search should find all your datasets in figshare. Afterwards, you have to change the privacy settings for them in your ORCID profile, and they will show up in Impactstory.

This still left me with one challenge, though. Uploading a dataset on figshare can result in numerous DOIs, since both the dataset per se and the individual files are assigned DOIs, each with and without a version number in the DOI. Since I cannot control which DOIs people cite, I must associate all of them with my ORCID to get accurate tracking in Impactstory. However, doing so will pollute the list of works on my public ORCID profile with hundreds of partially redundant DOIs of data files. This is problematic because my ORCID profile also serves as an online curriculum vitae, with the list of works being the publication list.
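To get a feel for how redundant such a list is, the versioned and unversioned forms can be grouped together. Below is a minimal Python sketch; it assumes that figshare DOIs look like `10.6084/m9.figshare.<id>` with an optional `.v<N>` version suffix, and the example DOIs and the function names `base_doi` and `group_by_base` are made up for illustration. Note that it only collapses version suffixes; it does not link the DOI of a dataset to the DOIs of its individual files.

```python
import re
from collections import defaultdict

# Assumption: a version is encoded as a trailing ".v<N>" on the DOI.
VERSION_SUFFIX = re.compile(r"\.v\d+$")


def base_doi(doi):
    """Return the DOI in lowercase with any trailing .v<N> suffix removed."""
    return VERSION_SUFFIX.sub("", doi.strip().lower())


def group_by_base(dois):
    """Map each unversioned DOI to the sorted list of variants seen for it."""
    groups = defaultdict(set)
    for doi in dois:
        groups[base_doi(doi)].add(doi.strip().lower())
    return {base: sorted(variants) for base, variants in groups.items()}


# Hypothetical DOIs, not real datasets.
example = [
    "10.6084/m9.figshare.1000001",
    "10.6084/m9.figshare.1000001.v1",
    "10.6084/m9.figshare.1000001.v2",
    "10.6084/m9.figshare.1000002.v1",
]

for base, variants in group_by_base(example).items():
    print(base, len(variants))
```

Grouping like this does not solve the tracking problem, since every variant still has to be associated with ORCID, but it shows how many of the entries are really the same work.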

It turns out that there is a small inaccuracy in the Impactstory documentation: when they write that you must change the privacy setting to “public”, they mean “public” or “trusted parties”. Therein lies the solution to my problem: I set privacy to “public” for the works that I want shown on my ORCID profile and “trusted parties” for the works that I only want tracked in Impactstory. That allows me to get all the data into Impactstory via ORCID but at the same time have a clean, non-redundant publication list on my ORCID profile.

Update: I fear the trick described in the last paragraph does not work after all. Since writing this blog post, works with privacy set to “trusted parties” have disappeared from my Impactstory profile. We really need a solution to this. For now I prioritize having a clean, non-redundant publication list on my ORCID profile over having accurate statistics on Impactstory.

Announcement: NetBioSIG call for abstracts

Two days before the main ISMB 2016 conference in Florida, the Network Biology Special Interest Group (NetBioSIG) meeting will take place. It is a great opportunity to meet up with experts in the field, so I hope to see you there. This year's NetBioSIG will have four keynotes given by Olga Troyanskaya, Franca Fraternali, Nataša Pržulj, and yours truly.

There is of course also a chance for you to present your own work. However, please note that the abstract submission deadline is Friday, April 29. Please see the NetBioSIG website for more details.

Job: Postdoc in computational analysis of animal disease models

In collaboration with Jan Gorodkin at the Center for non-coding RNA in Technology and Health at the University of Copenhagen, I will be starting up a project on cross-species network and pathway analysis of animal disease models. We have secured funding for the project and are now searching for the right person to fill a postdoc position.

The application deadline is February 27, 2016. For further details, including how to apply, please refer to the official job announcement.

Resource: Cytoscape App for STRING

The STRING database of known and predicted protein–protein interactions is a resource heavily used by bioinformaticians and non-bioinformaticians alike. The latter generally use STRING via its web interface, whereas the former typically download the complete network and analyze it locally. However, we lacked a good way for non-bioinformaticians to work with networks that are just too large for the web interface. A typical example of this would be users who wish to visualize the results of a proteomics or transcriptomics study as a STRING network.

To address this, I have worked with John “Scooter” Morris to develop a new Cytoscape app for STRING. The app allows you to quickly retrieve much larger networks than is possible via the web interface and gives you the powerful layout and analysis features of Cytoscape. At the same time, it retains the “glass ball” look that many people associate with a STRING network (shown here with a small example network):


When retrieving a network, the app also includes node attributes from the COMPARTMENTS and TISSUES databases. This allows users to easily, for example, color the nodes based on the confidence with which each protein is localized to a certain cellular compartment or expressed in a certain tissue. The app also includes node attributes for drug target classification of human proteins, which are obtained from the Pharos web resource. Finally, since it is Cytoscape, you can obviously import your own attributes table.


Although it is not yet feature complete, version 0.9 of the app is already available from the Cytoscape App Store under the name stringApp. Please note that it requires Cytoscape 3.3 to work.