Tag Archives: networks

Announcement: Protein Signaling conference

This year I am once again involved in organizing an exclusive conference on protein signaling. There is no registration fee and accommodation is also free; all you have to pay yourself is your travel expenses.



This year we are fortunate to once again have an amazing lineup of invited speakers: Albert Heck, Anne-Claude Gavin, Bernd Bodenmiller, Brenda Schulman, Daniel Durocher, Gianni Cesareni, Giulio Superti-Furga, Ileana Cristea, Ivan Dikic, James Ferrell, Jason Chin, Jiri Lukas, Julio Saez-Rodriguez, Marc Kirschner, Matthias Mann, Nevan Krogan, Niels Mailand, Oskar Fernandez-Capetillo, Ray Deshaies, Ronald Hay, Steve Jackson, Søren Brunak, Titia Sixma, and Wade Harper.

Please note that although the poster says July 1, the application deadline is in fact June 20, which is only four days from now. To apply, please see the conference website.

Announcement: NetBioSIG call for abstracts

Two days before the main ISMB 2016 conference in Florida, the Network Biology Special Interest Group (NetBioSIG) meeting will take place. It is a great opportunity to meet up with experts in the field, so I hope to see you there. This year's NetBioSIG will have four keynotes, given by Olga Troyanskaya, Franca Fraternali, Nataša Pržulj, and yours truly.

There is of course also a chance for you to present your own work. However, please note that the abstract submission deadline is Friday, April 29. Please see the NetBioSIG website for more details.

Job: Postdoc in computational analysis of animal disease models

In collaboration with Jan Gorodkin at the Center for non-coding RNA in Technology and Health at the University of Copenhagen, I will be starting up a project on cross-species network and pathway analysis of animal disease models. We have secured funding for the project and are now searching for the right person to fill a postdoc position.

The application deadline is February 27, 2016. For further details, including how to apply, please refer to the official job announcement.

Resource: Cytoscape App for STRING

The STRING database of known and predicted protein–protein interactions is heavily used by bioinformaticians and non-bioinformaticians alike. The latter generally use STRING via its web interface, whereas the former typically download the complete network and analyze it locally. However, we lacked a good way for non-bioinformaticians to work with networks that are too large for the web interface. A typical example would be users who wish to visualize the results of a proteomics or transcriptomics study as a STRING network.

To address this, I have worked with John “Scooter” Morris to develop a new Cytoscape app for STRING. The app allows you to quickly retrieve much larger networks than is possible via the web interface and gives you the powerful layout and analysis features of Cytoscape. At the same time, it retains the “glass ball” look that many people associate with a STRING network (shown here with a small example network):


When retrieving a network, the app also includes node attributes from the COMPARTMENTS and TISSUES databases. This allows users to, for example, easily color the nodes based on the confidence with which each protein is localized to a certain cellular compartment or expressed in a certain tissue. The app also includes node attributes for the drug target classification of human proteins, which are obtained from the Pharos web resource. Finally, since it is Cytoscape, you can of course import your own attribute tables.


Although it is not yet feature complete, version 0.9 of the app is already available from the Cytoscape App Store under the name stringApp. Please note that it requires Cytoscape 3.3 to work.

Announcement: EMBO practical course on computational biology in Heidelberg

June 2016 will likely be a highly productive month for people in my group, since I will not be there much to disturb them. Specifically, I will be involved in running two week-long EMBO practical courses.

One was announced on this blog just two days ago. The other is the similarly long-running course “Computational biology: Genomes to systems”, which this year will take place on June 19–23 at the European Molecular Biology Laboratory in Heidelberg, Germany. The course will cover a wide range of advanced computational biology topics, including protein networks (taught by STRING collaborator Christian von Mering) and biomedical text mining (taught by me).

Please note that the application deadline is less than a month away, namely on January 31.

More details can be found on the course website.

Announcement: EMBO practical course on protein interaction analysis in Budapest

Later this year, I will once again be one of the teachers on the long-running EMBO practical course “Computational analysis of protein-protein interactions: Sequences, networks and diseases”. The 2016 version of the course will be taking place on May 30 – June 4 in Budapest, Hungary, and the application deadline is February 1.

For more details see the course website or the poster below.


Exercise: Web services

The aim of this practical is to introduce you to the concept of web services as well as to a few useful standard command-line tools and how one can pipe data from one tool into another. Web services are, simply put, websites that are meant to be used by computers rather than humans.

Fetching a URL from the command line

The previous exercises used this article to illustrate named entity recognition. If you want to work with it outside the web browser, you will want to change two things: 1) you will probably not want to work with an HTML web page, but rather retrieve it in XML format, and 2) you will want to retrieve the article with something other than a web browser:

curl 'http://journals.plos.org/plosone/article/asset?id=10.1371/journal.pone.0132736.XML'

Submitting text to the tagger

In the NER practical, you used a web service for NER; however, the complexity was hidden from you in the EXTRACT bookmarklet. The bookmarklet works by sending text from your web browser to a remote tagging web service and subsequently displaying the results.

Let us start by looking behind the curtain to see how an EXTRACT popup is produced. When you select the header of the article and click the bookmarklet, your browser retrieves the following page to show in the popup:


As you can see, the URL contains data, namely the text to be tagged as well as information on which types of named entities we want to have recognized in the text.

You can retrieve the same information in a tab-delimited format, which is far more useful for computational purposes:


If you want, you can use the curl command to retrieve the same data from the command line.
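
As a minimal sketch of what such a command might look like, the function below submits a piece of text to the tagger using the same endpoint and parameters (document, entity_types, format) that appear in the full pipeline later in this exercise. The function name tag_text is made up for illustration, and the exact columns of the output are not assumed here.

```shell
# Hypothetical helper: send a text string to the tagger web service and
# print the recognized human genes/proteins (entity type 9606) as
# tab-delimited lines.
tag_text() {
    curl --silent \
         --data-urlencode "document=$1" \
         --data-urlencode 'entity_types=9606' \
         --data-urlencode 'format=tsv' \
         'http://tagger.jensenlab.org/GetEntities'
}
```

Once the function is defined, running tag_text 'BCL11B is a transcription factor.' sends that sentence to the tagger. The --data-urlencode option takes care of encoding spaces and other special characters, so you do not have to build the URL by hand.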

Retrieving a protein network

Bioinformatics web services are not limited to text mining. For example, the STRING database of protein interactions can also be accessed as a web service. The following URL gives you an interaction network for BCL11B as an image:


Modifying it just slightly allows you to retrieve the same interactions in PSI-MI-TAB format:

You can obtain the exact same data from the command line by running this command:

curl 'http://string-db.org/api/psi-mi-tab/interactions?identifier=ENSP00000349723'
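
Because PSI-MI-TAB is a plain tab-delimited format, the result can be processed with standard command-line tools. The sketch below assumes the standard PSI-MI TAB layout, in which the first two columns hold the identifiers of the two interacting proteins. The sample rows are made up for illustration and truncated to three columns; apart from the BCL11B identifier from the URL above, the identifiers are placeholders and not real STRING output.

```shell
# Made-up sample in PSI-MI-TAB style (truncated to three columns);
# real output has more columns per row.
printf 'ENSP00000349723\tENSP00000000001\tBCL11B\n' >  sample_network.tsv
printf 'ENSP00000349723\tENSP00000000002\tBCL11B\n' >> sample_network.tsv

# Print every distinct interactor: take columns one and two,
# put each identifier on its own line, and remove duplicates.
list_interactors() {
    cut -f1,2 "$1" | tr '\t' '\n' | sort -u
}

list_interactors sample_network.tsv
```

For the two sample rows this lists three distinct identifiers, since the protein in the first column appears in both interactions.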

Putting it all together

Using pipes, it is possible to put together multiple different web services and local programs to accomplish complex tasks. Here is an example that puts together everything you have learned above:

curl 'http://journals.plos.org/plosone/article/asset?id=10.1371/journal.pone.0132736.XML' | curl --data-urlencode 'document@-' --data-urlencode 'entity_types=9606' --data-urlencode 'format=tsv' 'http://tagger.jensenlab.org/GetEntities' | cut -f3 | sort -u | grep '^ENSP' | curl --data-urlencode 'identifiers@-' --data-urlencode 'limit=0' 'http://string-db.org/api/psi-mi-tab/interactionsList' > string_network.tsv

Let us pick apart this monstrosity of a command and see what it does:

  • The first curl command fetches a full-text article from PLOS ONE in XML format
  • The second curl command submits this document to the tagger REST web service, to perform named entity recognition of human genes/proteins
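  • The cut command pulls out only column three from the resulting output, which contains the identifiers of the recognized entities
  • The sort -u command sorts these identifiers and removes duplicates, so that each entity is listed only once
  • The grep command keeps only the identifiers that start with “ENSP”, which are the proteins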
  • The third curl command submits this list of protein identifiers to the STRING database to retrieve a protein interaction network of them in PSI-MI-TAB format
  • Finally, we put that network into a file called string_network.tsv on our server.
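
The local filtering steps in the middle of the pipeline (cut, sort -u and grep) can be tried on their own, without contacting any server. The sketch below feeds them a made-up, three-column stand-in for the tagger output; the values in the first two columns and the non-protein identifier are placeholders, not real tagger results.

```shell
# Made-up stand-in for tab-delimited tagger output: the third column
# holds the identifier of each recognized entity.
printf 'BCL11B\t9606\tENSP00000349723\n' >  sample_entities.tsv
printf 'Bcl11b\t9606\tENSP00000349723\n' >> sample_entities.tsv
printf 'other\t-1\tNOT_A_PROTEIN_ID\n'   >> sample_entities.tsv

# Keep column three, drop duplicate identifiers, keep only proteins.
cut -f3 sample_entities.tsv | sort -u | grep '^ENSP'
```

With this sample input, the command prints the single identifier ENSP00000349723: the duplicate is collapsed by sort -u and the placeholder that does not start with “ENSP” is discarded by grep.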

In other words, with a single pipe of commands that interacts with three different servers we manage to retrieve a full-text article, perform named entity recognition of human proteins and obtain protein interactions among them. Note that whereas this is possible, it will often be desirable to store some of the intermediate results in files instead of using pipes.
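
One way to keep the intermediate results is sketched below: each step writes to a file that the next step reads, so every stage can be inspected or rerun separately. The function name and the names of the intermediate files are arbitrary choices made for this illustration; the URLs and parameters are the ones used in the command above.

```shell
# Sketch: the same three-server workflow, with each intermediate
# result stored in a file instead of being passed through a pipe.
fetch_network_staged() {
    # Step 1: fetch the full-text article in XML format
    curl 'http://journals.plos.org/plosone/article/asset?id=10.1371/journal.pone.0132736.XML' \
        > article.xml || return 1

    # Step 2: tag human genes/proteins; name@file reads the value from a file
    curl --data-urlencode 'document@article.xml' \
         --data-urlencode 'entity_types=9606' \
         --data-urlencode 'format=tsv' \
         'http://tagger.jensenlab.org/GetEntities' > entities.tsv || return 1

    # Step 3: keep the unique protein identifiers from column three
    cut -f3 entities.tsv | sort -u | grep '^ENSP' > proteins.txt || return 1

    # Step 4: retrieve the interactions among these proteins from STRING
    curl --data-urlencode 'identifiers@proteins.txt' \
         --data-urlencode 'limit=0' \
         'http://string-db.org/api/psi-mi-tab/interactionsList' > string_network.tsv
}
```

Calling fetch_network_staged runs the four steps in order and stops at the first step that fails. Afterwards you can, for example, look at proteins.txt to check which proteins were recognized before trusting the final network.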

By slightly modifying the last curl command in the pipe, so that it calls the image endpoint of STRING instead, it is possible to retrieve the network as an image:

curl 'http://journals.plos.org/plosone/article/asset?id=10.1371/journal.pone.0132736.XML' | curl --data-urlencode 'document@-' --data-urlencode 'entity_types=9606' --data-urlencode 'format=tsv' 'http://tagger.jensenlab.org/GetEntities' | cut -f3 | sort -u | grep '^ENSP' | curl --data-urlencode 'identifiers@-' --data-urlencode 'limit=0' --data-urlencode 'network_flavor=confidence' 'http://string-db.org/api/image/networkList' > string_network.png

STRING network