Tag Archives: networks

Announcement: Protein Signaling conference

This year I am once again involved in organizing an exclusive conference on protein signaling. There is no registration fee and accommodation is also free; all you have to pay yourself is your travel expenses.

CBC10

This year we are fortunate to once again have an amazing lineup of invited speakers: Albert Heck, Anne-Claude Gavin, Bernd Bodenmiller, Brenda Schulman, Daniel Durocher, Gianni Cesareni, Giulio Superti-Furga, Ileana Cristea, Ivan Dikic, James Ferrell, Jason Chin, Jiri Lukas, Julio Saez-Rodriguez, Marc Kirschner, Matthias Mann, Nevan Krogan, Niels Mailand, Oskar Fernandez-Capetillo, Ray Deshaies, Ronald Hay, Steve Jackson, Søren Brunak, Titia Sixma, and Wade Harper.

Please note that although the poster says July 1, the application deadline is in fact June 20, which is only four days from now. To apply, please see the conference website.

Announcement: NetBioSIG call for abstracts

Two days before the main ISMB 2016 conference in Florida, the Network Biology Special Interest Group (NetBioSIG) meeting will take place. It is a great opportunity to meet up with experts in the field, so I hope to see you there. This year's NetBioSIG will have four keynotes, given by Olga Troyanskaya, Franca Fraternali, Nataša Pržulj, and yours truly.

There is of course also a chance for you to present your own work. However, please note that the abstract submission deadline is Friday, April 29. Please see the NetBioSIG website for more details.

Job: Postdoc in computational analysis of animal disease models

In collaboration with Jan Gorodkin at the Center for non-coding RNA in Technology and Health at the University of Copenhagen, I will be starting up a project on cross-species network and pathway analysis of animal disease models. We have secured funding for the project and are now searching for the right person to fill a postdoc position.

The application deadline is February 27, 2016. For further details, including how to apply, please refer to the official job announcement.

Resource: Cytoscape App for STRING

The STRING database of known and predicted protein–protein interactions is a resource heavily used by bioinformaticians and non-bioinformaticians alike. The latter generally use STRING via its web interface, whereas the former typically download the complete network and analyze it locally. However, we lacked a good way for non-bioinformaticians to work with networks that are just too large for the web interface. A typical example would be users who wish to visualize the results of a proteomics or transcriptomics study as a STRING network.

To address this, I have worked with John “Scooter” Morris to develop a new Cytoscape app for STRING. The app allows you to quickly retrieve much larger networks than is possible via the web interface and gives you the powerful layout and analysis features of Cytoscape. At the same time, it retains the “glass ball” look that many people associate with a STRING network (shown here with a small example network):

cytoscape1

When retrieving a network, the app also includes node attributes from the COMPARTMENTS and TISSUES databases. This allows users to easily color the nodes based on, for example, the confidence with which each protein is localized to a certain cellular compartment or expressed in a certain tissue. The app also includes node attributes for drug target classification of human proteins, which are obtained from the Pharos web resource. Finally, since it is Cytoscape, you can obviously import your own attribute tables.

cytoscape2

Although it is not yet feature complete, version 0.9 of the app is already available from the Cytoscape App Store under the name stringApp. Please note that it requires Cytoscape 3.3 to work.

Announcement: EMBO practical course on computational biology in Heidelberg

June 2016 will likely be a highly productive month for people in my group, since I will not be there much to disturb them. Specifically, I will be involved in running two week-long EMBO practical courses.

One was announced on this blog just two days ago. The other is the equally long-running course “Computational biology: Genomes to systems”, which this year will take place on June 19–23 at the European Molecular Biology Laboratory in Heidelberg, Germany. The course will cover a wide range of advanced computational biology topics, including protein networks (taught by STRING collaborator Christian von Mering) and biomedical text mining (taught by me).

Please note that the application deadline is less than a month away, namely on January 31.

More details can be found on the course website.

Announcement: EMBO practical course on protein interaction analysis in Budapest

Later this year, I will once again be one of the teachers on the long-running EMBO practical course “Computational analysis of protein-protein interactions: Sequences, networks and diseases”. The 2016 version of the course will be taking place on May 30 – June 4 in Budapest, Hungary, and the application deadline is February 1.

For more details see the course website or the poster below.

16-protein-protein

Exercise: Web services

The aim of this practical is to introduce you to the concept of web services as well as to a few useful standard command-line tools and how one can pipe data from one tool into another. Web services are, simply put, websites that are meant to be used by computers rather than humans.

Fetching a URL from the command line

The previous exercises used this article to illustrate named entity recognition. If you want to work with it outside the web browser, you will want to change two things: 1) you will probably not want to work with an HTML web page, but rather retrieve it in XML format, and 2) you will want to retrieve the article with something other than a web browser:

curl 'http://journals.plos.org/plosone/article/asset?id=10.1371/journal.pone.0132736.XML'

Submitting text to the tagger

In the NER practical, you used a web service for NER; however, the complexity was hidden from you in the EXTRACT bookmarklet. The bookmarklet works by sending text from your web browser to a remote tagging web service and subsequently displaying the results.

Let us start by looking behind the curtain to see how an EXTRACT popup is produced. When you select the header of the article and click the bookmarklet, your browser retrieves the following page to show in the popup:

http://tagger.jensenlab.org/Extract?document=Novel%20ZEB2-BCL11B%20Fusion%20Gene%20Identified%20by%20RNA-Sequencing%20in%20Acute%20Myeloid%20Leukemia%20with%20t(2;14)(q22;q32)&entity_types=9606%20-26

As you can see, the URL contains data, namely the text to be tagged as well as information on which types of named entities we want to have recognized in the text.
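
To make this concrete, here is a minimal shell sketch that splits the query string of the EXTRACT URL above into its individual parameters. It only undoes the %20 escapes that happen to occur in this particular URL; it is not meant as a general URL decoder.

```shell
# The EXTRACT popup URL from above.
url='http://tagger.jensenlab.org/Extract?document=Novel%20ZEB2-BCL11B%20Fusion%20Gene%20Identified%20by%20RNA-Sequencing%20in%20Acute%20Myeloid%20Leukemia%20with%20t(2;14)(q22;q32)&entity_types=9606%20-26'

# Keep only the part after the first "?", i.e. the query string.
query=${url#*\?}

# Put each parameter on its own line and turn %20 back into spaces.
params=$(printf '%s\n' "$query" | tr '&' '\n' | sed 's/%20/ /g')
printf '%s\n' "$params"
```

This prints one line for the document parameter (the article title) and one line for the entity_types parameter.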

You can retrieve the same information in a tab-delimited format, which is far more useful for computational purposes:

http://tagger.jensenlab.org/GetEntities?document=Novel%20ZEB2-BCL11B%20Fusion%20Gene%20Identified%20by%20RNA-Sequencing%20in%20Acute%20Myeloid%20Leukemia%20with%20t(2;14)(q22;q32)&entity_types=9606%20-26&format=tsv

If you want, you can use the curl command to retrieve the same data from the command line.
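
A sketch of what that could look like follows. Rather than pasting the percent-encoded URL, it lets curl do the encoding: the -G option appends the --data-urlencode parameters to the URL as a query string. The function name fetch_entities is made up for illustration, and the actual call is left commented out because it needs network access and assumes the tagger service still answers at this address.

```shell
# Endpoint taken from the URL above.
tagger_url='http://tagger.jensenlab.org/GetEntities'

# The article title, written as plain text instead of percent-encoded.
title='Novel ZEB2-BCL11B Fusion Gene Identified by RNA-Sequencing in Acute Myeloid Leukemia with t(2;14)(q22;q32)'

# Hypothetical helper: tag the text given as the first argument and
# print the result in tab-delimited format.
fetch_entities() {
  curl -sS -G \
    --data-urlencode "document=$1" \
    --data-urlencode 'entity_types=9606 -26' \
    --data-urlencode 'format=tsv' \
    "$tagger_url"
}

# Uncomment to run the request (requires network access):
# fetch_entities "$title"
```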

Retrieving a protein network

Bioinformatics web services are not limited to text mining. For example, the STRING database of protein interactions can also be accessed as a web service. The following URL gives you an interaction network for BCL11B as an image:

http://string-db.org/api/image/network?identifier=ENSP00000349723

Modifying it just slightly allows you to retrieve the same interactions in PSI-MI-TAB format:

http://string-db.org/api/psi-mi-tab/interactions?identifier=ENSP00000349723

You can obtain the exact same data on the command line by running this command:

curl 'http://string-db.org/api/psi-mi-tab/interactions?identifier=ENSP00000349723'
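
The two STRING URLs above differ only in the output format and the request type. As a small illustration, a helper function can assemble them from those parts. The name string_url is made up for this sketch, and only the two combinations shown above are taken from this exercise; whether other combinations are valid is an assumption you would need to check.

```shell
# Hypothetical helper: build a STRING API URL from
# $1 = output format, $2 = request type, $3 = STRING identifier.
string_url() {
  printf 'http://string-db.org/api/%s/%s?identifier=%s\n' "$1" "$2" "$3"
}

# The two URLs used above, for BCL11B:
string_url image network ENSP00000349723
string_url psi-mi-tab interactions ENSP00000349723

# To fetch one of them (requires network access):
# curl "$(string_url psi-mi-tab interactions ENSP00000349723)"
```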

Putting it all together

Using pipes, it is possible to put together multiple different web services and local programs to accomplish complex tasks. Here is an example that puts together everything you have learned above:

curl 'http://journals.plos.org/plosone/article/asset?id=10.1371/journal.pone.0132736.XML' | curl --data-urlencode 'document@-' --data-urlencode 'entity_types=9606' --data-urlencode 'format=tsv' 'http://tagger.jensenlab.org/GetEntities' | cut -f3 | sort -u | grep '^ENSP' | curl --data-urlencode 'identifiers@-' --data-urlencode 'limit=0' 'http://string-db.org/api/psi-mi-tab/interactionsList' > string_network.tsv

Let us pick apart this monstrosity of a command and see what it does:

  • The first curl command fetches a full-text article from PLOS ONE in XML format
  • The second curl command submits this document to the tagger REST web service, to perform named entity recognition of human genes/proteins
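  • The cut command pulls out only column three from the resulting output, which contains the identifiers of the recognized entities
  • The sort -u command sorts these identifiers and removes duplicates, so that each entity is listed only once
  • The grep command finds only the identifiers that start with “ENSP”, which are the proteins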
  • The third curl command submits this list of protein identifiers to the STRING database to retrieve a protein interaction network of them in PSI-MI-TAB format
  • Finally, we put that network into a file called string_network.tsv on our server.
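
To see what the local steps in the middle of the pipeline do, you can try them on a few lines of made-up input. In the sketch below, only the BCL11B identifier is taken from this exercise; the other rows and the contents of the first two columns are placeholders invented for illustration, not real tagger output.

```shell
# Made-up three-column, tab-delimited input; only column 3 matters here.
sample=$(printf '%s\t%s\t%s\n' \
  'BCL11B' 'type' 'ENSP00000349723' \
  'geneX'  'type' 'ENSP00000000001' \
  'BCL11B' 'type' 'ENSP00000349723' \
  'other'  'type' 'NOT_A_PROTEIN_ID')

# Column 3 only, duplicates removed, then only identifiers starting with ENSP.
proteins=$(printf '%s\n' "$sample" | cut -f3 | sort -u | grep '^ENSP')
printf '%s\n' "$proteins"
```

The duplicated BCL11B row collapses into a single identifier, and the row whose identifier does not start with ENSP is dropped.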

In other words, with a single pipeline of commands that interacts with three different servers, we manage to retrieve a full-text article, perform named entity recognition of human proteins, and obtain protein interactions among them. Note that although this is possible, it will often be desirable to store some of the intermediate results in files instead of using pipes.
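
One possible way to keep intermediate results is sketched below. The helper cached and the file names article.xml, tagged.tsv and proteins.txt are made up for this example; the URLs and parameters are the ones from the pipeline above. Each step writes its output to a file and is skipped if that file already exists, so a failed run can be resumed without contacting the servers again.

```shell
# Hypothetical helper: run a command and save its output to the file
# given as $1, unless that file already exists and is non-empty.
cached() {
  out=$1
  shift
  if [ ! -s "$out" ]; then
    # Write to a temporary file first so that a failed command
    # does not leave a partial result behind.
    if "$@" > "$out.tmp"; then
      mv "$out.tmp" "$out"
    else
      rm -f "$out.tmp"
      return 1
    fi
  fi
}

# The pipeline from above, one file per step (requires network access).
run_pipeline() {
  cached article.xml curl -sS 'http://journals.plos.org/plosone/article/asset?id=10.1371/journal.pone.0132736.XML' &&
  cached tagged.tsv curl -sS --data-urlencode 'document@article.xml' --data-urlencode 'entity_types=9606' --data-urlencode 'format=tsv' 'http://tagger.jensenlab.org/GetEntities' &&
  cached proteins.txt sh -c "cut -f3 tagged.tsv | sort -u | grep '^ENSP'" &&
  cached string_network.tsv curl -sS --data-urlencode 'identifiers@proteins.txt' --data-urlencode 'limit=0' 'http://string-db.org/api/psi-mi-tab/interactionsList'
}

# Uncomment to run all four steps:
# run_pipeline
```

A side benefit of this layout is that you can open tagged.tsv or proteins.txt to check what each step produced before moving on to the next one.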

By slightly modifying the command, it is possible to instead retrieve this as an image:

curl 'http://journals.plos.org/plosone/article/asset?id=10.1371/journal.pone.0132736.XML' | curl --data-urlencode 'document@-' --data-urlencode 'entity_types=9606' --data-urlencode 'format=tsv' 'http://tagger.jensenlab.org/GetEntities' | cut -f3 | sort -u | grep '^ENSP' | curl --data-urlencode 'identifiers@-' --data-urlencode 'limit=0' --data-urlencode 'network_flavor=confidence' 'http://string-db.org/api/image/networkList' > string_network.png

STRING network

Announcement: EMBO practical course on protein interaction analysis in South Africa

I very much look forward to once again being part of the team of teachers behind the EMBO practical course “Computational analysis of protein-protein interactions: From sequences to networks”. For the first time, it will take place on the African continent, more specifically in Cape Town, South Africa. The course will run from September 23 to October 3, and the application deadline is July 23.

Please check the course website or the poster below for details.

Course poster

Resource: Antibodypedia bulk download file and STRING payload

Antibodypedia, developed by Antibodypedia AB and Nature Publishing Group, is a very useful resource for finding commercially available antibodies against human proteins.

The resource is made available under the Creative Commons Attribution-NonCommercial 3.0 license, which allows for reuse and redistribution of the data for non-commercial purposes. However, the data are available only for browsing through a web interface, which greatly limits systems biology uses of the resource. I thus wrote a robot to scrape all information from the web resource and convert it into a convenient tab-delimited file, which I have made available for download under the same license. This dataset covers a total of 579,038 antibodies against 16,827 human proteins.

To be able to use the dataset in conjunction with STRING and related resources, I next mapped the proteins to STRING protein identifiers. I was able to map 92% of all proteins in Antibodypedia. Having done this, I created the necessary files for the STRING payload mechanism to be able to show the information from Antibodypedia directly within STRING.

The end result looks like this when searching for the WNT7A protein:

Antibodypedia STRING network

The halos around the proteins encode the type and number of antibodies available. Red rings imply that at least one monoclonal antibody exists whereas gray rings imply that only polyclonal antibodies exist. The darker the ring (be it red or gray), the more different antibodies are available.

The STRING payload mechanism also extends the popups with additional information, here shown for LRP6:

Antibodypedia STRING popup

The popup shows the total number of antibodies available and how many of them are monoclonal. It also provides a direct linkout to the relevant protein page on Antibodypedia.

Please feel free to use this Antibodypedia-STRING mashup.

Announcement: From genomes to cells and systems

Later this year, Peer Bork, Jeroen Raes, Roland Krause, David Torrents, and I will be organizing the EMBO practical course “Computational biology: From genomes to cells and systems”. It will take place October 14-20 in L’Escala, Girona, Catalonia.

In times when high-throughput data are the norm rather than the exception, computational skills to turn masses of data into tangible biological insights have become crucial. This course will teach advanced computational methods for analysis of high-throughput data in molecular biology, covering both inter-individual and inter-species variation in (meta-)genomes and linking it to clinical applications. The course will span protein and pathway level variation from single genomes to entire microbial communities.

To participate in this course, fill in the online application form by July 31, 2012 at the latest. The registration fee is 250 euros for participants from academia and 600 euros for participants from industry.