Tag Archives: protein interactions

Announcement: EMBO practical course on computational biology in Heidelberg

June 2016 will likely be a highly productive month for people in my group, since I will not be there much to disturb them. Specifically, I will be involved in running two week-long EMBO practical courses.

One was announced on this blog just two days ago. The other is the also long-running course “Computational biology: Genomes to systems”, which this year will take place on June 19–23 at the European Molecular Biology Laboratory in Heidelberg, Germany. The course will cover a wide range of advanced computational biology topics, including protein networks (taught by STRING collaborator Christian von Mering) and biomedical text mining (taught by me).

Please note that the application deadline is less than a month away, namely on January 31.

More details can be found on .

Advertisements

Announcement: EMBO practical course on protein interaction analysis in Budapest

Later this year, I will once again be one of the teachers on the long-running EMBO practical course “Computational analysis of protein-protein interactions: Sequences, networks and diseases”. The 2016 version of the course will be taking place on May 30 – June 4 in Budapest, Hungary, and the application deadline is February 1.

For more details see the course website or the poster below.

16-protein-protein

Exercise: Web services

The aim of this practical is to introduce you to the concept of web services as well as to a few useful standard command-line tools and how one can pipe data from one tool into another. Web services are, simply put, websites that are meant to be used by computers rather than humans.

Fetching a URL from the command line

The previous exercises used this article to illustrate named entity recognition. If you want to work with it outside the web browser, you will want to change two things: 1) you will probably not want to work with an HTML web page, but rather retrieve it in XML format, and 2) you will want to retrieve the article with something else than a web browser:

curl 'http://journals.plos.org/plosone/article/asset?id=10.1371/journal.pone.0132736.XML'

Submitting text to the tagger

In the NER practical, you used the a web service for NER; however, the complexity was hidden from you in the EXTRACT bookmarklet. The way the bookmarklet works, is that it sends text from your web browser to a remove tagging web service and subsequently displays the results.

Let us start by looking behind the curtain and see how an EXTRACT popup is produced. When selecting the the header of the article and clicking the bookmarklet, your browser retrieves the following page to show in the popup:

http://tagger.jensenlab.org/Extract?document=Novel%20ZEB2-BCL11B%20Fusion%20Gene%20Identified%20by%20RNA-Sequencing%20in%20Acute%20Myeloid%20Leukemia%20with%20t(2;14)(q22;q32)&entity_types=9606%20-26

As you can see, the URL contains data, namely the text to be tagged as well as information on which types of named entities we want to have recognized in the text.

You can retrieve the same information in a tab-delimited format, which is far more useful for computational purposes:

http://tagger.jensenlab.org/GetEntities?document=Novel%20ZEB2-BCL11B%20Fusion%20Gene%20Identified%20by%20RNA-Sequencing%20in%20Acute%20Myeloid%20Leukemia%20with%20t(2;14)(q22;q32)&entity_types=9606%20-26&format=tsv

If you want, you can use the curl command to retrieve the same data from the command line.

Retrieving a protein network

Bioinformatics web services are not limited to text mining. For example, the STRING database of protein interactions can also be accessed as a web service. The following URL gives you an interaction network for BCL11B as an image:

http://string-db.org/api/image/network?identifier=ENSP00000349723

Modifying it just slightly, allows you to retrieve the same interactions in PSI-MI-TAB format:

http://string-db.org/api/psi-mi-tab/interactions?identifier=ENSP00000349723
You obtain the exact same data in the command line by running this command:

curl 'http://string-db.org/api/psi-mi-tab/interactions?identifier=ENSP00000349723'

Putting it all together

Using pipes, it is possible to put together multiple different web services and local programs to accomplish complex tasks. Here is an example that puts together everything you have learned above:

curl 'http://journals.plos.org/plosone/article/asset?id=10.1371/journal.pone.0132736.XML' | curl --data-urlencode 'document@-' --data-urlencode 'entity_types=9606' --data-urlencode 'format=tsv' 'http://tagger.jensenlab.org/GetEntities' | cut -f3 | sort -u | grep '^ENSP' | curl --data-urlencode 'identifiers@-' --data-urlencode 'limit=0' 'http://string-db.org/api/psi-mi-tab/interactionsList' > string_network.tsv

Let us pick apart this monstrosity of a command and see what it does:

  • The first curl command fetches a full-text article from PLOS ONE in XML format
  • The second curl command submits this document to the tagger REST web service, to perform named entity recognition of human genes/proteins
  • The cut command pulls out only column three from the resulting output, which contains the identifiers of the recognized entities
  • The grep command find only the identifiers that start with “ENSP”, which is the proteins
  • The third curl command submits this list of protein identifiers to the STRING database to retrieve a protein interaction network of them in PSI-MI-TAB format
  • Finally, we put that network into a file called string_network.tsv on our server.

In other words, with a single pipe of commands that interacts with three different servers we manage to retrieve a full-text article, perform named entity recognition of human proteins and obtain protein interactions among them. Note that whereas this is possible, it will often be desirable to store some of the intermediate results in files instead of using pipes.

By slightly modifying the command, it is possible to instead retrieve this as an image:

curl 'http://journals.plos.org/plosone/article/asset?id=10.1371/journal.pone.0132736.XML' | curl --data-urlencode 'document@-' --data-urlencode 'entity_types=9606' --data-urlencode 'format=tsv' 'http://tagger.jensenlab.org/GetEntities' | cut -f3 | sort -u | grep '^ENSP' | curl --data-urlencode 'identifiers@-' --data-urlencode 'limit=0' --data-urlencode 'network_flavor=confidence' 'http://string-db.org/api/image/networkList' > string_network.png

STRING network

Announcement: EMBO practical course on protein interaction analysis in South Africa

I very much look forward to once again be part of the team of teachers behind the EMBO practical course “Computational analysis of protein-protein interactions: From sequences to networks”. This time it will for the first time take place on the African continent, more specifically in Cape Town, South Africa. The course will take place from September 23 – October 3 and the application deadline is July 23.

Please check the course website or the poster below for details.

Course poster

Job: Ph.D. stipend in systems biology and bioinformatics

I currently have an open position for a Ph.D. student in my group Cellular Network Biology. My group is part part of the Novo Nordisk Foundation Center for Protein Research (CPR) and is financially supported by the Faculty of Health and Medical Sciences, University of Copenhagen.

The project will primarily focus on developing new, improved methodologies for analysis of large-scale datasets, e.g. from mass spectrometry, in the context of protein interaction networks, protein localization, and expression. In doing so, the aim is both to test scientific hypotheses and to improve existing resources developed within the group, such as STRING and COMPARTMENTS. Candidates are thus expected to have experience with programming and statistics.

The closing date for applications is June 30, 2014. For further details refer to the job advert.

Resource: Antibodypedia bulk download file and STRING payload

Antibodypedia is a very useful resource for finding commercially available antibodies against human proteins developed by Antibodypedia AB and Nature Publishing Group.

The resource is made available under the Creative Commons Attribution-NonCommercial 3.0 license, which allows for reuse and redistribution of the data for non-commercial purposes. However, the data are purely available for browsing through a web interface, which greatly limits systems biology uses of the resource. I thus wrote a robot to scrape all information from the web resource and convert it into a convenient tab-delimited file, which I have made available for download under the same license. This dataset covers a total of 579,038 antibodies against 16,827 human proteins.

To be able to use the dataset in conjunction with STRING and related resources, I next mapped the proteins to STRING protein identifiers. I was able to map 92% of all proteins in Antibodypedia. Having done this, I created the necessary files for the STRING payload mechanism to be able to show the information from Antibodypedia directly within STRING.

The end result looks like this when searching for the WNT7A protein:

Antibodypedia STRING network

The halos around the proteins encode the type and number of antibodies available. Red rings imply that at least one monoclonal antibody exists whereas gray rings imply that only polyclonal antibodies exist. The darker the ring (be it red or gray), the more different antibodies are available.

They STRING payload mechanism also extends the popups with additional information, here shown for LRP6:

Antibodypedia STRING popup

The popup shows the total number of antibodies available and how many of them are monoclonal. It also provides a direct linkout to the relevant protein page on Antibodypedia.

Please, feel free to use this Antibodypedia-STRING mashup.

Announcement: Computational analysis of protein-protein interactions for bench biologists

Once again I will be one of the teachers on an EMBO Practical Course. This time we will be teaching wet-lab biologists about how to do computational analysis of protein-protein interactions. The course will take place September 2-8 at the Max Delbrück Center for Molecular Medicine in Berlin, Germany.

The course aims to help bench scientists become more effective at exploiting the wide range of commonly-used databases and bioinformatics tools that can be used to identify, understand, and predict protein interactions by analyzing their structure, sequences, and other features.

The target group for the course are experimental scientists needing to analyse interaction data in their work, and who have limited experience using bioinformatics tools and resources. The course covers analyses and tools that are applied after potential interactions have been identified. It does not cover analysis of the raw data from, for example, mass spectrometry.

To apply for the course, fill in the online application form. The registration deadline is Friday June 15th 2012. The course fee is 200 euros for academics and 1000 euros for scientists from industry.