Resource: Antibodypedia bulk download file and STRING payload

April 28, 2013

Antibodypedia, developed by Antibodypedia AB and Nature Publishing Group, is a very useful resource for finding commercially available antibodies against human proteins.

The resource is made available under the Creative Commons Attribution-NonCommercial 3.0 license, which allows for reuse and redistribution of the data for non-commercial purposes. However, the data are only available for browsing through a web interface, which greatly limits systems biology uses of the resource. I thus wrote a robot to scrape all information from the web resource and convert it into a convenient tab-delimited file, which I have made available for download under the same license. This dataset covers a total of 579,038 antibodies against 16,827 human proteins.

To be able to use the dataset in conjunction with STRING and related resources, I next mapped the proteins to STRING protein identifiers. I was able to map 92% of all proteins in Antibodypedia. Having done this, I created the necessary files for the STRING payload mechanism to be able to show the information from Antibodypedia directly within STRING.

The end result looks like this when searching for the WNT7A protein:

Antibodypedia STRING network

The halos around the proteins encode the type and number of antibodies available. Red rings imply that at least one monoclonal antibody exists whereas gray rings imply that only polyclonal antibodies exist. The darker the ring (be it red or gray), the more different antibodies are available.

The STRING payload mechanism also extends the popups with additional information, here shown for LRP6:

Antibodypedia STRING popup

The popup shows the total number of antibodies available and how many of them are monoclonal. It also provides a direct linkout to the relevant protein page on Antibodypedia.

Please, feel free to use this Antibodypedia-STRING mashup.


Editorial: Goodbye Google Reader – a reminder why open standards matter

March 14, 2013

This morning I woke up to the announcement that Google will be powering down Google Reader, which has long been my RSS reader of choice. RSS feeds are crucial to me because they are where I follow numerous science-related blogs, read automated PubMed searches, and receive tables of contents from selected journals.

When I recently bought an iPad Mini, however, I discovered to my surprise that there was no Google Reader app for iPad. This made me strongly suspect that Google had no plans to continue Google Reader. It also made me discover Feedly, which turned out to be so good that I preferred reading my RSS feeds on the iPad as opposed to on my computer. I have now installed Feedly for Chrome as well as the Android app on my phone, so I consider myself fully migrated already. I think this is a lesson that shows the importance of open standards – whereas RSS feeds are crucial to me, replacing the viewer is no big deal.


Announcement: ICSB2013 in Copenhagen

March 8, 2013

It is my great pleasure to announce that I am co-organizing the 14th International Conference on Systems Biology, which will take place in Copenhagen, Denmark on August 30 – September 3, 2013.

ICSB2013

The conference will feature presentations on a wide spectrum of systems biology topics from a truly spectacular lineup of international, high-profile keynote and session speakers.

Confirmed keynote speakers:
Alexander van Oudenaarden, NL
Anne-Claude Gavin, DE
Ben Neel, CA
Bernhard Palsson, DK
Chris Voigt, USA
Dana Pe’er, USA
Doug Lauffenburger, USA
Elaine Mardis, USA
Gene Myers, DE
Jennifer Lippincott-Schwartz, USA
Kim Sneppen, DK
Lars Steinmetz, DE
Levi Garraway, USA
Marc Vidal, USA
Matthias Mann, DE
Peer Bork, DE
Philippe Bastiaens, DE
Rama Ranganathan, USA
Ruedi Aebersold, CH
Stuart Kauffman, USA
Wendell Lim, USA

Confirmed session speakers:
Bernd Bodenmiller, CH
Bob Murphy, USA
Chris Newgard, USA
Eske Willerslev, DK
Felix Naef, CH
Gerard Manning, USA
Giulio Superti-Furga, AT
Greg Stephanopoulos, USA
Haja Kadarmideen, DK
Hans Westerhoff, NL
James Faeder, USA
Janine Erler, DK
Jasmin Fisher, UK
Jens Nielsen, DK
Lucas Pelkmans, CH
Morten Sommer, DK
Neal Rosen, USA
Norbert Perrimon, USA
Rune Linding, DK
Søren Brunak, DK
Thomas Sicheritz Pontén, DK
Mikkel W. Pedersen, DK
Julio Saez-Rodriguez, UK
Michael Lee, USA
Luis Serrano, ES
Nevan Krogan, USA
Seán O’Donoghue, AU
Jonathan Karr, USA

Organizing committee
Niels-Henrik Holstein-Rathlou
Søren Brunak
Jens Christian Brings Jacobsen
Lars Juhl Jensen
Jens Christian Brasen
Rune Linding
Morten Sommer
Jørgen K Kanters
Olga Sosnovtseva

To find out more, please check out the conference web site.


Analysis: Science used to be simpler

January 5, 2013

I guess most people have a feeling that life used to be simpler in the past. The other day it occurred to me that we researchers very often talk about how advanced our methods are, although simple methods are in many cases preferable.

So this morning I resorted to my usual strategy for analyzing such things, namely counting in Medline. More specifically I calculated for each year the percentage of publication titles that contain the words “simple” and “advanced”, respectively. In the plot below, the dots show the values for each year and the lines show five-year running averages thereof (click for PDF version):
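The counting described above can be sketched as follows. This is only a minimal illustration, not the script actually used: it assumes the Medline records have already been reduced to (year, title) pairs, matches the word case-insensitively as a whole word, and uses a centered five-year window for the running average, none of which is specified in the post. The function names and the toy records are made up for the example.

```python
import re
from collections import defaultdict


def yearly_percentages(records, word):
    """Return {year: percentage of titles that contain `word` as a whole word}."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, title in records:
        totals[year] += 1
        if pattern.search(title):
            hits[year] += 1
    return {year: 100.0 * hits[year] / totals[year] for year in totals}


def running_average(values_by_year, window=5):
    """Centered running average; years without a full window are skipped (an assumption)."""
    half = window // 2
    smoothed = {}
    for year in sorted(values_by_year):
        span = range(year - half, year + half + 1)
        if all(y in values_by_year for y in span):
            smoothed[year] = sum(values_by_year[y] for y in span) / window
    return smoothed


# Toy records standing in for Medline titles (hypothetical data).
toy_records = [
    (1960, "A simple method for protein estimation"),
    (1960, "Studies on liver enzymes"),
    (1961, "Advanced techniques in microscopy"),
    (1961, "A simple assay for glucose"),
]

for word in ("simple", "advanced"):
    print(word, yearly_percentages(toy_records, word))
```

Whole-word matching keeps titles containing words such as "simplex" from being counted; whether the original analysis made that distinction is not stated.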

simple_or_advanced

As can be clearly seen, life as a researcher was indeed simpler in the 50s and 60s.


Analysis: When will your BMC paper be typeset?

August 21, 2012

One month ago, people from Jan Gorodkin’s group and my own group published a paper in BMC Systems Biology. This happened after a very long process during which we were very close to withdrawing the manuscript and sending it elsewhere due to inaction by the editor. In the end it got accepted, but even now there is only the provisional PDF available. The paper has still not been typeset.

Typesetting is one of very few things an online-only journal does to add value. Publishers often claim to add value by organizing peer review, but if you think about it, they pass the manuscript to an unpaid editor who subsequently recruits unpaid referees to review it. Careful copyediting and typesetting of the final, accepted manuscript is thus in my view the only hands-on work that most journals do for their considerable article-processing charge. Neil Saunders’ recent blog post “We really don’t care what statistical method you used” illustrates well the care with which copy editing is done. We are thus down to only one service actually done by the publishers: typesetting the manuscript to produce XML, HTML, and PDF versions of it.

You would thus hope that typesetting at least happens promptly once a manuscript is accepted and the authors have paid. However, I have been frustrated to find that both my own manuscript in BMC Systems Biology and many manuscripts that I have downloaded from BMC journals exist only as provisional PDFs even months after publication. I thus decided to quantify the extent to which typesetting of papers is delayed. To this end, I considered all papers published in each journal during the months May-July this year and calculated what percentage of them had been typeset by now.

Starting with BMC Systems Biology, here are the numbers: 7 of 26 papers from May, 3 of 24 papers from June, and 1 of 15 papers from July have been typeset to date. The numbers for BMC Bioinformatics turned out to be as disappointing: 6 of 52, 7 of 36 and 1 of 32 papers from May, June, and July have been typeset so far. And BMC Genomics confirmed the trend: 17 of 56, 14 of 74, and 11 of 67 are the numbers for May, June, and July. This adds up to only 16.9%, 10.6%, and 21.3% of papers from May-July having been typeset by BMC Systems Biology, BMC Bioinformatics, and BMC Genomics, respectively.
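The pooled percentages can be recomputed from the monthly counts quoted above with a few lines of Python. This is a sketch that simply sums typeset and published papers over the three months; the variable and function names are made up for the example. With the counts exactly as quoted, BMC Bioinformatics comes out at roughly 11.7% rather than the 10.6% given in the text, so that figure was presumably computed from slightly different counts.

```python
# (typeset, published) pairs for May, June, and July, as quoted in the post.
counts = {
    "BMC Systems Biology": [(7, 26), (3, 24), (1, 15)],
    "BMC Bioinformatics": [(6, 52), (7, 36), (1, 32)],
    "BMC Genomics": [(17, 56), (14, 74), (11, 67)],
}


def percent_typeset(monthly):
    """Pool the monthly counts and return the percentage of papers typeset."""
    typeset = sum(t for t, _ in monthly)
    published = sum(p for _, p in monthly)
    return 100.0 * typeset / published


for journal, monthly in counts.items():
    print(f"{journal}: {percent_typeset(monthly):.1f}%")
```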

I continued to check other journals from BioMed Central, Chemistry Central, and SpringerOpen, which are all open access publishers owned by Springer. The results were the same. The percentages of papers from May-July that had been typeset were 6.2%, 20.0%, and 9.0% for Proteome Science, Chemistry Central Journal, and Critical Ultrasound Journal, respectively.

To make a long, depressing story short, I should expect to wait for at least another three months before I see a typeset version of my paper. Can someone please remind me why we, the researchers, pay for this?

Full disclosure: I am an associate editor of PLoS Computational Biology.


Announcement: From genomes to cells and systems

June 27, 2012

Later this year Peer Bork, Jeroen Raes, Roland Krause, David Torrents, and I will be organizing the EMBO practical course “Computational biology: From genomes to cells and systems”. It will take place October 14-20 in L’Escala, Girona, Catalonia.

In times when high-throughput data are the norm rather than the exception, computational skills to turn masses of data into tangible biological insights have become crucial. This course will teach advanced computational methods for analysis of high-throughput data in molecular biology, covering both inter-individual and inter-species variation in (meta-)genomes and linking it to clinical applications. The course will span protein and pathway level variation from single genomes to entire microbial communities.

To participate in this course, fill in the online application form by July 31, 2012 at the latest. The registration fee is 250 euros for participants from academia, and 600 euros for industry.


Analysis: Is PeerJ cheaper than other Open Access journals?

June 26, 2012

The newly announced Open Access journal PeerJ has caused quite a fuss, not least because of their catch phrase: “If we can set a goal to sequence the human genome for $99, then why not $99 for scholarly publishing?”

This at first sounds very cheap; however, the $99 is not what you pay per accepted paper. PeerJ operates under a different scheme than traditional Open Access journals: instead of paying per publication, you pay a one-time fee to be able to publish in PeerJ for life. This sounds almost too good to be true.

There are a few catches, however. Firstly, $99 only entitles you to submit one manuscript per year to PeerJ. If you want to be able to submit two manuscripts per year or unlimited manuscripts, the price rises to $169 and $259 respectively.
Secondly, all authors on a manuscript must be paying PeerJ members at the time of submission (except if there are more than twelve authors, in which case it is enough that 12 of them are members). This suddenly makes the comparison to other Open Access journals much more complex, as the actual average price per manuscript depends on the number of authors, the number of other PeerJ manuscripts submitted by the same authors in their lifetime, and the acceptance rate of PeerJ. In this post I try to do the math and compare PeerJ to traditional Open Access journals, where you pay per accepted publication.

PeerJ compares itself to PLoS ONE, so I base all comparisons on that. From 2006 when PLoS ONE was launched up to and including 2011, a total of 29,042 publications have appeared with a total of 150,020 authorships. This amounts to an average of about 5.2 authors per publication (150,020/29,042 = 5.166). When PeerJ is initially launched, no authors will have the benefit of already being members, so at first this implies that the authors of a manuscript will together have to pay an average of $99*5.166 = $511 per submitted manuscript (ignoring the discount on manuscripts with 12+ authors). The cost per accepted paper also depends on the acceptance rate. According to the PeerJ FAQ, this is expected to be approximately 70%. Assuming that this holds true, the average cost incurred by the authors per accepted paper will be $511/0.7 = $730. This is already considerably less than PLoS ONE, which has a publication fee of $1350 per accepted paper. From a pure cost point-of-view, PeerJ thus looks to be about half the price of PLoS ONE.

I do have some concerns related to the model of charging per author. First, I find it to be illogical, since the actual costs related to handling a manuscript are independent of the number of authors. Second, the average number of authors per paper varies between research fields, which implies that the average fee per manuscript will in some fields be higher than $730. For a manuscript with 12 authors, none of whom are already PeerJ members, the fee per accepted manuscript is $99*12/0.7 = $1697, which is more expensive than PLoS ONE. Third, the new model gives a direct financial incentive to not include authors who made minor contributions.
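The cost arithmetic in this post can be sketched as a small function. It is a simplified model under the same assumptions as the text: every author is a new member paying the basic $99 fee, the acceptance rate is 70%, and at most 12 authors have to pay. The function and parameter names are made up for the example, and applying the 12-author cap to an average author count is itself an approximation.

```python
def cost_per_accepted_paper(fee_per_author, authors, acceptance_rate,
                            max_paying_authors=12):
    """Average membership fees paid per accepted paper when no author is yet a member."""
    paying = min(authors, max_paying_authors)
    cost_per_submission = fee_per_author * paying
    # Rejected submissions still cost money, so divide by the acceptance rate.
    return cost_per_submission / acceptance_rate


# Average number of authors per PLoS ONE paper, 2006-2011 (figures from the post).
average_authors = 150020 / 29042

average_case = cost_per_accepted_paper(99, average_authors, 0.7)
twelve_authors = cost_per_accepted_paper(99, 12, 0.7)

print(f"Average author count: ${average_case:.0f} per accepted paper")
print(f"Twelve authors:       ${twelve_authors:.0f} per accepted paper")
print(f"Ratio to the $1350 PLoS ONE fee: {average_case / 1350:.2f}")
```

Up to rounding, this reproduces the roughly $730 and $1697 figures, and the ratio of about 0.54 is what lies behind "about half the price".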

In summary, I think PeerJ is a refreshing new idea – I can only applaud efforts to lower the price of scientific publishing. However, although $99 for scientific publishing sounds revolutionarily cheap, PeerJ will at first only be ~2x cheaper than PLoS ONE. Also, the new payment model, which effectively boils down to a per-author charge, is in my opinion not without its own problems.

Full disclosure: I am an associate editor of PLoS Computational Biology.


Announcement: Computational analysis of protein-protein interactions for bench biologists

June 5, 2012

Once again I will be one of the teachers on an EMBO Practical Course. This time we will be teaching wet-lab biologists about how to do computational analysis of protein-protein interactions. The course will take place September 2-8 at the Max Delbrück Center for Molecular Medicine in Berlin, Germany.

The course aims to help bench scientists become more effective at exploiting the wide range of commonly-used databases and bioinformatics tools that can be used to identify, understand, and predict protein interactions by analyzing their structure, sequences, and other features.

The target group for the course is experimental scientists who need to analyse interaction data in their work and who have limited experience using bioinformatics tools and resources. The course covers analyses and tools that are applied after potential interactions have been identified. It does not cover analysis of the raw data from, for example, mass spectrometry.

To apply for the course, fill in the online application form. The registration deadline is Friday June 15th 2012. The course fee is 200 euros for academics and 1000 euros for scientists from industry.


Editorial: Open KPGP – when open means closed

April 9, 2012

As a computational biologist, I can only be excited about the Personal Genome Project (PGP). What is especially exciting about this particular project is that they release all data under the Creative Commons Zero public domain dedication, which gives everyone complete freedom to use the data as they wish.

The bad news is that there is still only sequence data available for 16 individuals. I was thus thrilled to see the announcement late last year that the Open Korean Personal Genome Project (KPGP) had released data on another 32 individuals. I was a bit mystified why the data were not available for download from the main PGP web site, though.

When I went to the KPGP web site, which has the very promising URL opengenome.net, I was greeted with this message:

However, the moment I tried to download any data, I was faced with the following long legal agreement, which I had to accept to get any further:

1) Data Type
Every derived genetic information should be approved by relevant facility board.
1. Genetic information data- Individual’s sequenced DNA and analyzed data.
2. Clinical information data- Clinical information does not include family tree, Phenotype and family medical history.
2) The Commission Process
1. Bioethics committee of Genome Research Foundation will make a decision through policy reviews and case consultation.
2. The Standards Commission
This commission should be controlled by Korea National Institute for Bioethics Policy. Research, associate with potential social risks, eugenical problem and discrimination on the basis of genetic information when it comes to any aspects of physical looking, should be forbidden.
3. The Commission Process
Research project and IRB document, approved in each countries, will be required. If there is no provision for IRB approval, User must agree with additional consent documents that embodies the purpose of the data.
4. Evaluation (It will take at least one week)
3) Policy Agrement
Informed consent shall be documented by the use of a written consent form approved by the IRB, and signed by the subject or the subject’s legally authourized representative. And if necessary, the committee may request, require or otherwise obtain detailed investigation.
Data Release
Any additional costs to the subject that may result from participation in the research.
4) Data Source Agrement
The Genome Research Foundation should review and approve specifying the conditions under which data may be accepted, and ensuring adequate provisions to protect the privacy of subjects and maintain the confidentiality of data. To cite the data source in any publications or research based upon these data, and to provide a copy of any publications, the following citation should be included in any research reports, papers, or publications based on these data: Produced and distributed data should have references in Acknowledgement, Methods, Abstract.
5) Genetic Data Access Use Agreement
1. To use the data set solely for statistical reporting and analysis.
2. Not to share these data with, or provide copies of these data to, any other person or organization. Genetic data user will not use for commercial interests or potential commercialization of the results bring troubling ethical aspects the suggest greater potential abuses than clinical benefits.
3. To make no attempt to link this data set with individually identifiable records from any source, or in any other way attempt to identify the persons in this or other datasets.
4. Personal data will neither be disclosed to any exterior third parties nor be used for any other purposes.
5. That if the identity of any person or establishment in this data set is inadvertently discovered, then (a) no use will be made of this knowledge, (b) the Director of Genome Research Foundation will be advised of this incident immediately (c) the information that would identify any individual or establishment will be safeguarded or destroyed, as requested by Genome Research Foundation, and (d) no one else will be informed of the discovered identity.
6. To return or destroy the data set, and any derivative data files, upon request from Genome Research Foundation.
7. This agrement is contingent upon the approved Genome Research Foundation, and is subject to all the requirements of that agreement.

For those who cannot or do not want to read (poorly formatted and phrased) legalese, this is the polar opposite of open. It explicitly forbids redistribution, commercial use, and reidentification of individuals. It even goes as far as requiring that if I use the data in a publication, I must cite KPGP in the abstract. It is in other words closed.

To add insult to injury, I subsequently filled in a form to request access and waited for weeks for someone to grant me an account, only to discover that I cannot download the data even when logged in with said account. Instead the web site asks me to go through the same approval procedure again.


Announcement: PTMs In Cell Signaling

March 28, 2012

It is my great pleasure to announce the 2nd Copenhagen Bioscience conference “PTMs In Cell Signaling”, which will take place in Helsingør, Denmark on December 3-5, 2012.

The conference will feature a truly excellent lineup of speakers: Philippe Bastiaens, Søren Brunak, Ivan Dikic, Gerald Hart, Tim Hunt, Steve Jackson, Doug Lauffenburger, Jiri Lukas, Matthias Mann, Andre Nussenzweig, Brenda Schulman, Henrik Semb, Eric Verdin, Forest White, Michael Yaffe, and Juleen Zierath.

The conference is limited to 220 participants. It is fully sponsored by the Novo Nordisk Foundation, which covers the conference fee, hotel, transport and meals during the conference. Participants cover their own travel expenses.

To find out more, please check the conference web site.

