As a computational biologist, I can only be excited about the Personal Genome Project (PGP). What is especially exciting about this particular project is that they release all data under the Creative Commons Zero public domain dedication, which gives everyone complete freedom to use the data as they wish.
The bad news is that there is still only sequence data available for 16 individuals. I was thus thrilled to see the announcement late last year that the Open Korean Personal Genome Project (KPGP) had released data on another 32 individuals. I was a bit mystified why the data were not available for download from the main PGP web site, though.
When I went to the KPGP web site, which has the very promising URL opengenome.net, I was greeted with this message:
However, the moment I tried to download any data, I was faced with the following long legal agreement, which I had to accept to get any further:
1) Data Type
Every derived genetic information should be approved by relevant facility board.
1. Genetic information data- Individual’s sequenced DNA and analyzed data.
2. Clinical information data- Clinical information does not include family tree, Phenotype and family medical history.
2) The Commission Process
1. Bioethics committee of Genome Research Foundation will make a decision through policy reviews and case consultation.
2. The Standards Commission
This commission should be controlled by Korea National Institute for Bioethics Policy. Research, associate with potential social risks, eugenical problem and discrimination on the basis of genetic information when it comes to any aspects of physical looking, should be forbidden.
3. The Commission Process
Research project and IRB document, approved in each countries, will be required. If there is no provision for IRB approval, User must agree with additional consent documents that embodies the purpose of the data.
4. Evaluation (It will take at least one week)
3) Policy Agrement
Informed consent shall be documented by the use of a written consent form approved by the IRB, and signed by the subject or the subject’s legally authourized representative. And if necessary, the committee may request, require or otherwise obtain detailed investigation.
Any additional costs to the subject that may result from participation in the research.
4) Data Source Agrement
The Genome Research Foundation should review and approve specifying the conditions under which data may be accepted, and ensuring adequate provisions to protect the privacy of subjects and maintain the confidentiality of data. To cite the data source in any publications or research based upon these data, and to provide a copy of any publications, the following citation should be included in any research reports, papers, or publications based on these data: Produced and distributed data should have references in Acknowledgement, Methods, Abstract.
5) Genetic Data Access Use Agreement
1. To use the data set solely for statistical reporting and analysis.
2. Not to share these data with, or provide copies of these data to, any other person or organization. Genetic data user will not use for commercial interests or potential commercialization of the results bring troubling ethical aspects the suggest greater potential abuses than clinical benefits.
3. To make no attempt to link this data set with individually identifiable records from any source, or in any other way attempt to identify the persons in this or other datasets.
4. Personal data will neither be disclosed to any exterior third parties nor be used for any other purposes.
5. That if the identity of any person or establishment in this data set is inadvertently discovered, then (a) no use will be made of this knowledge, (b) the Director of Genome Research Foundation will be advised of this incident immediately (c) the information that would identify any individual or establishment will be safeguarded or destroyed, as requested by Genome Research Foundation, and (d) no one else will be informed of the discovered identity.
6. To return or destroy the data set, and any derivative data files, upon request from Genome Research Foundation.
7. This agrement is contingent upon the approved Genome Research Foundation, and is subject to all the requirements of that agreement.
For those who cannot or do not want to read (poorly formatted and phrased) legalese, this is the polar opposite of open. It explicitly forbids redistribution, commercial use, and re-identification of individuals. It even goes as far as requiring that if I use the data in a publication, I must cite KPGP in the abstract. In other words, it is closed.
To add insult to injury, I subsequently filled in a form to request access and waited for weeks for someone to grant me an account, only to discover that I cannot download the data even when logged in with said account. Instead, the web site asks me to go through the same approval procedure again.