Analysis: 10butnotMe

About five years ago George Church announced the Personal Genome Project (PGP). A very interesting aspect of this project is that all data are released under the Creative Commons Zero waiver. This includes not only the genetic data, but also some medical information and even the identity of each individual.

Although PGP has enrolled more than a thousand individuals, it is presently only possible to download data on ten of them. It is obviously pointless to attempt to link genotype to phenotype based on such a small number of individuals. However, I wondered if any meaningful structure would emerge if I calculated the Hamming distances for all pairs of individuals, that is, the number of SNPs by which they differ (download).
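To make the distance concrete, here is a minimal Python sketch of counting SNP differences for all pairs. It is not the script linked above. The data layout (a dict mapping an SNP identifier to a genotype string), the function names, and the choice to compare only SNPs called in both individuals are all assumptions made for illustration, and the toy genotypes are invented.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count SNPs genotyped in both individuals whose calls differ.

    Assumption: SNPs missing from either individual are ignored, and
    genotype strings are already normalised (e.g. "CT" and "TC" would
    otherwise be counted as different).
    """
    shared = a.keys() & b.keys()
    return sum(1 for snp in shared if a[snp] != b[snp])


def pairwise_distances(genotypes):
    """Return {(id1, id2): distance} for every unordered pair of individuals."""
    return {
        (p, q): snp_distance(genotypes[p], genotypes[q])
        for p, q in combinations(sorted(genotypes), 2)
    }


# Toy data, invented for illustration only (not real PGP genotypes).
toy = {
    "A": {"rs1": "AA", "rs2": "CT", "rs3": "GG"},
    "B": {"rs1": "AG", "rs2": "CT", "rs3": "GG"},
    "C": {"rs1": "GG", "rs2": "TT"},
}
distances = pairwise_distances(toy)
```

Because the array and exome data do not cover the same SNPs for every individual, how missing calls are handled can change the counts considerably. Restricting the comparison to shared SNPs is only one possible choice.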

No sooner said than done. I downloaded all available SNP data from PGP (including array and exome sequencing data), calculated all pairwise SNP distances, and visualized the results as a heatmap along with the faces of the individuals:

Number of SNP differences between PGP10 individuals

Individual #10 stands out as being genetically most dissimilar from everyone else, which is unsurprising as he is the only African American in the study. I next tried to similarly define the genetically most average individual, that is, the individual who is most similar to everyone else. If one defines this as the individual with the lowest sum of SNP differences to all the others, the answer is individual #7. However, because the origins of his grandparents are unknown, it is difficult to conclude anything interesting based on this.
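The "lowest sum of differences" criterion can be sketched as follows. The function name and the nested-dict layout of the distance matrix are assumptions, and the numbers are invented toy values rather than the PGP results.

```python
def rank_by_total_distance(matrix):
    """Return individuals sorted by their summed distance to everyone else.

    `matrix` is assumed to be a symmetric nested dict:
    matrix[p][q] == number of SNP differences between p and q.
    """
    totals = {person: sum(row.values()) for person, row in matrix.items()}
    return sorted(totals.items(), key=lambda item: item[1])


# Invented toy distances (not the PGP10 values).
toy_matrix = {
    "A": {"B": 3, "C": 5},
    "B": {"A": 3, "C": 4},
    "C": {"A": 5, "B": 4},
}
ranking = rank_by_total_distance(toy_matrix)
most_average = ranking[0][0]      # lowest total distance
most_dissimilar = ranking[-1][0]  # highest total distance
```

The same ranking read from the other end gives the most dissimilar individual, which is the role individual #10 plays in the heatmap.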
