The importance of open data in genomics (and in everything!)

Yesterday a friend of mine, who happens to be of doughty German and Scandinavian Upper Midwest stock, messaged me on Facebook to say that her father’s 23andMe results had come in…and he was 43 percent Sub-Saharan African! Her mother’s results came in a few hours later, and she was 35 percent Sub-Saharan African. I checked my own account, and my parents were in the same range. Oh my, overnight I became an underrepresented minority! Obviously this was a bug. The key word is “obviously.” Some people receive results suggesting that they are, say, 5 percent Sub-Saharan African. Others are like Dan MacArthur, who likely has some South Asian ancestry, but only in the 1-2 percent range. Small figures like these are not obviously wrong, so you cannot dismiss them at a glance.

How do you know that these results are not bugs? You analyze the raw data. Those with skills in Plink or Admixture can double-check easily, as I did for Dan MacArthur. Even if you don’t have that particular skill set, you can upload your raw data to a service like Interpretome or GEDmatch. That way you can run a range of statistical analyses and see whether they produce concordant signals. Replication of this sort is essential. Methods don’t give you the truth; they give you results that you can use to assess the likely shape of the truth.
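To make “analyze the raw data” a bit more concrete, here is a minimal sketch of an independent check, assuming a 23andMe-style raw file with tab-separated rsid, chromosome, position and genotype columns, “#” comment lines and “--” for no-calls. It estimates the fraction q of ancestry from a hypothetical population A versus a hypothetical population B by maximizing a simple binomial likelihood over a grid. This is a much cruder two-population model than what Admixture actually fits, and the function names (parse_raw_genotypes, estimate_admixture) and the reference allele frequencies you would pass in are illustrative assumptions, not any company’s or tool’s method.

```python
import math
from typing import Dict, Iterable, Tuple


def parse_raw_genotypes(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 23andMe-style raw data lines into {rsid: genotype}.

    Assumed format (hypothetical here): tab-separated rsid, chromosome,
    position, genotype; '#' starts a comment line; '--' is a no-call.
    """
    genotypes: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # malformed line
        rsid, _chrom, _pos, call = fields[:4]
        if call == "--":
            continue  # skip no-calls
        genotypes[rsid] = call
    return genotypes


def count_allele(call: str, allele: str) -> int:
    """Copies (0, 1 or 2) of `allele` in a diploid call such as 'AG'."""
    return call.count(allele)


def estimate_admixture(
    genotypes: Dict[str, str],
    ref_freqs: Dict[str, Tuple[str, float, float]],
    steps: int = 1000,
) -> float:
    """Grid-search maximum-likelihood estimate of the ancestry fraction q.

    ref_freqs maps rsid -> (allele, freq_in_A, freq_in_B); these reference
    frequencies are hypothetical inputs. Under a two-way model the person's
    allele frequency is p = q * freq_A + (1 - q) * freq_B, and the allele
    count at each SNP is treated as Binomial(2, p), independently across SNPs.
    """
    best_q, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        q = i / steps
        ll = 0.0
        for rsid, (allele, f_a, f_b) in ref_freqs.items():
            call = genotypes.get(rsid)
            if call is None or len(call) != 2:
                continue  # missing or haploid call (e.g. chrY): skip
            g = count_allele(call, allele)
            p = q * f_a + (1 - q) * f_b
            p = min(max(p, 1e-9), 1 - 1e-9)  # guard against log(0)
            ll += g * math.log(p) + (2 - g) * math.log(1 - p)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q
```

A grid search is used only because a one-parameter likelihood is easy to verify that way; a real analysis would use many thousands of SNPs, real reference panels and a proper optimizer. The point is the one made above: if a crude independent model and the company’s report disagree wildly (43 percent versus roughly zero), that discordance is the signal to suspect a bug.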

This is why I was so hard on Ancestry.com for not offering raw data downloads. You can’t just trust one particular firm to deliver perfect analytic accuracy; they are not as gods. Your genetic information is too important to outsource its interpretation to one company. If you have the skills, there is no excuse not to go DIY. If you don’t, you need a diverse portfolio of opinions, inferences, and assessments.

Source: Discover Magazine – Gene Expression