Painting the human tree of life

[Figure: Tishkoff et al.]

Reading Peter Bellwood’s First Farmers: The Origins of Agricultural Societies, I’m struck by how much of a difference five years has made. When Bellwood was writing, the ‘orthodoxy’ on the nature of the expansion of farming into Europe leaned toward cultural diffusion. Today the paradigm is in flux, as a new generation of genomic studies, using ancient DNA, wider sets of markers, and a broader sampling of populations, makes solid old truths untenable. I’m reading Bellwood’s work in part because, from what I have read elsewhere, his model seems less and less ridiculous in light of the new information bubbling out of human genomics. The swell of data in this field is such that it’s hard to keep up. You never know what you’re going to wake up to in the morning.

The assertions of archaeologists and prehistorians such as Bellwood have clear implications and offer specific predictions about the shape of the human phylogenetic tree. Now the results are getting robust enough that the models can be tested, and alternatives refuted or accepted. But sometimes you need to take stock. Many of my posts assume that you have a lot of the background information in hand, but I know that’s not always the case. With that, I’d like to draw your attention to a paper in Human Molecular Genetics, Fine-scale population structure and the era of next-generation sequencing:

Fine-scale population structure characterizes most continents and is especially pronounced in non-cosmopolitan populations. Roughly half of the world’s population remains non-cosmopolitan and even populations within cities often assort along ethnic and linguistic categories. Barriers to random mating can be ecologically extreme, such as the Sahara Desert, or cultural, such as the Indian caste system. In either case, subpopulations accumulate genetic differences if the barrier is maintained over multiple generations. Genome-wide polymorphism data, initially with only a few hundred autosomal microsatellites, have clearly established differences in allele frequency not only among continental regions, but also within continents and within countries. We review recent evidence from the analysis of genome-wide polymorphism data for genetic boundaries delineating human population structure and the main demographic and genomic processes shaping variation, and discuss the implications of population structure for the distribution and discovery of disease-causing genetic variants, in the light of the imminent availability of sequencing data for a multitude of diverse human genomes.

[Figure: E(x) = 4th cousins]

The paper reviews all the different ways in which human populations are related, the evolutionary forces that shape them, and the nested layers of population structure we’re only now starting to explore. A few months ago I blogged that geneticists have found they can differentiate population clusters on the scale of nearby villages in Europe! This is remarkable: as recently as 15 years ago, scientists would have struggled to find a dozen markers that differentiated populations separated by continents. In the paper on Sami genomics which I covered earlier in the week, I didn’t even bother to mention that the Sami apparently exhibit internal population structure. That is not too surprising given the fragmented nature of low-density marginal ecologies, but it is nevertheless a reality check on our tendency to perceive such peoples as homogeneous wholes. The Bushmen of southern Africa are among the most genetically diverse people in the world. SNP chips fine-tuned to European genetic variation are likely missing many variants peculiar to Bushmen, so our estimates of their diversity are surely lower bounds.
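
To make the point about marker counts concrete, here is a toy calculation of my own, not taken from the paper or from the village-scale study. It assumes two hypothetical populations whose allele frequencies differ by only 0.05 at every locus, independent loci in Hardy-Weinberg equilibrium, and known population frequencies (no sampling error). `hudson_fst` is Hudson’s F_ST written as a ratio of averages over loci; `expected_llr` is a hypothetical helper giving the expected log-likelihood ratio (in nats) for assigning one diploid individual to its true population, i.e. the Kullback-Leibler divergence between the two genotype distributions, summed over loci.

```python
import math


def hudson_fst(p1, p2):
    """Hudson's F_ST from population allele frequencies, ratio of averages over loci.

    Per locus, between-population minus within-population heterozygosity
    simplifies to (p1 - p2) ** 2.
    """
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    return num / den


def genotype_probs(p):
    """Hardy-Weinberg probabilities of carrying 0, 1 or 2 copies of the allele."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def expected_llr(p1, p2):
    """Expected log-likelihood ratio (nats) favouring population 1 for a diploid
    individual drawn from population 1, summed over independent loci.
    Frequencies must lie strictly between 0 and 1."""
    total = 0.0
    for a, b in zip(p1, p2):
        for x, y in zip(genotype_probs(a), genotype_probs(b)):
            total += x * math.log(x / y)
    return total


for n_loci in (12, 10_000):
    pop1 = [0.50] * n_loci
    pop2 = [0.55] * n_loci
    print(n_loci, round(hudson_fst(pop1, pop2), 4), round(expected_llr(pop1, pop2), 2))
```

F_ST stays at 0.005 in both runs, while the expected evidence per individual grows from about 0.12 nats with a dozen loci to about 100 nats with 10,000. Real markers are neither independent nor equally informative, so read this only as an illustration of how many weak differences add up.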

[Figure: Cape Coloured ancestry painting]

In the paper, the authors suggest that the next step in exploring human genomic variation will be to work at a finer scale, more broadly: Baden-Baden vs. Baden-Württemberg. That means looking at patterns of similarity across chromosomal segments that indicate near-familial relationships, because the shared DNA is clearly identical by descent (IBD). For example, they note that two random Ashkenazi Jews are, on average, related on the order of 4th cousins. Most human variation is found within populations, but there are still thousands of markers that exhibit a great deal of between-population variance and serve as a distinctive record of the evolutionary history of a given group. They also note that the focus so far has generally been on broad-scale population differences with deep time depth, on the order of tens of thousands of years. In contrast, the origin of the Ashkenazi Jews likely goes back no more than 1,000 years, and over that period they have coalesced into a culturally and genetically coherent people.
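
The “4th cousins” benchmark can be unpacked with standard pedigree arithmetic. The sketch below is a textbook idealisation, not the paper’s method: it computes the expected coefficient of relationship for full nth cousins and the mean length of a shared IBD segment, assuming crossovers fall as a Poisson process at one per Morgan per meiosis and ignoring chromosome ends. The helper names are hypothetical.

```python
def meioses_between_cousins(n):
    """Meioses separating two full nth cousins along one path through a shared
    ancestor: each cousin sits n + 1 generations below the shared couple."""
    return 2 * (n + 1)


def expected_relatedness(n):
    """Coefficient of relationship for full nth cousins: two paths (one through
    each member of the shared couple), each contributing (1/2) ** meioses."""
    return 2 * 0.5 ** meioses_between_cousins(n)


def mean_ibd_segment_cm(n):
    """Idealised mean length (cM) of a shared segment: after m meioses,
    breakpoints occur at rate m per Morgan, so segments average 100/m cM."""
    return 100.0 / meioses_between_cousins(n)


for n in range(1, 5):
    print(f"cousin degree {n}: r = {expected_relatedness(n):.4%}, "
          f"mean segment ~ {mean_ibd_segment_cm(n):.1f} cM")
```

For 1st through 4th cousins this gives 12.5%, 3.125%, about 0.78% and about 0.195%, with mean segment length shrinking from 25 cM to 10 cM. Actual sharing scatters widely around these expectations, and distant cousins often share no detectable segment at all, which is why the Ashkenazi figure is best read as a population average.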

But why stop with small endogamous groups? 23andMe has ancestry paintings which show your “Asian,” “European” and “African” ancestry along the chromosomes, using three ancestral reference populations. In the figure above from the paper, by contrast, a Cape Coloured individual’s ancestry is broken down between the two African populations that dominate the ancestral makeup of that ethnic group. Dodecad tells me I’m ~15% East Asian, while 23andMe tells me I’m ~45% Asian. The difference is that 23andMe doesn’t break out the indigenous South Asian component from the exogenous East Asian one. At some point I assume I’ll be able to get an ancestry painting which shows those two ancestral categories separately.
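
Much of the gap between two such estimates comes down to which labels the painting uses. As a simplified sketch (real services also differ in reference panels and inference methods, which this ignores), genome-wide fractions are just the length-weighted share of each painted label, and a coarser scheme merges finer labels before summing. The segment coordinates and labels below are invented for illustration and do not reproduce anyone’s actual results; the function names are hypothetical.

```python
from collections import defaultdict


def ancestry_fractions(segments):
    """Genome-wide ancestry fractions from a painting.

    segments: iterable of (start_cm, end_cm, label) tuples that tile the
    painted genome without overlaps.
    """
    totals = defaultdict(float)
    for start, end, label in segments:
        totals[label] += end - start
    genome_length = sum(totals.values())
    return {label: length / genome_length for label, length in totals.items()}


def collapse_labels(fractions, mapping):
    """Merge finer ancestry labels into coarser ones; unmapped labels are kept."""
    merged = defaultdict(float)
    for label, fraction in fractions.items():
        merged[mapping.get(label, label)] += fraction
    return dict(merged)


# Toy painting of one 100 cM stretch; labels and lengths are invented.
painting = [(0, 40, "European"), (40, 85, "South Asian"), (85, 100, "East Asian")]
fine = ancestry_fractions(painting)
coarse = collapse_labels(fine, {"South Asian": "Asian", "East Asian": "Asian"})
print(fine)
print(coarse)
```

The East Asian share of this toy painting is 15%, but once South Asian and East Asian segments are both called “Asian,” the same painting reports 60% Asian.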

A major reason they expect the focus to shift to fine-scale ancestry is that it will smoke out recessive diseases found in cryptically endogamous groups. From this I infer that they side with those who have argued that “Jewish diseases” are well known and prominent in part because of the attention given to that group. South Asians would be a good target for this new focus; according to some researchers there are clear patterns of genetic endogamy in this set of populations (which we would also expect on anthropological grounds). They also note that these recent rare variants are naturally going to break down along population lines, because they haven’t had time to spread.
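
A textbook way to see why cryptic endogamy matters for recessive disease is the Wahlund effect: when a population is really a set of subgroups that mate internally, homozygotes for a recessive allele are more common than random mating on the pooled allele frequency would predict, by exactly the variance of allele frequency across subgroups (with subgroup weights summing to 1). This is my illustration, not an analysis from the paper; the subgroup sizes and frequencies are hypothetical.

```python
def pooled_homozygote_freq(freqs, weights):
    """Observed frequency of recessive homozygotes when each subgroup mates
    only internally and is in Hardy-Weinberg equilibrium within itself.
    weights are subgroup shares of the population and must sum to 1."""
    return sum(w * q * q for q, w in zip(freqs, weights))


def panmictic_homozygote_freq(freqs, weights):
    """Frequency expected if the pooled population mated at random."""
    q_bar = sum(w * q for q, w in zip(freqs, weights))
    return q_bar * q_bar


# Ten equally sized endogamous subgroups; a recent variant has reached 5%
# in one subgroup and has not spread to the others (hypothetical numbers).
freqs = [0.05] + [0.0] * 9
weights = [0.1] * 10
observed = pooled_homozygote_freq(freqs, weights)
expected = panmictic_homozygote_freq(freqs, weights)
print(observed, expected, observed / expected)
```

Here the pooled allele frequency is only 0.5%, so random mating would predict 2.5 affected per 100,000, while the subdivided population actually has ten times that, all concentrated in one subgroup. A study that ignored the subgroup would both underestimate the burden and miss where it lies.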

Speaking of lacking time to spread, natural selection may also differ a great deal between populations. Strong “sweeps” due to positive selection tend to be partitioned among continental races. This is partly a function of time: by the time a sweep spreads from one end of a continent to the other, it may run into another sweep which renders its selective effect irrelevant. It has famously been found that light pigmentation in western and eastern Eurasia has different genetic architectures. In other words, the same phenotype is arrived at by different genotypic means in the two groups. Even within broader groups, selection driven by local adaptation can differentiate demes: consider the Tibetans and the Han, or differences in lactase persistence within Spain.
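
The time argument can be sketched with two classic idealisations, which are my additions rather than anything in the paper: deterministic logistic growth of a favoured allele in one randomly mating population (dp/dt = s·p·(1 − p), ignoring the stochastic early phase), and Fisher’s wave of advance, whose front moves at σ√(2s) per generation when individuals disperse with standard deviation σ per generation. The selection coefficient, population size, dispersal distance and continent width below are hypothetical.

```python
import math


def sweep_generations(s, n):
    """Generations for a favoured allele to rise from 1/(2N) to 1 - 1/(2N)
    under deterministic logistic growth dp/dt = s * p * (1 - p)."""
    p0 = 1.0 / (2 * n)
    p1 = 1.0 - p0
    return math.log((p1 * (1 - p0)) / (p0 * (1 - p1))) / s


def fisher_wave_speed(s, sigma):
    """Speed of Fisher's wave of advance, sigma * sqrt(2 * s), in units of
    sigma (dispersal distance per generation) per generation."""
    return sigma * math.sqrt(2 * s)


def generations_to_cross(distance, s, sigma):
    """Generations for the advancing front to travel a given distance."""
    return distance / fisher_wave_speed(s, sigma)


# Hypothetical parameters: s = 2%, N = 10,000, dispersal sd of 20 km per
# generation, and a continent 5,000 km across.
print(round(sweep_generations(0.02, 10_000)))
print(round(generations_to_cross(5_000, 0.02, 20)))
```

With these made-up numbers the allele could sweep a single well-mixed population in roughly 1,000 generations, yet its front would need about 1,250 generations to cross 5,000 km, which is tens of thousands of years at human generation times. That leaves ample room for a different sweep to arise and spread elsewhere on the continent in the meantime.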

Finally, they note that there will be continued advances in understanding admixed ancestry, as well as a special focus on exonic (protein-coding) regions. The latter should be especially interesting, as it might give us more insight into functional differences and similarities. The paper also tentatively outlines the possibilities opened up by the 1000 Genomes Project, HapMap, and the HGDP. Here is their conclusion:

New sequencing technology enables genetic studies with larger and larger sample sizes, increasing our power to detect associations between genetic variants and medically relevant traits, especially in the case of rare variants. Understanding patterns of admixture and population structure is an important part of maximizing this detection power and reducing confounding factors in genome sequencing-to-phenotype association studies (50-54). As we have seen in the analysis of dense array-genotype data, the same sequencing technology that enables sequencing/phenotype mapping will also enable us to improve our knowledge of population structure at a fine scale. This improved knowledge should be of assistance not only in identifying structure in association studies, but also in the description of human history and genetic adaptation, and in the development of personalized medicine tools.

Oh, and the bar plot is nice & crisp. I’ve re-edited it a bit, but check it out (left to right, K = 9 to K = 5):
[Figure: bar plot]

Citation: Brenna M. Henn, Simon Gravel, Andres Moreno-Estrada, Suehelay Acevedo-Acevedo, & Carlos D. Bustamante (2010). Fine-scale population structure and the era of next-generation sequencing. Hum. Mol. Genet. DOI: 10.1093/hmg/ddq403

Source: Discover Magazine – Gene Expression