(pre)Historical genetics still has to be historical

Credit: Albozagros

The genetics and history of Tibet are fascinating to many. To be honest the primary reason here is elevation. The Tibetan plateau has served as a fortress for populations who have adapted biologically and culturally to the extreme conditions. Naturally this means that there has been a fair amount of population genetics on Tibetans, as hypoxia is a side effect of high altitude living which dramatically impacts fitness. I have discussed papers on this topic before. And I will probably talk more about it in the future, considering rumblings at ASHG 2012.

But to understand the character of the effect of natural selection on a population it is often very important to keep in mind the phylogenetic context. By this, I mean that evolutionary processes occur over history, and those historical events shape the course of subsequent phenomena. Concretely, to understand how the Tibetans came to be adapted to high altitudes one must understand who they are related to, and what their long-term history is. There is a paper in Molecular Biology and Evolution which attempts to do just that, Genetic evidence of Paleolithic colonization and Neolithic expansion of modern humans on the Tibetan Plateau:

Tibetans live on the highest plateau in the world, their current population size is nearly 5 million, and most of them live at an altitude exceeding 3,500 meters. Therefore, the Tibetan Plateau is a remarkable area for cultural and biological studies of human population history. However, the chronological profile of the Tibetan Plateau’s colonization remains an unsolved question of human prehistory. To reconstruct the prehistoric colonization and demographic history of modern humans on the Tibetan Plateau, we systematically sampled 6,109 Tibetan individuals from 41 geographic populations across the entire region of the Tibetan Plateau and analyzed the phylogeographic patterns of both paternal (n = 2,354) and maternal (n = 6,109) lineages as well as genome-wide SNP markers (n = 50) in Tibetan populations. We found that there have been two distinct, major prehistoric migrations of modern humans into the Tibetan Plateau. The first migration was marked by ancient Tibetan genetic signatures dated to around 30,000 years ago, indicating that the initial peopling of the Tibetan Plateau by modern humans occurred during the Upper Paleolithic rather than Neolithic. We also found evidences for relatively young (only 7-10 thousand years old) shared Y chromosome and mitochondrial DNA haplotypes between Tibetans and Han Chinese, suggesting a second wave of migration during the early Neolithic. Collectively, the genetic data indicate that Tibetans have been adapted to a high altitude environment since initial colonization of the Tibetan Plateau in the early Upper Paleolithic, before the Last Glacial Maximum, followed by a rapid population expansion that coincided with the establishment of farming and yak pastoralism on the Plateau in the early Neolithic.

The two major salient points I think need emphasis are:

1) Massive sample sizes for mtDNA and, to a lesser extent, Y chromosomal lineages

2) Tibetans are a compound of agriculturalists who arrived on the plateau within the last 10,000 years, and hunter-gatherers who date back to the Paleolithic

Citation: Cai, Xiaoyun, et al. “Human migration through bottlenecks from Southeast Asia into East Asia during Last Glacial Maximum revealed by Y chromosomes.” PloS one 6.8 (2011): e24282.

There are many issues with this paper that bother me. The broadest interpretation of their thesis is one I find credible, but in the details I’m left skeptical, confused, and more curious than when I began. Also, I need to add that I talked to the people who presented a poster on this paper at ASHG 2012, though I do not know if they were the authors. They seemed nice, but also less focused on the questions they were exploring than on obtaining huge sample sizes and applying standard methods to them. Speaking of which, the first thing that jumps out is that their sample is skewed toward what is today Tibet proper, the autonomous region. But Tibetan people have historically lived as far east as Sichuan. Only 50% of ethnic Tibetans live in the autonomous region, but well over 90% of their samples are from this area. In terms of exploring adaptation to altitude this is fine, but if you are going to do phylogeography you need better geographical coverage, I would think.

But that’s only a minor aside. The bulk of the paper consists of a laundry list of Y and mtDNA haplogroups, and coalescence times. Some of the results are very persuasive to me. There are some Y lineages which exhibit a “star shaped” phylogeny, which usually connotes a recent rapid population expansion. Using other methods the authors have inferred that there was indeed an expansion of population after the introduction of agriculture within the last 10,000 years. There is no great reason on prior grounds to be skeptical of this finding. Nevertheless, drilling down produces considerable confusion, and I am not sure that the coalescence times and phylogenies actually mean what the authors assume they mean.

For example, here is a standard sort of analysis presented in this paper:

We identified a molecular signature of recent population expansion during the early Neolithic time in both paternal (Y-chromosomal D3a-P47 and O3a3c1-M117) and maternal (M9a1a and M9a1b1) lineages (10-7 kya) (table 1). The detailed analysis of haplotype sharing and time of divergence between Tibetans and Han Chinese suggests that the Neolithic population expansion on the Plateau was likely caused by the dispersal of the earliest Neolithic Han Chinese agriculturalists originating about 10 kya in what is now northwestern China….

O3a3c1-M117 is present at frequencies of nearly 30%, and is connected with the Chinese as you can see above. This dovetails with other recent research which implies relatively recent common ancestors between Tibetans and Chinese. This result can be reconciled with the presence of Paleolithic roots by the fact that admixed populations will give you results averaged between the two extremes. The problem I have is that I am skeptical that Han Chinese existed 10,000 years ago, just as I am skeptical that Greeks existed 10,000 years ago.
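To make the averaging point concrete, here is a toy calculation of my own, not anything from the paper. It treats an admixed population's apparent age as a simple weighted mean of the ages of its two sources. The two dates come from the abstract (about 30,000 years for the Paleolithic signal, and the midpoint of the 7-10 thousand year range for the Neolithic one). The ancestry fractions and the function name `admixed_average` are hypothetical, picked only for illustration, and real coalescence dating is far more involved than this.

```python
def admixed_average(fraction_a, value_a, value_b):
    """Weighted mean for a population drawing fraction_a of its ancestry
    from source A and the remainder from source B."""
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    return fraction_a * value_a + (1.0 - fraction_a) * value_b


# Dates taken from the abstract quoted earlier.
PALEOLITHIC_YEARS = 30_000
NEOLITHIC_YEARS = 8_500  # midpoint of the 7,000-10,000 year range

# HYPOTHETICAL ancestry fractions, chosen only for illustration.
for neolithic_fraction in (0.25, 0.5, 0.75):
    avg = admixed_average(neolithic_fraction, NEOLITHIC_YEARS, PALEOLITHIC_YEARS)
    print(f"{neolithic_fraction:.0%} Neolithic ancestry -> average of {avg:,.0f} years")
```

Whatever fraction you pick, the average lands between the two source dates, which is why a genome-wide summary can look "recent" even when some lineages are very old.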

A quick literature search yields the fact that M117 is modal in particular non-Han ethnic groups resident in southern China and northern Southeast Asia. I am not here proposing that the Hmong introduced M117 to the Tibetans. Rather, I am suggesting that we had best be careful in assuming that we know the ethnic distribution of genetic haplogroups 6,500 years before there were any written records from a given region! To me the fact that there is a putative Sino-Tibetan group of languages is strongly indicative of diversification over roughly the last 10,000 years, not the existence of a Han ethnicity ~10,000 years ago. The historical records are clear that ~3,000 years ago the Yangzi river, now the informal dividing line between North China and South China, was the boundary of the zone where Han were demographically dominant. And even then there were clearly pockets of “barbarian” people on the North China plain itself! It simply does not stand up to the test of basic plausibility that the agricultural expansions ~10,000 years B.P. were Han as we would understand Han. The demographic and cultural dominance of the Han in Northeast Asia is a phenomenon of the last 3,000 years, perhaps 4,000 most generously (South China became Sinicized to some extent after the fall of the Latter Han Dynasty ~200 AD, and especially during the Tang period ~600-950 AD).

Much of the argumentation is creaky because of these anachronistic assumptions and the casual inferences from contemporary haplogroup frequencies back toward ancient geographical demographic distributions. Ancient DNA has highlighted the danger of this in Europe, and that should update our priors as to the robustness of this sort of analysis. For example, the authors are curious as to the lack of structure of Y chromosomal lineages, combined with the fact of their deep coalescence times across Tibet. Why is this an issue? Because if these Y chromosomal lineages are Paleolithic, then the deep divergences across the branches should also correspond to geographic differences. But they don’t. To me the simplest explanation is that the last 10,000 years have seen a great deal of population movement, and sharply differentiated populations were brought together as agriculture opened up the Tibetan plateau. This presents a problem, though, for inferring ancient geographic connections from present distributions, since it opens up the possibility of migration and radical genetic-demographic turnover.

Overall I would say that this paper is interesting and useful, but you should read it closely and not take the authors’ inferences too much to heart. Those inferences rest on assumptions which may themselves be false.

Addendum: Also, a “gap” on a PCA plot does not necessarily mean long-term isolation, as they say in the text. It might simply be a function of inadequate sampling. See above. There are many unsupported assertions such as that. But I would like to add that the authors found a large number of “exotic” haplogroups in Lhasa itself, which aligns with what we know about the cultural history of Tibet. Tibetan Buddhism is actually influenced more by extinct variants of South Asian (particularly Bengali) Buddhism than by Chinese Buddhism. Though the demographic pump along the Himalayan border seems to go from the highlands to the lowlands, there were exceptions. And these exceptions tended to be found in Lhasa.

Source: Discover Magazine – Gene Expression