What the Harappa Ancestry Project has resolved

My friend Zack Ajmal has been running the Harappa Ancestry Project for several years now. This is a non-institutional complement to the genomic research which occurs in the academy. His motivation was in large part to fill in the gaps in population coverage within South Asia which one sees in the academic literature. Many of these gaps are due to politics, as the government of India has traditionally been reluctant to allow sample collection (ergo, the HGDP data uses Pakistanis as its South Asian reference, while the HapMap collected DNA from Indian Americans in Houston). Of course this sort of project is not without its own blind spots. Zack must rely on public data sets to get a better picture of groups like tribal populations and Dalits, because they are so underrepresented in the diaspora from which he draws many of the project participants.

Once Zack has a participant’s genotype, one of the primary things he does is add it to his broader data set (which includes many public samples) and analyze it with the Admixture model-based clustering package. Admixture takes a specified number of populations (e.g., K = 12) and assigns each individual a proportion of ancestry from each of them. So, for example, with K = 2 individual A might be assigned 40% population 1 and 60% population 2, while individual B might be 45% population 1 and 55% population 2. These are not necessarily ‘real’ populations. Rather, the populations and their proportions are there to allow you to discern patterns of relationships across individuals.
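
To make the idea concrete, here is a minimal sketch of what such assignments look like as data, using only the K = 2 figures from the example above. It does not call the Admixture program itself; the helper names (`is_valid_row`, `profile_distance`) are hypothetical, and the distance measure is just one simple way to compare two individuals.

```python
# Ancestry fractions from the K = 2 example in the text.
# Each entry is one individual; each position is one inferred population.
Q = {
    "A": [0.40, 0.60],
    "B": [0.45, 0.55],
}

def is_valid_row(fractions, tol=1e-9):
    """An individual's fractions must be non-negative and sum to 1."""
    return all(f >= 0 for f in fractions) and abs(sum(fractions) - 1.0) < tol

def profile_distance(p, q):
    """Sum of absolute differences between two ancestry profiles (illustrative)."""
    return sum(abs(a - b) for a, b in zip(p, q))

for name, row in Q.items():
    print(name, row, "valid" if is_valid_row(row) else "invalid")

print("distance A-B:", round(profile_distance(Q["A"], Q["B"]), 2))
```

The point of the sketch is that the individual fractions matter less than how the profiles compare across people: A and B differ only slightly here, which is the kind of pattern the clusters are meant to expose.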

Since Zack has put his results online, I thought it would be useful to review what patterns have emerged over the past two years, as his sample sizes for some regions are now large enough to be informative. Though he has K = 16 populations, not all of them will concern us, because South Asians do not tend to exhibit many of the components. I will focus on seven: S Indian, Baloch, Caucasian, NE Euro, SE Asian, Siberian and NE Asian. These are not real populations, but the labels tell you in which region each component is modal. So, for example, the “S Indian” component peaks in southern India, the “Baloch” among the Baloch people of southeastern Iran and southwestern Pakistan, and the “NE Euro” among the eastern Baltic peoples. The last three are Asian components, running in latitude from south to north to center. They only concern the first population of interest, Bengalis, so I will combine them into a single “Asian” value, which appears only in the Bengali rows of the table.

Below is a table, mostly of individuals from Zack’s results (though there are some aggregate results from public data sets). Comments follow the table.

Ethnicity SIndian Baloch Caucasian NEEuro Asian
Bengali 53% 28% 2% 5% 8%
Bengali Baidya 45% 30% 3% 5% 12%
Bengali Baidya 45% 27% 3% 6% 12%
Bengali Brahmin 45% 35% 2% 11% 4%
Bengali Brahmin 44% 35% 5% 11% 4%
Bengali Brahmin 43% 35% 4% 10% 4%
Bengali Brahmin 42% 32% 4% 8% 6%
Bengali Brahmin 41% 33% 7% 8% 5%
Bengali Brahmin 40% 33% 4% 10% 4%
Bengali Brahmin 40% 30% 6% 10% 7%
Bengali Muslim 50% 25% 1% 5% 15%
Bengali Muslim 49% 28% 3% 4% 15%
Bengali Muslim 45% 27% 4% 4% 17%
Bengali Muslim 45% 26% 2% 2% 16%
Bengali Muslim 45% 24% 1% 3% 19%
Bengali Muslim 43% 25% 3% 2% 18%
Bengali Muslim 48% 27% 0% 5% 15%
Tamil Brahmin 48% 37% 6% 5%
Tamil Brahmin 48% 37% 3% 5%
Tamil Brahmin 48% 35% 5% 6%
Tamil Brahmin 47% 38% 6% 4%
Tamil Brahmin 47% 40% 3% 5%
Tamil Brahmin 46% 40% 3% 6%
Tamil Brahmin Iyengar 50% 35% 2% 8%
Tamil Brahmin Iyengar 47% 38% 6% 4%
Tamil Brahmin Iyengar 47% 35% 6% 6%
Tamil Brahmin Iyer 48% 38% 4% 5%
Tamil Brahmin Iyer 48% 38% 2% 5%
Tamil Brahmin Iyer 47% 37% 2% 5%
Tamil Brahmin Iyer 47% 37% 6% 8%
Tamil Brahmin Iyer 43% 35% 6% 5%
Tamil Muslim 58% 28% 3% 2%
Tamil Nadar 62% 30% 0% 0%
Tamil Nadar 59% 32% 3% 0%
Tamil Nadar 55% 30% 3% 0%
Tamil Vellalar 50% 35% 6% 1%
Tamil Vellalar 51% 32% 5% 0%
Tamil Vellalar (Sri Lankan) 60% 32% 5% 0%
Tamil Vellalar (Sri Lankan) 60% 33% 0% 0%
Tamil Vellalar (Sri Lankan) 56% 36% 0% 0%
Tamil Vishwakarma 70% 23% 0% 0%
Tamil Vishwakarma 66% 25% 4% 0%
Andhra Pradesh 60% 34% 2% 0%
Andhra Pradesh 54% 36% 2% 3%
Andhra Pradesh (Hyderabad) 56% 29% 5% 0%
Andhra Pradesh (Hyderabad) 47% 35% 8% 4%
Andhra Pradesh Gouda 61% 30% 2% 1%
Andhra Pradesh Kamma 51% 33% 7% 0%
Andhra Pradesh Kapu 62% 30% 2% 1%
Andhra Pradesh Naidu 51% 32% 4% 2%
Andhra Pradesh Reddy 57% 37% 1% 0%
Andhra Pradesh Reddy 54% 38% 3% 0%
Andhra Pradesh Reddy 51% 35% 4% 0%
Andhra Pradesh Reddy 50% 36% 2% 1%
Andhra Pradesh Telugu Brahmin 45% 33% 6% 4%
AP Brahmin (Xing, N = 25) 49% 36% 3% 6%
AP Naidu (Reich, N = 4) 61% 31% 1% 1%
Kannada Devanga 60% 31% 3% 1%
Karnataka Catholic Christian 56% 37% 3% 0%
Karnataka Lingayat 55% 34% 4% 0%
Karnataka 54% 36% 2% 0%
Karnataka Brahmin 51% 35% 3% 5%
Karnataka Iyengar 49% 36% 5% 5%
Karnataka Iyengar 48% 39% 3% 5%
Karnataka Iyengar 48% 37% 3% 7%
Karnataka Brahmin 47% 38% 4% 6%
Karnataka Konkani Brahmin 47% 37% 2% 6%
Karnataka Konkani Brahmin 46% 33% 6% 7%
Karnataka Konkani Brahmin 44% 34% 6% 5%
Kerala 47% 33% 7% 2%
Kerala Brahmin 43% 39% 4% 6%
Kerala Christian 53% 35% 4% 0%
Kerala Christian 50% 35% 8% 1%
Kerala Christian 45% 33% 7% 3%
Kerala Muslim Rawther 53% 35% 2% 1%
Kerala Muslim Rawther 51% 28% 4% 3%
Kerala Nair 48% 40% 4% 0%
Kerala Nair 47% 38% 5% 5%
Kerala Syrian Christian 50% 37% 6% 0%
Kerala Syrian Christian 50% 35% 9% 1%
Kerala Syrian Christian 46% 33% 5% 4%
Kerala Syrian Christian 44% 33% 6% 4%
Pathan (HGDP, N = 23) 23% 42% 16% 11%
Kalash (HGDP, N = 23) 22% 43% 18% 11%
Burusho (HGDP, N = 25) 23% 41% 12% 10%
Brahui (HGDP, N = 25) 12% 58% 12% 2%
Sindhi (HGDP, N = 24) 29% 46% 10% 6%
Kashmiri Pandit (Reich, N = 5) 32% 39% 12% 9%
Punjabi 43% 36% 5% 9%
Punjabi 39% 39% 9% 7%
Punjabi 34% 43% 7% 7%
Punjabi 34% 40% 12% 8%
Punjabi 33% 44% 5% 10%
Punjabi 31% 41% 14% 8%
Punjabi 29% 36% 11% 11%
Punjabi Arain (Xing, N = 25) 31% 44% 10% 7%
Punjabi Brahmin 35% 40% 8% 11%
Punjabi Brahmin 33% 41% 13% 10%
Punjabi Chamar 40% 33% 9% 6%
Punjabi Jatt 28% 39% 11% 10%
Punjabi Jatt 30% 44% 6% 14%
Punjabi Jatt 28% 42% 8% 13%
Punjabi Jatt 28% 46% 7% 13%
Punjabi Jatt 28% 40% 10% 15%
Punjabi Jatt 27% 44% 10% 13%
Punjabi Jatt 27% 35% 16% 11%
Punjabi Jatt Muslim 30% 39% 13% 8%
Punjabi Khatri 30% 42% 12% 12%
Punjabi Lahori Muslim 31% 44% 11% 8%
Punjabi Pahari Rajput 34% 43% 11% 7%
Punjabi Pakistan 28% 36% 16% 7%
Punjabi Ramgarhia 35% 43% 5% 9%
Haryana Jat 25% 33% 12% 17%
Haryana Jat 25% 33% 12% 17%
Haryana Jatt 28% 38% 5% 20%
Haryana Jatt 26% 39% 10% 17%
Rajasthan Marwari Jain 47% 34% 5% 6%
Rajasthani Agarwal 51% 37% 6% 1%
Rajasthani Brahmin 32% 38% 9% 15%
Rajasthani Marwari 48% 34% 6% 2%
Rajasthani Rajput 45% 38% 5% 9%
UP 40% 28% 10% 8%
UP Brahmin 41% 37% 7% 11%
UP Brahmin 40% 37% 7% 11%
UP Brahmin 37% 38% 2% 14%
UP Kayastha 47% 38% 5% 3%
UP Muslim 33% 33% 10% 9%
UP Muslim 28% 35% 12% 11%
UP Muslim Pathan 48% 36% 7% 4%
UP Muslim Syed 33% 31% 13% 7%
UP Syed 36% 37% 7% 8%
UP/Haryana Agarwal 52% 35% 6% 2%
UP/Haryana Jatt 28% 42% 7% 18%
UP/Madhya Pradesh 51% 27% 1% 7%
UP/Punjabi 40% 33% 7% 10%
UP/Punjabi Khatri 27% 43% 10% 11%
Bihari Baniya 47% 31% 5% 5%
Bihari Brahmin 39% 38% 5% 11%
Bihari Kayastha 53% 33% 1% 7%
Bihari Muslim 48% 28% 5% 8%
Bihari Muslim 42% 34% 9% 6%
Bihari Muslim 41% 36% 7% 8%
Bihari Muslim 42% 32% 7% 9%
Bihari Syed 42% 35% 4% 9%
Gujarati (HapMap, N = 63, Patel) 54% 42% 0% 1%
Gujarati (HapMap, N = 34, Non-Patel) 44% 39% 5% 7%
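
For readers who want to work with the table rather than read it, here is a rough sketch of parsing its rows. It assumes each row is a free-text label followed by four or five percentage values, with the fifth (“Asian”) present only for Bengalis. The names `COLUMNS`, `ROW` and `parse_row` are illustrative, not part of Zack’s pipeline.

```python
import re

COLUMNS = ["S Indian", "Baloch", "Caucasian", "NE Euro", "Asian"]

# A label (matched lazily) followed by four or five "NN%" values at end of line.
ROW = re.compile(r"^(?P<label>.+?)(?P<values>(?:\s+\d+%){4,5})$")

def parse_row(line):
    """Return (label, {component: percent or None}) for one table row."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"not a data row: {line!r}")
    values = [int(v.rstrip("%")) for v in match.group("values").split()]
    # Rows outside Bengal have no "Asian" value; record it as None.
    values += [None] * (len(COLUMNS) - len(values))
    return match.group("label"), dict(zip(COLUMNS, values))

# A few rows copied verbatim from the table.
sample_rows = [
    "Bengali Muslim 45% 27% 4% 4% 17%",
    "Tamil Brahmin Iyer 48% 38% 4% 5%",
    "AP Brahmin (Xing, N = 25) 49% 36% 3% 6%",
]

for row in sample_rows:
    label, components = parse_row(row)
    print(label, components)
```

Note that the aggregate rows (those with “N =” in the label) are population averages, so they should not be pooled naively with rows for single individuals.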

A recent paper suggested that there was a single pulse of admixture between South and East Asians in the environs of what is today Bangladesh, which occurred ~500 A.D. The traditional accounts of the arrival of Brahmins in Bengal suggest a period around and after 1000 A.D. (Bengal was one of the last redoubts of institutional Buddhism in northern India, so it presumably would have had less need for the services of Brahmins). The results are easy to align with these two facts. All the Bengali non-Brahmins (Baidya are a non-Brahmin high caste in West Bengal) have substantial East Asian ancestry. The Bengali Brahmins have far less of it. Additionally, their “NE Euro” component is about double that of non-Brahmins. There is still room for the Bengali Brahmins to be a synthetic community with some admixture (their East Asian fraction is still notably higher than elsewhere in South Asia), but the traditional narrative seems to explain the broad outline of these results.
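
The Bengali comparison can be checked with simple averages. The sketch below uses the “NE Euro” and “Asian” values copied from the Bengali Brahmin, Baidya and Muslim rows of the table. It leaves out the single row labeled only “Bengali”, since its caste is not stated, which is my assumption about how to group the data. With so few individuals these means are descriptive, not a statistical test.

```python
from statistics import mean

# (NE Euro %, Asian %) per individual, copied from the table.
bengali_brahmin = [(11, 4), (11, 4), (10, 4), (8, 6), (8, 5), (10, 4), (10, 7)]
bengali_baidya = [(5, 12), (6, 12)]
bengali_muslim = [(5, 15), (4, 15), (4, 17), (2, 16), (3, 19), (2, 18), (5, 15)]

def component_means(rows):
    """Mean NE Euro and mean Asian percentage over a list of individuals."""
    ne_euro, asian = zip(*rows)
    return mean(ne_euro), mean(asian)

brahmin_ne, brahmin_asian = component_means(bengali_brahmin)
other_ne, other_asian = component_means(bengali_baidya + bengali_muslim)

print(f"NE Euro: Brahmin {brahmin_ne:.1f}% vs non-Brahmin {other_ne:.1f}%")
print(f"Asian:   Brahmin {brahmin_asian:.1f}% vs non-Brahmin {other_asian:.1f}%")
print(f"NE Euro ratio (Brahmin / non-Brahmin): {brahmin_ne / other_ne:.1f}")
```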

When you look at South Indians from the four Dravidian states, four facts strike me as notable:

– There is a distinct difference between Brahmins and non-Brahmins (most of the non-Brahmins Zack has in the Harappa data set are upper caste, though the public data sets have Dalits and tribal populations).

– There is very little difference between South Indian Brahmins by region and sect (e.g., Iyengars and Iyers are Tamil Brahmins divided by theological differences).

– South Indian Brahmins are genetically distinct from North Indian Brahmins. They seem to have about half the proportion of the “NE Euro” component found in North Indian Brahmins (e.g., compare to Bengali Brahmins).

– South Indian non-Brahmin upper castes have very little of the “NE Euro” component, which is found at low but consistent fractions among non-Brahmins in the Gangetic plain (and at much higher fractions as one moves toward the Punjab).
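
The second and third points above can be sketched numerically from the “NE Euro” column. The grouping below is my own assumption: it pools the individual Brahmin rows by the state labels used in the table and leaves out the aggregate “AP Brahmin (Xing, N = 25)” row so that population averages are not mixed with single individuals.

```python
from statistics import mean

# "NE Euro" percentages for individual Brahmin rows, copied from the table.
south = {
    "Tamil Brahmin": [5, 5, 6, 4, 5, 6],
    "Tamil Brahmin Iyengar": [8, 4, 6],
    "Tamil Brahmin Iyer": [5, 5, 5, 8, 5],
    "Karnataka Brahmin (all rows)": [5, 5, 5, 7, 6, 6, 7, 5],
    "Kerala Brahmin": [6],
    "Telugu Brahmin": [4],
}
north = {
    "Bengali Brahmin": [11, 11, 10, 8, 8, 10, 10],
    "UP Brahmin": [11, 11, 14],
    "Bihari Brahmin": [11],
    "Punjabi Brahmin": [11, 10],
    "Rajasthani Brahmin": [15],
}

def pooled_mean(groups):
    """Mean over every individual in every group."""
    return mean(v for values in groups.values() for v in values)

# Per-group means show how uniform the southern groups are.
for name, values in south.items():
    print(f"{name}: {mean(values):.1f}%")

south_mean = pooled_mean(south)
north_mean = pooled_mean(north)
print(f"South {south_mean:.1f}% vs North {north_mean:.1f}% "
      f"(ratio {south_mean / north_mean:.2f})")
```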

I do not know the origin of the Pancha-Dravida group of Brahmins, but they look to be endogamous, from the same source, and probably had some admixture with the local substrate early on. This would explain their uniformity and lower fraction of “NE Euro” in relation to North Indian Brahmins. The results above also suggest that the Syrian Christians derive from converts from the Nair community, or related communities. This should not be surprising.

Finally, let’s move to North India, and the zone stretching between Punjab in the Northwest and Bihar in the East. Though in much of this region Brahmins have higher “NE Euro” fractions, this relationship seems to break down as you go northwest. The Jatt community in particular seems to have the highest fraction in the subcontinent. There are inchoate theories for the origins of the Jatts in Central Asia. I had dismissed them, but am thinking now they need a second look. The reasoning is simple. The Jatts of the eastern Punjab have a higher fraction of “NE Euro” than populations to their northwest (Pathans, Kalash, etc.), and than Brahmin groups in their area (e.g., Pandits) who are theoretically higher in caste status. This violation of both the geographic and the caste trend implies something not easily explained by straightforward social and geographic processes. The connection between ancestry and caste status also seems to break down somewhat in the Northwest, as there is a wide variation in ancestral components.
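
As a rough check of the Jatt comparison, the sketch below sets the “NE Euro” values for the Jatt rows in the table against the aggregate values for the northwestern reference groups named above. The grouping and the choice of the highest reference value as the bar to clear are my own assumptions, made only for illustration.

```python
from statistics import mean

# "NE Euro" percentages for individual Jatt rows, copied from the table.
jatt_ne_euro = {
    "Punjabi Jatt": [10, 14, 13, 13, 15, 13, 11],
    "Haryana Jat/Jatt": [17, 17, 20, 17],
    "UP/Haryana Jatt": [18],
    "Punjabi Jatt Muslim": [8],
}

# Aggregate "NE Euro" values for reference populations, also from the table.
reference_ne_euro = {
    "Pathan (HGDP)": 11,
    "Kalash (HGDP)": 11,
    "Burusho (HGDP)": 10,
    "Kashmiri Pandit (Reich)": 9,
}

bar = max(reference_ne_euro.values())

# For each Jatt group, does its mean exceed the highest reference value?
exceeds = {name: mean(values) > bar for name, values in jatt_ne_euro.items()}

for name, values in jatt_ne_euro.items():
    flag = "above" if exceeds[name] else "not above"
    print(f"{name}: mean {mean(values):.1f}% ({flag} the reference maximum of {bar}%)")
```

On these numbers the pattern holds for the Punjabi, Haryana and UP/Haryana Jatt rows but not for the single Punjabi Jatt Muslim individual, which is a reminder of how small some of these samples are.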

Someone with more knowledge of South Asian ethnography should weigh in. But until then I invite readers of South Asian heritage to submit their results to Zack.

Source: Discover Magazine – Gene Expression