Ashkenazi Jews are probably not descended from the Khazars

A few people have asked me about a new paper on arXiv, The Missing Link of Jewish European Ancestry: Contrasting the Rhineland and the Khazarian Hypotheses. Since it is on arXiv, you can read the preprint yourself. And since it is a preprint, it is not quite polished, so keep that in mind when evaluating it. After a fashion we are part of the polishing process. So what do I think?

First, it seems to me that the author has a sense of humor about this, and I don’t know how seriously to take some of his assertions. Consider this passage:

“Such an unnatural growth rate (1.7-2% annually) over half a millennia, affecting only Jews residing in Eastern Europe is commonly explained by a miracle (Atzmon et al. 2010). Unfortunately, this divine intervention explanation poses a new kind of problem – it is not science.”

Taken literally this seems rather bizarre. The referenced paper speaks of the “so-called demographic miracle of population expansion,” alluding to another scholar’s observation. It seems obvious that miracle in this context simply means an inexplicable phenomenon, not a genuine supernatural intervention. There are also plain factual problems which I assume will get cleared up in the final draft. Romania and Hungary are referred to as Slavic nations which were targets of migration by Khazars fleeing the collapse of their polity. Neither of these nations was then, or is now, Slavic. In general I have to say that the historical framework of the paper is very skeletal, verging on incoherent (at least to me).

That being said, there are positives. The author uses methods which you yourself could replicate with a public data set. When it comes to the “methods” section he seems to have it down (this is clearly a side project looking at his research focus). In particular my first instinct was to look for the keyword “IBD.” To get a really good sense of history through genetics utilizing dense marker data sets you need to look at correlations across the genome which are indicative of relatively recent relatedness, not just PCA and model-based clustering (e.g., ADMIXTURE), which give you summaries of affinities between populations. And he used many of the methods you’d want to see in concert. What more could he have done? Well, tested some explicit demographic models. But that’s computationally intensive from what I recall.

Setting aside the historical fuzziness of the paper, the major issue I have is that though the methods are totally kosher, so to speak, the data you put into them strongly shape your outcomes. Dienekes and Maju both anticipated my own key concern. The “Middle Eastern” aspect of Ashkenazi Jewish ancestry might in fact be best represented by populations in the zone of the northern Fertile Crescent and Eastern Anatolia, rather near or overlapping with the homelands of several of the Caucasian populations used in the above study as a proxy for Khazars. Additionally, modern Palestinians (the HGDP data set) are used as a reference for the Middle Eastern ancestors of Jews. I now believe that the Arabian contribution to the ancestry of Levantine and Iraqi Muslim populations, which dates to after the 7th century and differentiates Muslim Arabs from their local non-Muslim Arab* co-ethnics, is significant. Perhaps it is on the same order as the Germanic ancestry in modern England which dates to the 6th century and later. In plainer language, the Caucasian component that is being detected in this paper may simply be an indigenous Middle Eastern ancestral element whose modal frequency has now been somewhat displaced northward by the expansion of the Arabs and, later, by the addition of some Sub-Saharan ancestry among Muslim Arabs. This would explain the author’s finding that the Druze, an endogamous community with roots in the mountains of Lebanon, have affinities to the Turks. From this the author posits a Druze migration southward, but I suspect a more parsimonious explanation is simply that the Druze are a relatively isolated population which is more reflective of the Near Eastern genetic substratum, a substratum which has been somewhat modified in the lowlands by over 1,000 years of cosmopolitan Muslim polities. In this model the modern Turks and Kurds would also be reflective of this ancient substratum, being more insulated from Sub-Saharan admixture as well as from the population movements of Arabian tribes from the peninsula in the first century or so of Islam.

One aspect of the paper which requires some clarification is the idea that the Armenians are a Caucasian people. If you look at the modern state of Armenia this is eminently reasonable. But for most of its history Armenia was a marginal Caucasian nation, with its center of gravity further south, straddling Anatolia and western Iran, and looming over the plains of Mesopotamia. The Caucasian nature of modern Armenia is to a great extent a function of the extermination of Armenians from much of eastern Anatolia in the early 20th century. In contrast, Georgia is much more fundamentally a Caucasian nation. If you kept this reality in mind I suspect that passages such as this would not be necessary:

“The high genetic similarity between European Jews and Armenian compared to Georgians…is particularly bewildering because Armenians and Georgians are very similar populations that share a similar genetic background…and long history of cultural relations….”

I wouldn’t place too much stock in one particular result, but it becomes a lot less bewildering if you know that Armenians have been much more active players in Near Eastern history than Georgians, because their locus of concentration lay further south (e.g., Lesser Armenia).

Mind you, I wouldn’t be totally shocked if there was a Khazar contribution to modern European Jewish ancestry. There have been some suggestive uniparental results. But the smoking gun for me would be a simple one: East Asian ancestry. The Khazars were Turkic, and as such they would have had substantial proportions of East Asian ancestry. This is evident in the modern Chuvash, who have had a thousand years to admix with surrounding Slavic populations (and have). There are reasonable explanations for the “Caucasian” ancestry of Ashkenazi Jews which do not make recourse to the Khazar hypothesis. But a Mongoloid element is almost certainly feasible only through Turks of some sort, and the presence of a Judaized Turkic population on the fringes of Europe would be too much of a coincidence to ignore. There are some suggestive results which indicate small components of Mongoloid ancestry in Ashkenazi Jews, but the proportions are low enough that they may be artifacts. This is one area where more investigation is warranted: for example, whole-genome analyses which look at “East Asian” segments in Ashkenazi Jews and match them to various East Asian populations. That would almost certainly answer the Khazar question, as there are relatively undiluted Turkic populations, such as the Kirghiz, that one could use as a reference.

Finally, despite the fact that I praise the author’s utilization of a wide array of contemporary statistical genetic methods, one can’t just do away with a thick and sturdy historical framework and reasonable questions derived from this superstructure. The historical models tested in this paper are moderately inscrutable to me (e.g., the “Rhineland hypothesis”). As others have noted, there is a peculiar lacuna with regard to models of ethnogenesis during Roman antiquity, even though other lines of historical and genetic evidence do point in that direction. Instead, the author concocts a scenario of a mass migration from the Middle East into Europe after the Muslim conquest. To my knowledge Europe after the fall of Rome was not taking in the huddled Hebrew masses (though it was taking in some Middle Eastern Christians). But perhaps I haven’t read the proper books on this issue. In some ways this paper screams to me of the problem with taking a mass of data, applying legitimate methods, and coming out with very specific results because of the way the parameters are set. In this case the parameters happen to be two contrasting models, to the neglect of other alternatives. This is unfortunately one of the primary problems with “hypothesis driven research” in the age of big data.

Overall I still commend the author for putting this up on arXiv. I hope this sort of feedback will result in some revision, and we’ll get a better handle on what’s going on here.

* Though the majority of Arab non-Muslims are Arabicized (ergo, some of them still reject Arab identity despite the usage of Arabic as their day to day language), a minority may date to Arab Christian populations which were numerous on the fringes of the Roman and Persian Empires by 600.

Source: Discover Magazine – Gene Expression