Thursday, April 30, 2015

The Kalash Genetic Isolate: Ancient Divergence, Drift, and Selection

Qasim Ayub, Massimo Mezzavilla, Luca Pagani, Marc Haber, Aisha Moyuddin, Shagufta Khaliq, Syed Qasim Mehdi, Chris Tyler-Smith

(Link) pdf open access
Open access funded by Wellcome Trust


The Kalash represent an enigmatic isolated population of Indo-European speakers who have been living for centuries in the Hindu Kush mountain ranges of present-day Pakistan. Previous Y chromosome and mitochondrial DNA markers provided no support for their claimed Greek descent following Alexander III of Macedon's invasion of this region, and analysis of autosomal loci provided evidence of a strong genetic bottleneck. To understand their origins and demography further, we genotyped 23 unrelated Kalash samples on the Illumina HumanOmni2.5M-8 BeadChip and sequenced one male individual at high coverage on an Illumina HiSeq 2000. Comparison with published data from ancient hunter-gatherers and European farmers showed that the Kalash share genetic drift with the Paleolithic Siberian hunter-gatherers and might represent an extremely drifted ancient northern Eurasian population that also contributed to European and Near Eastern ancestry. Since the split from other South Asian populations, the Kalash have maintained a low long-term effective population size (2,319–2,603) and experienced no detectable gene flow from their geographic neighbors in Pakistan or from other extant Eurasian populations. The mean time of divergence between the Kalash and other populations currently residing in this region was estimated to be 11,800 (95% confidence interval = 10,600−12,600) years ago, and thus they represent present-day descendants of some of the earliest migrants into the Indian sub-continent from West Asia.

Some Figures:

Figure 2. Kalash Demographic History
(A) PSMC analysis shows a low effective population size for the Kalash.
(B) Kalash effective population size estimated from LD analysis.
(C) MSMC analysis of the time of the split between the Kalash and African genomes (YRI, LWK, and MKK) and non-African genomes from East Asia (CHB and JPT), Europe (CEU and TSI), South Asia (GIH), and America (MXL).
(D) A UPGMA (unweighted pair group method with arithmetic mean) dendrogram shows the LD-estimated time of divergence between populations. The mean time of divergence between the Kalash and other populations from the Indian sub-continent is estimated to be 11,800 years ago (dashed red line). 

Figure 3. Shared Genetic Drift with Ancient Genomes
(A) Proportion of shared genetic drift (measured as f3 statistics) between extant world-wide HGDP-CEPH populations (including the Kalash) and the ancient Siberian hunter-gatherer (MA-1). The magnitude of the computed f3 statistics is represented by the graded heat key. The proportion of genetic drift shared between the Kalash and MA-1 is comparable to that shared between MA-1 and the Yakut, Native Americans, and northern European populations.
(B) Ternary plot of shared genetic drift with three ancient genomes: MA-1 (left), La Braña 1 (middle), and Oetzi, the Tyrolean Iceman (right). The high proportion of genetic drift shared between the Kalash and MA-1 is comparable to that shared between MA-1 and Native Americans. In comparison with other populations from South Asia, the Kalash also share a higher proportion of genetic drift with La Braña 1 and Oetzi.
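For readers unfamiliar with the f3 statistic used in Figure 3, here is a minimal sketch of the "outgroup" form of the statistic, computed from allele frequencies. It is an illustration only: the function name and the toy frequencies are hypothetical, the choice of outgroup is an assumption, and the finite-sample bias correction that real tools apply is left out. It is not the paper's pipeline.

```python
def f3_outgroup(outgroup, pop_a, pop_b):
    """Average (o - a) * (o - b) over SNPs.

    Each argument is a list of allele frequencies (0..1) for the same
    SNPs in the same order. Larger values mean pop_a and pop_b share
    more drift since diverging from the outgroup.
    Assumption: no correction for small sample sizes is applied.
    """
    if not outgroup or not (len(outgroup) == len(pop_a) == len(pop_b)):
        raise ValueError("frequency lists must be non-empty and equal length")
    products = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(products) / len(products)


# Hypothetical toy frequencies, not taken from the paper.
outgroup = [0.0, 0.0, 0.0, 0.0]
ancient = [0.5, 0.2, 0.8, 0.1]
close_pop = [0.4, 0.3, 0.7, 0.2]
distant_pop = [0.1, 0.8, 0.2, 0.5]

shared_close = f3_outgroup(outgroup, ancient, close_pop)
shared_distant = f3_outgroup(outgroup, ancient, distant_pop)
```

In this toy case `shared_close` comes out larger than `shared_distant`, which is the kind of contrast the heat map in panel (A) displays across populations.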


The present study sheds light on the origins of the enigmatic Kalash population from Pakistan. We propose that the population represents an ancient genetic isolate rather than a recently split population showing extreme genetic drift, as suggested by earlier studies.(1,6)  The outlier status of these South Asians is corroborated by the fact that we found no evidence of recent admixture in the Kalash by using a variety of analyses, including TreeMix, f3, and linkage-based statistics. The fact that researchers also genotyped ten of these samples earlier by using the HGDP-CEPH panel and that these cluster with the samples genotyped in this study rules out the possibility of confounding results due to population sub-structure within the Kalash.
The ancient separation of the Kalash from a common Eurasian ancestor is supported by PSMC and MSMC analyses, which estimated that the Kalash split from East Asians (CHB and JPT as proxy) prior to splitting from Europeans and other South Asian populations. The split from Europeans (CEU and TSI) and South Asians (represented here by GIH) appears to have occurred during the Neolithic period, which is also supported by the decay of LD. LD decay showed that the Kalash were the first population to split from the other Central and South Asian cluster around 11,800 (95% CI = 10,600–12,600) years ago. This estimate remained constant even after the addition of an African (YRI) population or when the Kalash were compared with different subsets of non-African populations.

The pairwise times of divergence with other Pakistani populations ranged from 8,800 years ago with the Burusho to 12,200 years ago with the Hazara. Although migration and undetected admixture in reference populations could bias our estimate of the time of divergence, using different subsets of population revealed no strong bias in the split between the Kalash and South Asians, which occurred after the split between Europeans and South Asian populations.

Since this split, the Kalash have maintained a low Ne of around 2,500 (95% CI = 2,300–2,600), estimated from LD decay with no evidence of admixture. These Ne estimates are lower than those obtained from PSMC analysis because the latter method gives a single estimate of the cross-coalescence rate from the present to 24,000 years ago, whereas the linkage-based method gives us several estimates over the past 10,000 years. It is likely that PSMC analysis could not detect that the Kalash population suffered a continuous decline in effective population size. Taking into account the expected differences in Ne between autosomes and the Y chromosome, this is in agreement with the reported Ne of 237–1,124, which was estimated with observed and evolutionary mutation rates for Y chromosomal STRs. (34)
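To give a feel for how an effective population size can be read off LD decay, here is a rough sketch using the textbook approximation E[r²] ≈ 1 / (1 + 4·Ne·c), where c is the genetic distance in Morgans between two markers. The excerpt does not say which formula, corrections, or software the authors used, so this is an assumption and not a reconstruction of their method. The function names, the optional 1/n sampling adjustment, and the example numbers are all hypothetical.

```python
def ne_from_ld(r2, c_morgans, n_chromosomes=None):
    """Invert E[r^2] ~ 1 / (1 + 4 * Ne * c) to get Ne.

    r2            mean squared correlation between marker pairs
    c_morgans     genetic distance between the markers, in Morgans
    n_chromosomes if given, subtract 1/n from r2 as a simple
                  sampling adjustment (an assumed, commonly used
                  approximation)
    """
    if n_chromosomes:
        r2 = r2 - 1.0 / n_chromosomes
    if not (0.0 < r2 < 1.0) or c_morgans <= 0.0:
        raise ValueError("need 0 < r2 < 1 and c_morgans > 0")
    return (1.0 / r2 - 1.0) / (4.0 * c_morgans)


def generations_ago(c_morgans):
    """Approximate age, in generations, that LD at distance c reflects."""
    return 1.0 / (2.0 * c_morgans)


# Hypothetical inputs: mean r^2 of 0.1 for markers 0.1 cM apart.
example_ne = ne_from_ld(0.1, 0.001)
example_age = generations_ago(0.001)
```

Binning marker pairs by genetic distance and applying this to each bin yields several Ne estimates at different time depths, which matches the text's description of the linkage-based method as giving multiple estimates over the recent past.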

The Kalash represent a unique branch in the South Asian population tree and appear to be the earliest population to split from the ancestral Pakistani and Indian populations, indicating a complex scenario for population origins in the sub-continent rather than just the ancestral northern and southern Indian components identified previously. (35)  These Indo-European speakers were possibly the first migrants to arrive in the Indian sub-continent from northern or western Asia. This is supported by the higher level of shared genetic drift between the Kalash and the Paleolithic Siberian hunter-gatherer skeleton (MA-1) than between MA-1 and the other South Asian populations.

Whereas the Kalash have recently been reported to have European admixture, postulated to be related to Alexander’s invasion of South Asia,(6) our results show no evidence of admixture. Although several oral traditions claim that the Kalash are descendants of Alexander’s soldiers, this was not supported by Y chromosomal analysis in which the Kalash had a high proportion of Y haplogroup L3a lineages, which are characterized by having the derived allele for the PK3 Y-SNP and are not found elsewhere.(7)  They also have predominantly western Eurasian mitochondrial lineages and no genetic affiliation with East Asians.(4)

We observed that the Kalash share a substantial proportion of drift with a Paleolithic ancient Siberian hunter-gatherer, who has been suggested to represent a third northern Eurasian genetic ancestry component for present-day Europeans.(36,37) This is also supported by the shared drift observed between the Kalash and the Yamnaya, an ancient (2,000–1,800 BCE) Neolithic pastoralist culture that lived in the lower Volga and Don steppe lands of Russia and also shared ancestry with MA-1.(36,37) Thus, the Kalash could be considered a genetically drifted ancient northern Eurasian population, and this shared ancient component was probably misattributed to recent admixture with western Europeans.

We also looked at how this long-term separation, isolation, and low effective population size affected the patterns of genetic variation in the Kalash. One striking example is the frequency of the derived allele for rs4988235, which has been linked to lactose tolerance. The Kalash, like the MA-1, are fixed for the ancestral allele for this variant, whereas their neighbors in Pakistan have been observed to have moderate frequencies of the derived allele. Although this supports their long-term isolation, it is surprising in other ways because the Kalash have no reported lactose intolerance and indeed celebrate a “milk day” during their annual spring rituals.(38) This suggests that there might be additional derived lactase-persistence alleles in the LCT-MCM6 (MIM: 601806) region in this population.

Another example is the extremely high frequency (93%) of the stop-gain ACTN3 variant (rs1815739) associated with normal variation in human muscle strength and speed. (39) This variant was picked up as an outlier in the PBS test for selection in the Kalash. Simulations indicated that such a high frequency of the derived allele in the Kalash can only be obtained under a scenario that includes positive selection. The variant might be relevant in cardiovascular conditioning and muscle strength related to climbing up and down high mountain passes. Although ACTN3 has not been associated with adaptation to high altitude, RYR2, another gene with an intronic outlier variant (rs2992644) in PBS, has.(40)
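The PBS (population branch statistic) mentioned above is usually defined from three pairwise FST values. The sketch below shows that standard definition. The comparison populations, the FST estimator, and the windowing the authors used are not given in this excerpt, so the function names and the numbers here are hypothetical.

```python
import math


def branch_length(fst):
    """Transform FST into an additive divergence time: T = -ln(1 - FST)."""
    if not (0.0 <= fst < 1.0):
        raise ValueError("fst must be in [0, 1)")
    return -math.log(1.0 - fst)


def pbs(fst_target_a, fst_target_b, fst_a_b):
    """Branch length leading to the target population.

    fst_target_a  FST between the target and comparison population A
    fst_target_b  FST between the target and comparison population B
    fst_a_b       FST between the two comparison populations
    """
    t_ta = branch_length(fst_target_a)
    t_tb = branch_length(fst_target_b)
    t_ab = branch_length(fst_a_b)
    return (t_ta + t_tb - t_ab) / 2.0


# Hypothetical values: a variant strongly differentiated in the target
# only, versus one differentiated mainly between the two comparisons.
pbs_outlier = pbs(0.6, 0.6, 0.05)
pbs_background = pbs(0.1, 0.1, 0.2)
```

A variant is flagged as an outlier when its PBS sits far in the upper tail of the genome-wide distribution. As the text notes, drift alone can also produce high values in a small isolated population, which is why the authors turned to simulations.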

It has been postulated that South Asia, which is now a densely occupied land, was encountered by the first populations of modern humans that ventured out of Africa more than 50,000 years ago. The exact route taken by these earliest settlers is not known, although it has been suggested that they traveled via a southern coastal route. (41,42)

The genetically isolated Kalash might be seen as descendants of the earliest migrants that took a route into Afghanistan and Pakistan and are most likely present-day genetically drifted representatives of these ancient northern Eurasians. A larger survey that includes populations from their ancestral homeland in Nuristan, Afghanistan, would provide more insights into their unique genetic structure and origins and help explain the complex history of the peopling of South Asia.
