Friday, January 28, 2011

Mutation Rate Estimates for Y-chromosome STRs

Reading these papers on the divergence of various populations, I realize that the y-chromosome mutation rate is at the crux of dating.  Without this, we may have phylogeny, but not a good dating of events.  It seems less meaningful to know how populations are related, than to know they are related but also be able to date their divergence and relate it to climatic, geological and archaeological events.
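To make that dependence concrete, here is a minimal sketch of the simplest kind of STR dating, in which the age of a lineage is its average accumulated STR variance divided by the per-locus mutation rate. The function name, the variance value and both rates are illustrative assumptions, not figures taken from the papers listed here, and real estimators add corrections this sketch ignores.

```python
def lineage_age_years(mean_str_variance, rate_per_locus_per_generation,
                      years_per_generation=25):
    """Age under a simple stepwise model: variance grows by one rate unit per generation."""
    generations = mean_str_variance / rate_per_locus_per_generation
    return generations * years_per_generation

# Hypothetical inputs: the same observed variance dated with two assumed rates.
variance = 0.20
slow_rate = 0.0007  # assumed, per locus per generation
fast_rate = 0.0021  # assumed, three times faster

age_slow = lineage_age_years(variance, slow_rate)
age_fast = lineage_age_years(variance, fast_rate)
```

With these made-up inputs, a threefold difference in the assumed rate moves the same lineage from roughly 7,100 years to roughly 2,400 years, which is why the choice of rate dominates the dating.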

I'm posting papers on the y-chromosome mutation rate as I read them.  (The mtDNA result is not as precise.)

Here are two more papers:

Variation of 52 new Y-STR loci in the Y Chromosome Consortium worldwide panel of 76 diverse individuals
Lim, S; Xue, Y; Parkin, E; Tyler-Smith, C

Mutation rate estimates for 110 Y-chromosome STRs combining population and father-son pair data
Burgarella, C; Navascues, M

Wednesday, January 26, 2011

The Effective Mutation Rate at Y Chromosome Short Tandem Repeats, with Application to Human Population-Divergence Time

Zhivotovsky et al

Extended Y chromosome haplotypes resolve multiple and unique lineages of the Jewish priesthood

Hammer et al


It has been known for over a decade that a majority of men who self report as members of the Jewish priesthood (Cohanim) carry a characteristic Y chromosome haplotype termed the Cohen Modal Haplotype (CMH). The CMH has since been used to trace putative Jewish ancestral origins of various populations. However, the limited number of binary and STR Y chromosome markers used previously did not provide the phylogenetic resolution needed to infer the number of independent paternal lineages that are encompassed within the Cohanim or their coalescence times. Accordingly, we have genotyped 75 binary markers and 12 Y-STRs in a sample of 215 Cohanim from diverse Jewish communities, 1,575 Jewish men from across the range of the Jewish Diaspora, and 2,099 non-Jewish men from the Near East, Europe, Central Asia, and India. While Cohanim from diverse backgrounds carry a total of 21 Y chromosome haplogroups, 5 haplogroups account for 79.5% of Cohanim Y chromosomes. The most frequent Cohanim lineage (46.1%) is marked by the recently reported P58 T->C mutation, which is prevalent in the Near East. Based on genotypes at 12 Y-STRs, we identify an extended CMH on the J-P58* background that predominates in both Ashkenazi and non-Ashkenazi Cohanim and is remarkably absent in non-Jews. The estimated divergence time of this lineage based on 17 STRs is 3,190 ± 1,090 years. Notably, the second most frequent Cohanim lineage (J-M410*, 14.4%) contains an extended modal haplotype that is also limited to Ashkenazi and non-Ashkenazi Cohanim and is estimated to be 4.2 ± 1.3 ky old. These results support the hypothesis of a common origin of the CMH in the Near East well before the dispersion of the Jewish people into separate communities, and indicate that the majority of contemporary Jewish priests descend from a limited number of paternal lineages.

DNA Genealogy, Mutation Rates, and Some Historical Evidence Written in the Y-Chromosome

Anatole Klyosov

Part I: Basic Principles and Methods (Link)
Part II:  Walking the Map (Link)

Monday, January 24, 2011

An update on the use of ADMIXTURE for analysing Middle Eastern Populations

I thought it would be helpful to add some context to the discussion of posts on Middle East ADMIXTURE results.

The November Assyrians article is based on earlier work:  A Simple Demic Diffusion Model for Syria which proposes that Middle East ADMIXTURE results indicate a diffusion process of formerly isolated populations.

Subsequent articles establish the basis on which Middle Eastern populations can be ordered:  Syria to Assyria:  3500 years of Demic Diffusion, The Coalescent in a Continuous, Finite, Linear Population and Fertile Crescent Components do the Wilson Wakeley Model.

Eurogenes K10 Middle East ADMIXTURE results place the Assyrian and other Middle East populations in context of each other.  The ordering of these results is based on the idea that the "West Asian" represents a southward bound diffusion process, while the "Southwest Asian" component represents a northward bound diffusion process.  In Behar et al, Revisited, I place the Eurogenes K10 Middle East ADMIXTURE result in the context of the Behar et al 2010 primary component cluster plot.

Finally, Assyrians and the ADMIXTURE Southwest Asian Component Revisited reexamines earlier work regarding gene flow of the Middle East Southwest Asian Component.  This result, attributed to the recent Chiaroni et al paper, complicates the inferences that can be drawn from Middle East ADMIXTURE data.  At a minimum, it implies that a diffusion process for the Southwest Asian component is multidirectional.  Additionally, ADMIXTURE itself has limitations and cannot be used to strictly trace phylogeny.  More accurate methods, such as y-chromosome statistical analysis, are required to establish origin and date phylogeny.

What is currently missing from published works is an attempt to correlate ADMIXTURE autosomal results with those of more precise y-chromosome results.

It should be noted that in the academic literature, there is not yet consensus as to the date of the most recent common ancestor in Jewish populations.  Nor is there a good understanding of the interrelationship and date of Jews with their other J1 and J2 cousins.  ADMIXTURE results can be used as a guide, but cannot be used to establish the origin or date of the most recent common ancestor.

Given the potential political nature of this data, I hope that we will proceed cautiously, considerately and with an open mind.

Further posts discussing the origin and dispersal of the J1 and J2 Haplogroups:
The Origin of Y-chromosome Haplogroup J1:  Another Lake, Other Rivers
Geographical Structure of the Y-chromosomal Genetic Landscape of the Levant:  A coastal-inland contrast
J1-M267 Y lineage marks climate-driven pre-historical human displacements
Neolithic Expansion into Coastal West India
Fertile Crescent Pre-Holocene Expansion of Haplogroup J
The Bedouin
J1 Hap Map
Early Domestication on the Taurus-Zagros Arc
Battaglia et al

Saturday, January 22, 2011

Saturday, January 8, 2011

Gazelles - Arkive Images of Life on Earth

Arkive Images of Life on Earth has a beautiful website with short clips of wild animals.  The site also includes descriptions of animal status, range and habitat.  It pushes the information technology edge in terms of expanding public knowledge about wild animals and their habitats.

There is information on Thomson's Gazelle, Grant's Gazelle, the Dorcas Gazelle, the Mountain Gazelle, the Goitered Gazelle, the Dama Gazelle and one other related species, the Nubian Ibex.

Information on many other species of gazelle, wild goat, and other wild caprines is under development on the site, although full biological and range information is not yet available.

Friday, January 7, 2011

New Topology proposed for E1b1

A new paper broadens and rearranges the E1b1 phylogeny.  It proposes an East African origin of E1b1:

A New Topology of the Human Y Chromosome Haplogroup E1b1 (E-P2) Revealed through the Use of Newly Characterized Binary Polymorphisms
Trombetta et al

Gazelle Hunters

Byzantine era mosaic of gazelle in Caesarea, Israel (Link)

I've been deep in thought over the last day or so, wondering about HG E and the driving force behind their movement northward.  [Blog note (11/1/2015):  Based on autosomal DNA analysis of the last five years, I no longer assume that population movements are unidirectional.  This applies here as well.]

I will also confess to recalling, from living in Ghana as a child, an almost mythical association that West Africans have for the gazelle and the antelope.

The gazelle is now extinct in Ghana, but survives in other parts of the African Sahel.

Getting back to the discussion from yesterday's post, Cruciani notes that the E HG seems to have first entered West Asia 20,000 years ago.  It is likely not a coincidence that the earliest Kebaran archaeological dates in the Levant are from approximately 21,500 BC.  In previous posts (here, here and here), I've touched upon the specific house building techniques found in these Kebaran and Natufian sites.  Another distinction of these early sites is that they abound in gazelle bones.  It's quite evident that part of the inhabitants' lifestyle revolved around gazelle hunting.  Many of the sites seem to have been inhabited part-time, as would be consistent with a nomadic gazelle-hunting lifestyle.  The Kebaran dates are well before the advent of farming or pastoralism anywhere in the world, so it would have to be a combination of hunting and gathering that sustained these people.

For clarity, it's worth showing two of the earliest Kebaran archaeological date maps:

Kebaran 21,500-16,000 calBC C14 Radiocarbon Context Database Maps (Link)

Kebaran 16,000-12,500 calBC C14 Radiocarbon Context Database Maps (Link)

The Cruciani 2007 paper states that men carrying the E y-chromosome appear in West Asia beginning 20,000ya.   The distribution maps for E-M78 (Cruciani et al 2007, and El Sibai 2009) both show highest concentration along the Mediterranean coast and in parts of the Arabian peninsula.  It suggests that the current distribution of E-M78 fits with the Kebaran/Ramonian/Mushabian archaeological geography, but has been swept onto the Arabian peninsula and Mediterranean coast and away from the inland Levant.

The strong association of Kebaran sites with the gazelle is also indicative of a link with hunters of African origin.  Today, the range of gazelles is confined to the African Sahel and to west and southwest Asia.

Digging further back on the phylogeny of haplogroup E (Figure 1 of Semino et al, 2004), we can see the spatial association of the E HG with the African Sahel, approximating the distribution of African gazelles and antelopes:

Figure 1 (Semino et al)
Phylogeny and frequency distributions of Hg E and its main subclades (panels A-G).  The numbering of mutations is according to the Y Chromosome Consortium (YCC) (YCC 2002, Jobling and Tyler-Smith 2003). To the left of the phylogeny, the ages (in 1,000 years) of the boxed mutations are reported with their SEs (Zhivotovsky et al. 2004).  [See the paper for the authors' further comments on this figure.]

The association of the Kebaran period with the entry of the E haplogroup into the Levant is conjectural, as is the association with gazelle and antelope hunting.  However, the confluence of the timing of archaeological remains with the distribution and dating of the E haplogroup, along with the distribution of gazelle habitat seems unlikely to be a pure coincidence. 


Origin, Diffusion and Differentiation of Y-Chromosome Haplogroups E and J:  Inferences on the Neolithization of Europe and Later Migratory Events in the Mediterranean Area
Semino et al

Tracing Past Human Male Movements in Northern/Eastern Africa and Western Eurasia:  New Clues from Y-Chromosomal Haplogroups E-M78 and J-M12
Cruciani et al

Gazelle exploitation in the early Neolithic site of Motza, Israel:  the last of the gazelle hunters of the southern Levant
Sapir-Hen et al

The Natufian Culture of the Levant, Threshold to the Origin of Agriculture
Bar-Yosef, Ofer

Domestication and early agriculture in the Mediterranean Basin:  Origins, diffusion, and impact
Zeder, M

Thursday, January 6, 2011

Embarking on an Exploration of the E Haplogroup Dispersal with relation to ADMIXTURE

Egyptian men relaxing outside ahwa (coffee house) (Link)

It's been over a month since I first posted the Eurogenes ADMIXTURE results for Middle Eastern populations. (Link)

As some of you might remember, I took the Eurogenes ADMIXTURE results and normalized them on three components, the West Asian, Southwest Asian and South European components:

Populations are Georgian, Iranian, Armenian, Uzbekistan Jews, Azerbaijani Jews, Georgian Jews, Iranian Jews, Iraqi Jews, Druze, Syrians, Jordanians, Palestinians, Samaritans, Egyptians, Yemenis, Saudis, Bedouin
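One plausible reading of this normalization, sketched here as an assumption rather than the exact procedure behind the plots, is to keep only the three components of interest and rescale them so they sum to one. The function name is hypothetical, and the sample values are the Eurogenes K10 figures for Sinds quoted in the January 2 post on this page, used purely as example input.

```python
def normalize_on(components, keep):
    """Rescale the chosen components so they sum to 1, dropping all others."""
    subset = {name: components[name] for name in keep}
    total = sum(subset.values())
    if total == 0:
        raise ValueError("selected components sum to zero")
    return {name: value / total for name, value in subset.items()}

# Example input: K10 proportions for Sinds as quoted elsewhere on this page.
sinds = {
    "West Asian": 0.37,
    "Central Asian": 0.09,
    "SW Asian": 0.04,
    "South Euro": 0.01,
    "North Euro": 0.02,
    "South Asian": 0.45,
}

three = normalize_on(sinds, ["West Asian", "SW Asian", "South Euro"])
```

Rescaling like this makes populations comparable on the three axes alone, at the cost of hiding how much of each genome falls outside them.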

Working from these plots, I've been focusing on the association between haplogroups and these three ADMIXTURE components.  Up to now, I've focused primarily on haplogroups that are confined to West Asia and Southwest Asia.

What of the South European component?  With what haplogroups is it associated?
There are several clues:  the component is slightly elevated in Cypriots and Egyptians while being suppressed in Saudi Arabia and the Bedouin.  It's quite suppressed in coastal Pakistan and India.

Haplogroup E-M78 is prevalent in the Coastal Mediterranean, including Cyprus and Egypt (El-Sibai), while being absent from India (Sengupta).

Following the likelihood grouping pattern discussed earlier, ADMIXTURE probably lumps several other haplogroups into the South European component, but for West Asia and North Africa, it is E-M78 that presents the most distinctive and dominant pattern.

During the last ten years, the investigation of the E haplogroup has not been an easy one and its description in ISOGG only begins to touch upon the complexity of its distribution and the conjecture as to its origin.

As an E-M78 starting point, I reference Cruciani 2007, below.   A number of subsequent papers further define the origin and dispersal of the E Haplogroup.  Over the next few days, I'll be reviewing those papers and posting as appropriate.

Tracing Past Human Male Movements in Northern/Eastern Africa and Western Eurasia:  New Clues from Y-Chromosomal Haplogroups E-M78 and J-M12
Cruciani et al


Detailed population data were obtained on the distribution of novel biallelic markers that finely dissect the human Y chromosome haplogroup E-M78. Among 6,501 Y chromosomes sampled in 81 human populations worldwide, we found 517 E-M78 chromosomes and assigned them to 10 subhaplogroups. Eleven microsatellite loci were used to further evaluate subhaplogroup internal diversification.

The geographic and quantitative analyses of haplogroup and microsatellite diversity is strongly suggestive of a northeastern African origin of E-M78, with a corridor for bidirectional migrations between northeastern and eastern Africa (at least 2 episodes between 23.9–17.3 ky and 18.0–5.9 ky ago), trans-Mediterranean migrations directly from northern Africa to Europe (mainly in the last 13.0 ky), and flow from northeastern Africa to western Asia between 20.0 and 6.8 ky ago. 

A single clade within E-M78 (E-V13) highlights a range expansion in the Bronze Age of southeastern Europe, which is also detected by haplogroup J-M12. Phylogeography pattern of molecular radiation and coalescence estimates for both haplogroups are similar and reveal that the genetic landscape of this region is, to a large extent, the consequence of a recent population growth in situ rather than the result of a mere flow of western Asian migrants in the early Neolithic.

Our results not only provide a refinement of previous evolutionary hypotheses but also well-defined time frames for past human movements both in northern/eastern Africa and western Eurasia.

Wednesday, January 5, 2011

The Alan Migration

In the comments section of the last post, Ricardo Costa de Oliveira alerts us to the possibility of a marker for the Alan migration from the Caucasus to the Iberian peninsula.

Ricardo points out that a specific J1b modal appears in two Iranians, as cited in the recent paper:

Influences of history, geography, and religion on genetic structure:  the Maronites in Lebanon
Haber et al
(Figures and Tables)
Supplementary Tables:
    (Lebanon by region, religion and sect)
    (Iran by region)
Ricardo observes that two western Iranians, I171 and I174, bear the same modal haplotype as some people from the Azores, Portugal and Brazil.  Additionally, J1 haplotypes in the Iran table seem to be very diverse, confirming the results of Chiaroni et al indicating a Southeastern Turkey-Zagros Mountain origin of the J1 Haplogroup.

Given the documented westward migration of the Alans, it would not be out of the realm of possibility that this particular J1b haplotype marks that migration.

Research on Ossetians does not yet indicate this haplotype.  It may have been lost in the centuries between the time of the Alan migrations westward and today.

In order to narrow in on which migration may have spread western Iranian haplotypes to the Iberian peninsula, it would be important to create a picture of the Ossetian and western Iranian haplotypes that have reached western Europe and beyond.

Monday, January 3, 2011

Sadness at the Crossroads of Humanity

It is with sadness that I read these papers on the Kurds, Armenians, Assyrians and other Middle East regions.  You can't execute a Google search on any of these groups without seeing guns everywhere and dead children.

After writing the post on The Origin of Y-chromosome Haplogroup J1:  Another Lake, Other Rivers, I got an email from someone in Turkey who possessed the J1* y-chromosome HG.  He was thrilled to get more information about his genetic identity.  I also paused, realizing that his paternal ancestors must have been in that region of the world for tens of thousands of years.

Then I pondered my own paternal ancestors (mostly of the R1b variety).  While J1*'s ancestors were rooted in the mountains of Turkey, mine were constantly on the move westward.

I realized I'm value neutral as to whether someone's ancestors have remained rooted in one place or were world travellers.  I'm also value neutral as to the degree of homogeneity or heterogeneity in someone's background.  It surprises and disappoints me to see someone gleefully declaring that a genetic background is more "pure" with respect to someone else.

If anything, reading these papers, I've realized that we are all quite genetically heterogeneous.  Yes, we've been isolated by geography, religion and social norms, but overwhelmingly, we see the Mesolithic intersecting with the Neolithic, and Asia, India, Africa and Europe surprisingly interwoven.

Some of the posts I've seen recently try to carve out the minor difference between Assyrian and Kurd or Indian and Pakistani.  The methods used are often dubious.  The genetic variability within these groups is ignored.

It's sad to see genetic information used in this way.  The story of the great interconnected web of human history is lost in someone's narrow agenda.

I hope that rather than accept these agenda driven distillations, people will read reviewed papers not only about their own genetic history, but also about others, and sit with the complexity of the human journey.

Peace be with you.


Boys from a Kurdish family herd sheep in Suleymaniya, in northern Iraq.  (Link)

MtDNA and Y-chromosome Variation in Kurdish Groups
Nasidze, I; Quinque, D; Ozturk, M; Bendukidze, N; Stoneking, M

Y-chromosome and mtDNA polymorphisms in Iraq, a crossroad of the early human dispersal and of post Neolithic migrations
Al-Zahery et al


Armenian Woman in National Costume, Artvin, circa 1910
Prokudin-Gorskii, Sergei Mikhailovich, photographer (Link)
Repository:  Library of Congress Prints and Photographs Division, Washington DC (Link)

I came across this great link for the Armenian DNA Project.  It is particularly interesting to reflect on the comments of Professor Levon Yepiskoposyan of the Institute of Molecular Biology in Yerevan: "Y chromosome haplotypes diversity in the modern Armenian population reveals strong regional structure with marked separation of mountainous (Syunik region in the south of Armenia, and Karabakh) and valley (Ararat valley, northern and western regions of historical Armenia) groups."

The Armenian DNA Project notes:  "The mountain groups have a greater concentration of R1b1 while the valley groups have a greater concentration of J2 & J1 (and to a lesser extent, slightly greater concentrations of G & E1b1b1)."

Sunday, January 2, 2011

ADMIXTURE and STRUCTURE in Perspective: A Discussion of the Pitfalls of Likelihood Population Structure Analysis

In December, as ADMIXTURE results from the Dodecad Ancestry Project were put online, I noticed a post by one commenter that intimated that ADMIXTURE would soon be automatically generating phylogeny trees for various populations.

I'd been looking at the Dodecad results for a while and I hadn't observed an orderly, tree-like generation of results with increasing K, so I had my suspicions about this comment.

As I've analyzed the ADMIXTURE results for Middle Eastern populations, the Southwest Asian component seems to correlate with populations where the J1 Y-chromosome HG and R mt-DNA are common.  El-Sibai et al and Chiaroni et al note that the J1 haplogroup is isolated and strongly represented in the inland Levant and on the Arabian peninsula.  It is thus reassuring that ADMIXTURE was able to differentiate a component for this genetically isolated population.

More puzzling are the West Asian and South European clusters.  Based on spatial distribution data for West Asia, there seems to be a correlation of the J2, L and G y-chromosome haplogroups with the West Asian component and the R1b and E1b1b1 y-haplogroups with the Southern European component.  (See El-Sibai, Table 1 and Figure 2.)

The Eurogenes K10 results for Sinds and Gujaratis, which were run together with the Behar dataset, further elucidate the ADMIXTURE grouping of components with y-chromosome HGs:

ADMIXTURE Eurogenes K10

Component        Gujarat   Sinds

West Asian       0.24      0.37
Central Asian    0.07      0.09
SW Asian         0         0.04
South Euro       0.02      0.01
North Euro       0.02      0.02
South Asian      0.63      0.45

Consider these results against the y-chromosome HG frequencies for India and Pakistan listed in Table 5 of Sengupta et al:

HG               India(%)   Pakistan(%)

G1-M285                      0.57
G2-P15            1.24       4.55
G5-P15                       1.14
J2a-M410          3.57       8.52
J2a1b-M067                   1.14
J2a1e-M158        0.27
J2b2-M241         5.22       2.27
L1-M076           6.32       5.11
L2-M317                      1.14
L3-M231           0.41       6.82
G, J2, L total   17.15      31.26

J1-M267           0.27       3.41

R*-M207           0.27       3.41
R1*-M173                     0.57
R*, R1* total     0.27       3.98

R1a1-M017        15.8       24.43

R1b2b-M073                   4.55
R1b3-M269         0.55       2.84
R1b total         0.55       7.39

R2-M124           9.34       7.39
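As a quick arithmetic check, the group totals in the table can be recomputed from the rows as listed. The sketch below does this for the G, J2 and L rows, treating blank cells as zero (an assumption; a blank may simply mean no value was transcribed), and the helper name is hypothetical.

```python
def subtotal(values):
    """Sum the listed frequencies, treating blank cells (None) as zero."""
    return round(sum(v for v in values if v is not None), 2)

# (India %, Pakistan %) for the G, J2 and L rows as listed in the table above.
g_j2_l = [
    (None, 0.57),  # G1-M285
    (1.24, 4.55),  # G2-P15
    (None, 1.14),  # G5-P15
    (3.57, 8.52),  # J2a-M410
    (None, 1.14),  # J2a1b-M067
    (0.27, None),  # J2a1e-M158
    (5.22, 2.27),  # J2b2-M241
    (6.32, 5.11),  # L1-M076
    (None, 1.14),  # L2-M317
    (0.41, 6.82),  # L3-M231
]

india_sum = subtotal(india for india, _ in g_j2_l)
pakistan_sum = subtotal(pakistan for _, pakistan in g_j2_l)
```

The Pakistan rows reproduce the listed 31.26, while the India rows as listed come to 17.03 rather than 17.15.  The small difference may reflect rows or rounding not reproduced in this excerpt of the table.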

The authors of the Thangaraj et al paper note that the Neolithic West Asian contributions to the Pakistani and Indian genetic picture are paternal and are composed primarily of the G, J2 and L y-haplogroups.  That presents a unique opportunity to discern which West Asian y-haplogroups are grouped into the ADMIXTURE West Asian Component.

Comparing the proportions for Gujaratis, in Northwest India, and Sinds, in Southern Pakistan, from ADMIXTURE with the Sengupta results gives an idea of the y-haplogroup-West Asian component correlation:  Haplogroups J2, G and L appear to be grouped into the West Asian component.

Haplogroup R1a1 groups into the Central Asian component.  It is notable that Southern Pakistan Sinds appear to have a lower R1a1 contribution than other parts of Pakistan.

The Indian ADMIXTURE Southwest Asian component is 0% (Sengupta J1 HG result for India of 0.27%).    The Sind ADMIXTURE Southwest Asian result is 4% (Sengupta Pakistan: 3.41% J1 HG).

Due to their low level, it is not clear in which clusters R1* and R2 HG populations group.

Returning to the discussion about the limitations of ADMIXTURE, it is notable that ADMIXTURE has grouped the J2, G and L HGs together (but not J1).  In retrospect, since men with J2, G and L HGs have been interspersed in West Asia since the LGM, it isn't surprising that they are grouped together.  What is surprising is that J1 appears separated, even though it is in a leaf branch of the J-G-L root phylogeny.

Here, we can see that ADMIXTURE is not partitioning populations based on phylogeny, but by the degree to which a population has been isolated over a timescale of thousands of years.

A recent paper investigates some reasons why likelihood based algorithms such as ADMIXTURE sometimes fail to correctly identify phylogeny and the relationship between clusters.  The paper focuses on another computer program, STRUCTURE, but the problems of genetic clustering based on likelihood are encountered in all likelihood algorithms:

The computer program STRUCTURE does not reliably identify the main genetic clusters within a species:  simulations and implications for human population structure
ST Kalinowski

From the paper:

"The goals of this paper are twofold. First, I will use computer simulation to examine whether STRUCTURE can correctly group individuals into clusters when populations have had a history of fragmentation and isolation. This is one of the simplest types of histories that a set of populations might have, and one of the most commonly used models to describe genetic relationships among natural populations. Second, I will explore two previously published data sets of human genetic diversity to determine whether problems identified in the simulations have influenced depictions of human population structure."

"Results from the simulations showed that the clustering arrangements produced by STRUCTURE were affected by the relative amount of differentiation among the populations, and that in some circumstances, STRUCTURE produced clusters that were not consistent with the main evolutionary divisions within the populations. For example, Figure 1 shows that STRUCTURE created evolutionarily accurate clusters when populations A, B and C were closely related to each other (for example, divergence times: 100/200/800). However, when population C was less closely related to population A and B--but still more related to A and B than to D--STRUCTURE clustered individuals from population C with population D."

"The results above show that if the value of K used to run STRUCTURE is less than the actual number of populations, STRUCTURE will sometimes place individuals from unrelated populations into the same cluster."

"I suspect that the problem is that the probability of the genotypic data is maximized by placing as many individuals as possible into genetically homogeneous clusters--with little regard to how the remaining individuals are clustered."

In the case of the Fertile Crescent analysis, this points to one reason why the J2, G and L related haplogroups may be clustered into the West Asian component. Not only are these populations not isolated, they are also less genetically homogeneous than the more isolated and easily discernible J1 related Southwest Asian component.

The tendency of likelihood algorithms to favor most similar clusters while grouping other less similar clusters has implications for the ability of these algorithms to analyze populations that are related, but less related than a dominant group. For example, they may have trouble correctly examining the flow of populations between Europe and Africa relative to European and African populations:

"The genetic similarities between Europeans and some Africans that I found are not evident in the output of STRUCTURE (Rosenberg et al 2005). STRUCTURE clustered all sub-Saharan Africans into a single cluster and all Europeans into another cluster (Rosenberg et al., 2005)—which suggests that the peoples of each of these continents are genetically more similar to each other than to peoples on other continents. Previous analyses of genetic diversity in humans do not seem to have noted the genomic similarity of Europeans and present-day African farmers. It has been shown for mitochondrial DNA (Ingman et al., 2000) and for Y-chromosomes (for example, Underhill and Kivisild, 2007), but apparently has not been recognized for autosomal loci which make up the majority of human genome."

The paper also mentions the problem of sample size. This does appear to be a problem with ADMIXTURE. SNP clusters that are heavily sampled seem to inflate the degree to which they are represented, while underrepresented clusters may appear, but at very low levels.