Thursday, December 9, 2010

Migration Paths: Dienekes' Clusters Galore Scatter Plot Results

Taking a slight diversion from ADMIXTURE and the Fertile Crescent discussion, I'd like to mention Dienekes' Scatter Plot Results (link) for human populations. Dienekes uses a method he calls Clusters Galore, built on the MCLUST software, to place populations along dimensions according to SNP cluster groupings. It works by carving through SNP data sets to find those groupings. So far, I haven't seen any ADMIXTURE-like results, with linear combinations of components, come out of this method. I believe that's because the cluster search algorithm results in a non-linear weighting. However, MCLUST can handle much larger data sets than ADMIXTURE and is computationally much faster. That permits a worldwide view, without having to carefully group and balance data sets.
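To give a feel for what model-based clustering does, here is a minimal sketch in plain Python. MCLUST itself is an R package that fits Gaussian mixture models and picks the number of clusters and the covariance structure by BIC. The toy below is not Dienekes' pipeline: it fixes two clusters, assumes one spherical variance per cluster, and uses made-up two-dimensional coordinates in place of real per-individual dimensions. The function name `fit_spherical_gmm` and the sample points are hypothetical.

```python
import math

def fit_spherical_gmm(points, init_means, iterations=30):
    """Toy EM for a Gaussian mixture with one spherical variance per cluster.

    Simplified stand-in for model-based clustering; not MCLUST itself.
    """
    k, dim, n = len(init_means), len(points[0]), len(points)
    means = [list(m) for m in init_means]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # E-step: responsibilities, computed in log space to avoid underflow
        resp = []
        for p in points:
            logs = []
            for j in range(k):
                sq = sum((p[d] - means[j][d]) ** 2 for d in range(dim))
                logs.append(math.log(weights[j])
                            - 0.5 * dim * math.log(2 * math.pi * variances[j])
                            - sq / (2 * variances[j]))
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weight, mean and variance of each cluster
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = [sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                        for d in range(dim)]
            sq_sum = sum(r[j] * sum((p[d] - means[j][d]) ** 2 for d in range(dim))
                         for r, p in zip(resp, points))
            variances[j] = max(sq_sum / (nj * dim), 1e-6)
    # Hard assignment: each point goes to its most probable cluster
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means

# Made-up coordinates: two tight groups on a pair of dimensions
points = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2)]
labels, means = fit_spherical_gmm(points, [points[0], points[-1]])
print(labels)
```

Each point is assigned to whichever cluster most probably generated it, so the output is a set of discrete group memberships rather than ADMIXTURE-style mixing proportions.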

With this approach, you can very quickly see that certain populations or demes are grouped on certain dimensions, even those that are geographically very far apart. Some of the demes, such as those in Europe, form huge scatter plot blobs. China and neighboring countries make up another blob. Other groupings are more interesting, like the long arms of a galaxy or strings of dandelion seeds blowing in the wind. They clearly represent the long arms of ancient human migrations.

I sat down the other day and tried to describe some of these migration paths and give them some names. I was able to pick out many of the familiar migratory paths that have appeared in discussions about the peopling of Siberia, Southeast Asia and the Americas. After looking at the first three plots (the first six dimensions), I could already see that paths described in one plot overlapped with paths in the other plots. The ordering of the populations may be jumbled up a bit, but you can see that certain populations are grouped in particular migratory paths. You can also see that the migratory paths sometimes intersect.

I only looked at the first six dimensions and did not do an exhaustive job. Still, it appears to be a very powerful technique for picking out long-range migration paths in human populations.

Here's my list:

Dimension 1 versus 2:

A:  French Basques-Chuvash-Athabask-Aleut-West Greenland-Selkup-Yukagir-East Greenland-Ket-Maya-Chukchi-Pima-Colombians-Karitiana-Surui
(North Asian Steppe:  Northern Europe-Russia-Bering Strait-Alberta Corridor-Arizona-Central America-Colombia-Amazon)

B:  Lezgins-Adygei-Kalash-Uzbeks-Hazara
(Central Asian Steppe:  Caucasus to Pakistan)

C:  Altai-Mongol-Koryak-Tuva-Nganassan-Buryat-Oroqen-Daur-Hezhen-Han Chinese-Miaozu
(Central Asian Steppe:  Altai to Siberia)

E:  Lezgins-Adygei-Kalash-Pathan-Burusho-Gujarati-Malayan
(Caucasus-Persian Gulf-India-Coastal route to Malaysia)

F: Bantu SE-Mandenka-Bantu NE-Maasai-Ethiopians-Moroccans-Mozabites-Egyptians-Bedouin-Jordanians 
(Trans African-Nile-Red Sea)

Dimension 3 versus 4:

G. Miaozu-Dai-Lahu-Cambodians-Malay_Singapore
(Southeast Asia to Malay Peninsula)

H. Iranians-Brahui-Burusho-Pathan-Kalash-Sindhi-Gujarati
(Iran to India)

Dimension 5 versus 6:

J. Cambodians-Dai-Maya-Pima-Colombians-Surui-Karitiana
(Pacific Rim Coastal:  Southeast Asia-China-Bering Strait-Pacific Coast, intersecting A at Pima)

K. Lithuanians-Finns-Chuvash-Chukchi-L
(See A, North Asian Steppe)

L. Yakut-Selkup-Ket-Koryak-Nganassan
(See B, Central Asian Steppe)

M. Mongola-Uzbeks-Daur-Hezhen-Mongol-Oroqen-Altai-Buryat-Tuva-L
(See B, Central Asian Steppe)
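
The overlaps between paths can also be checked mechanically. The sketch below is only an illustration: it copies a few of the lists above into a Python dictionary (with spellings normalized so that the same population matches across lists) and reports which populations each pair of paths has in common. The names `paths` and `shared_populations` are my own and hypothetical.

```python
from itertools import combinations

# A subset of the paths listed above, keyed by their letter
paths = {
    "A": ["French Basques", "Chuvash", "Athabask", "Aleut", "West Greenland",
          "Selkup", "Yukagir", "East Greenland", "Ket", "Maya", "Chukchi",
          "Pima", "Colombians", "Karitiana", "Surui"],
    "B": ["Lezgins", "Adygei", "Kalash", "Uzbeks", "Hazara"],
    "E": ["Lezgins", "Adygei", "Kalash", "Pathan", "Burusho", "Gujarati",
          "Malayan"],
    "H": ["Iranians", "Brahui", "Burusho", "Pathan", "Kalash", "Sindhi",
          "Gujarati"],
    "J": ["Cambodians", "Dai", "Maya", "Pima", "Colombians", "Surui",
          "Karitiana"],
}

def shared_populations(paths):
    """Return {(path1, path2): [populations found in both]} for overlapping pairs."""
    overlaps = {}
    for (name1, pops1), (name2, pops2) in combinations(paths.items(), 2):
        # Keep the order in which the populations appear in the first path
        common = [pop for pop in pops1 if pop in pops2]
        if common:
            overlaps[(name1, name2)] = common
    return overlaps

overlaps = shared_populations(paths)
for pair, common in overlaps.items():
    print(pair, common)
```

A shared population only shows that two lists touch; whether that reflects a real intersection of migration routes still has to be judged from the plots themselves.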

