Sunday, March 4, 2012

Why "Linear" Population Model

I thought it would be timely to revisit the raison d'être for the word "linear" in the naming of this blog.

The observation of "linear" began with my post-processing of ADMIXTURE results from whole-genome datasets for Middle Eastern and European populations.  That effort began in November and December of 2010 (here, for example).  What became apparent to me was that the ADMIXTURE data contained "components" ("West Asian", "Southern European" and "Southwest Asian") that showed an approximately linear genetic variation over geographic distance.
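To make "approximately linear over geographic distance" concrete, here is a minimal sketch of the kind of check I mean: an ordinary least-squares line fitted to one component's fraction against distance. The function name `linear_fit` and the numbers are my own placeholders for illustration. They are made up and are not actual ADMIXTURE output.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 measures how much of the variation the straight line explains.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical values for illustration only (NOT real ADMIXTURE results):
# distance of each sampled population from an arbitrary reference point,
# and the fraction of one component assigned to that population.
distance_km = [0, 500, 1000, 1500, 2000, 2500]
component_fraction = [0.52, 0.44, 0.37, 0.28, 0.21, 0.12]

slope, intercept, r_squared = linear_fit(distance_km, component_fraction)
print(f"slope per km: {slope:.6f}")
print(f"intercept:    {intercept:.3f}")
print(f"R^2:          {r_squared:.4f}")
```

An R^2 close to 1 would indicate that a straight line describes the component's change with distance well. With real data, the choice of reference point and how distance is measured are judgment calls that this sketch does not address.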

It happens that in the past I have looked at radio frequency data that is also analyzed using methods of linear approximation.

In looking for an explanation of linearity in the population data, I found the paper "The Coalescent in a Continuous, Finite, Linear Population" by J. Wilkins and J. Wakeley. (link)

Later, I stumbled on Marcus W. Feldman's 2008 Nobel Conference lecture, in which he points out that human migration under the influence of the serial founder effect yields an approximately linear genetic variation.
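As a rough illustration of why a chain of founding events can look linear, here is a small sketch. It is not the model from the lecture. It assumes a textbook simplification in which each founding event acts like one generation of drift among a fixed number of diploid founders, so expected heterozygosity shrinks by a factor of (1 - 1/(2N)) per step. The starting heterozygosity, founder count, number of steps, and the function name are all hypothetical.

```python
def expected_heterozygosity(h0, founders, steps):
    """Expected heterozygosity after 0..steps successive founding events.

    Assumption: each event is one generation of drift among `founders`
    diploid individuals, multiplying heterozygosity by 1 - 1/(2 * founders).
    """
    factor = 1.0 - 1.0 / (2.0 * founders)
    return [h0 * factor ** k for k in range(steps + 1)]


# Hypothetical parameters, chosen only for illustration.
H0, FOUNDERS, STEPS = 0.75, 250, 20

exact = expected_heterozygosity(H0, FOUNDERS, STEPS)

# First-order (straight-line) approximation: H0 * (1 - k / (2N)).
linear = [H0 * (1.0 - k / (2.0 * FOUNDERS)) for k in range(STEPS + 1)]

max_gap = max(abs(a - b) for a, b in zip(exact, linear))
print(f"heterozygosity after {STEPS} events: {exact[-1]:.4f}")
print(f"straight-line approximation:        {linear[-1]:.4f}")
print(f"largest gap between the two:        {max_gap:.5f}")
```

The decline is geometric, strictly speaking, but when the per-step loss is small the curve stays very close to a straight line over many steps, which is the sense of "approximately linear" here.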

That's the reason for the word "linear" in the title.  The serial founder effect, shaped by geography and climate, is the dominant factor in understanding human genetic variation.

More recently, I haven't spent much time analyzing ADMIXTURE data sets on this blog.  I've certainly seen some very tempting whole-genome data sets, but I have to confess that this is not my day job, so, regretfully, there is not as much linear analysis on this blog as I would like.

In the last few months, I've looked at Neanderthals, as their small contribution to Eurasians (<4%) demonstrates that while the serial founder effect dominates, other minor "discontinuous" contributions to the human genome also warrant consideration.

