Friday, March 16, 2012

Dienekes, Representation Bias and Sampling Bias

Dienekes, the pseudonym of an anonymous anthropology blogger, today published PCA plots of human populations against Denisovans, Neanderthals and chimpanzees. (link)

In the first plot, humans cluster tightly, reflecting their remarkable homogeneity.

In Dienekes' second plot, he claims to "zoom in" on the little blob of humans on the first plot.  I have good eyesight and cannot quite see how the "zoomin" of the second plot is taken directly from the "blob" at the center of the first plot. 
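One way to test a "zoom" claim is to note that a true zoom changes only the axis limits, never the coordinates of the points. The following is a minimal sketch of that check in Python. The PC1/PC2 scores and group names are invented for illustration and are not taken from the plots discussed here; `bounds` and `inside` are hypothetical helper names.

```python
# Hypothetical PC1/PC2 scores, invented for illustration only.
# They are NOT the values behind the plots discussed in this post.
points = {
    "chimp":       (0.90, 0.00),
    "neanderthal": (-0.40, 0.60),
    "denisovan":   (-0.45, -0.55),
    "human_a":     (0.010, 0.004),
    "human_b":     (0.002, 0.012),
    "human_c":     (-0.006, 0.003),
}

def bounds(coords, pad=0.0):
    """Return (xmin, xmax, ymin, ymax) around coords,
    widened on each side by a fraction `pad` of the span."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    dx = (max(xs) - min(xs)) * pad
    dy = (max(ys) - min(ys)) * pad
    return (min(xs) - dx, max(xs) + dx, min(ys) - dy, max(ys) + dy)

def inside(pt, box):
    """True if the point lies within the axis limits given by box."""
    x, y = pt
    xmin, xmax, ymin, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

humans = {k: v for k, v in points.items() if k.startswith("human")}

full_box = bounds(points.values())            # axis limits of the first plot
zoom_box = bounds(humans.values(), pad=0.1)   # axis limits of the "zoomed" plot

# A true zoom keeps every coordinate as it was and only drops
# the points that fall outside the new axis limits.
zoomed = {k: v for k, v in points.items() if inside(v, zoom_box)}

# Share of the full horizontal axis that the human cluster occupies.
x_fraction = (zoom_box[1] - zoom_box[0]) / (full_box[1] - full_box[0])
print(f"human cluster spans {x_fraction:.1%} of the full PC1 range")
```

If the second plot is a genuine zoom, each sample should sit at the same coordinates in both plots. A cluster that occupies only a small share of the full axis range will show little internal structure at full scale, whether or not the zoom is faithful.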

It's true that the little cluster of humans in the first plot is somewhat multimodal.  Those who have been following developments in genetic anthropology won't be surprised that Africans and Non-Africans cluster closely, but separately.  Clearly, there are many early human lineages that didn't make it through the events of the last 100,000 years.  Additionally, there is evidence of separate low-level (<4%) admixture events for Africans and Non-Africans that may account for some of the multimodality.  The wiki page on archaic human admixture touches on the relevant references.

Still, I can't quite make out how the two graphs match.  The first graph shows that the human cluster is skewed right toward Chimpanzees. In the "zoomed in" plot, Africans seem to be skewed downward and away from Chimpanzees, Denisovans and Neandertals.  Yet, the optical illusion is enough for some commentators to think they are observing a "shift toward chimps" of some populations.  It would be easy to laugh off Dienekes' amateurish analysis as harmless.  However, his blog is widely read.  Many of his readers are curious, but not equipped with the analytical skills to see the mistakes in his experiments.  Sadly, experts in the field of genetics rarely speak out about Dienekes' biased publications.

While we're on the topic of bias, there's another area of genetic anthropology where I can see a less deliberate form of bias: sampling.  More than one sixth of the world's population is African or of recent African origin (within the last five hundred years).  However, in the world of genetic anthropology, you would never know it.  Study after study shows a diverse choice of Non-African samples invariably referenced against a tiny group of African samples.  Perhaps this is because African samples are not widely available.  However, it is hard to see how we are going to understand our Out-of-Africa origins when so little effort has been expended to obtain a diverse set of African samples.  Nor will we learn about migrations within Africa. Additionally, if Africans are going to benefit from genetic research, more attention needs to be paid to Africans as a diverse people, not a tiny reference blob of "Yorubans".

I'm sure that as more African DNA becomes available, genetic anthropology studies will make an effort to include a wider array of African samples.  Hopefully, some of those genetic research dollars will make their way toward treating the diseases that Africans suffer from.

As for Dienekes, in a more ethically grounded genetics blogosphere, more researchers and curious readers would realize that the message he presents is often distorted and potentially clouded by an uninformed view of many Sub-Saharan African groups.

7 comments:

  1. In Dienekes' second plot, he claims to "zoom in" on the little blob of humans on the first plot. I have good eyesight and cannot quite see how the "zoomin" of the second plot is taken directly from the "blob" at the center of the first plot.

    The second plot is indeed a blown-up version of the first one. This is based on publicly available data and can be repeated by anyone who cares to do so. Of course, one may criticize for free without bothering to actually do the work to back up their claims.

    Yet, the optical illusion is enough for some commentators to think they are observing a "shift toward chimps" of some populations.

    There is no "optical illusion".

    Yes, different human populations fall along different positions on the chimpanzee-archaic human PCA plot. This is also evident in Eric Durand's white paper which I have basically used, the only difference between the two being that I use a San ascertainment panel (from Harvard HGDP) whereas he uses the 23andMe SNP set.


    It would be easy to laugh off Dienekes' amateurish analysis as harmless.

    My analysis is based on actual data and is repeatable. Your amateurish criticism of my analysis is based on non-quantitative impressions sprinkled with a lot of adjectives.

    However, his blog is widely read. Many of his readers are curious, but not equipped with the analytical skills to see the mistakes in his experiments. Sadly, experts in the field of genetics rarely speak out about Dienekes' biased publications.

    Claiming bias without providing evidence of bias is indeed laughable.

    The bottom plot is indeed a blown-up version of the top plot. Of course, one cannot see the structure in the top plot because all modern humans occupy a few pixels worth of space, which is exactly the reason why the second plot is presented.

    A person with good eyesight could indeed see the structure in the top plot: human populations form a /\ shape, with the apex corresponding to Eurasians, the lower-left vertex to Melanesians -who deviate towards Denisova relative to Eurasians- and the lower-right vertex to Sub-Saharan Africans -who deviate away from Neandertals relative to Eurasians. The only non-trivial visual difference between the two is the presence of a dot on the top right which corresponds to a single San outlier and which can also be seen in the blown-up version of the plot.

  2. @Dienekes:

    I don't doubt that one African sample skews ever so slightly toward the chimpanzee samples. However, in the first plot, you overweight this sample and, by comparison, grossly underweight the remaining African samples.

    From the second plot, it is clear that most Africans skew away from both chimps and Neanderthals. However, here you can't see the human cluster in comparison with the Denisovans, Neanderthals and chimps. Therefore, the weighting error of the first plot is likely to persist in the mind of the viewer.

    It is not clear whether you've made this error intentionally or by omission.

  3. However, in the first plot, you overweight this sample and by comparison, grossly underweight the remaining African samples.

    Incorrect, there is no "weighting" whatsoever of samples.

    From the second plot, it is clear that most Africans skew away from both chimps and Neanderthals.

    Again, incorrect, all African groups are situated on the positive side of PC1 which separates archaic humans from chimps.

    It is not clear whether you've made this error intentionally or by omission.

    There is no error. If you think there is, repeat the experiment, based on the publicly available data, and post it; otherwise your criticism is just a lot of hot air.

  4. This comment has been removed by the author.

  5. It is now the consensus view that non-Africans have a small amount of Neanderthal admixture.

    It's also the consensus view that all modern humans have, as their closest living relatives, chimpanzees.

    Therefore, it is not surprising in your PCA graph 2 that Eurasians skew slightly toward Neanderthals, while Africans do not.

    Regarding your PCA graph 1, the San data point represents not more than four individuals, as far as I can tell, while the remaining Africans, who comprise between thirty and fifty individuals, are represented by two visible data points. Clearly, the visual impact of this is to overemphasize the San samples.

    I'm generally wary of data representations that contain this kind of sloppiness.

  6. Regarding your PCA graph 1, the San data point represents not more than four individuals, as far as I can tell, while the remaining Africans, who comprise between thirty and fifty individuals, are represented by two visible data points. Clearly, the visual impact of this is to overemphasize the San samples.

    Congratulations on discovering that you cannot represent hundreds of data points on a data surface that is a few pixels in dimension.

    The first plot shows how modern humans form a relatively tight bundle in comparison to archaic humans/chimps.

    The second plot shows that within that relatively tight bundle, there are visible differences between human populations, and these are exactly what I said they were.

    The only "sloppy" thing here is your criticism which fails to present any evidence of the alleged "bias", despite my repeated challenges to present evidence for such bias using the publicly available data. Talk is cheap, numbers are hard, better luck next time.

  7. @Dienekes,

    Have a look at some of the papers that you reference on your blog. When sample sizes are small, published works by reputable authors state them explicitly.

    Sampling misrepresentation is only one distortion in your above referenced PCA plots.

    You also have a gridding problem in PCA plot one.

    If you wanted to clean up the mess in this post:

    1. you could start by stating sample sizes for the referenced groups,

    2. you could notate the vectors to Denisovans, Neanderthals and chimpanzees in PCA plot 2,

    3. you could fix the gridding problem in PCA plot 1, and

    4. you could make the sampled groups clearer in PCA plot 2.

    That would be a good start, at least for this post.

    I won't get into the sampling distortions and misquotes of published works that appear elsewhere on your blog.


Comments have temporarily been turned off. Because I currently have a heavy workload, I do not feel that I can do an acceptable job as moderator. Thanks for your understanding.
