June 1, 2015
Bayesian statistical modelling is an increasingly widely used method in chronometric analysis in archaeology and environmental science. Like most statistical techniques, it must be used carefully, and by users who are adequately trained. Inputting the wrong prior data into a Bayesian framework can produce outputs that are wrong and misleading.
Over the last few years we have seen several cases where the authors of articles published in a range of important journals concerned with the Palaeolithic have included Bayesian models that make assumptions which cannot be defended, ignore stratigraphic (prior) information, or are just plain wrong. An article published in PNAS today (1 June 2015) is yet another example of such an application. It is clear that journal editors are simply not properly scrutinising modelling work by obtaining referees who are adequately schooled in the methods.
In the paper, Bosch et al. use the OxCal platform to create a Bayesian age model for the site of Ksar Akil in Lebanon. Within that model, they calculate an age for a human fossil named "Ethelruda" using a command in OxCal (Date) that allows you to obtain a posterior distribution function (PDF) for otherwise undated events within a model. Bosch et al. insert the command at what they assume is the appropriate place, prior to the start of the Initial Upper Palaeolithic (IUP) at the site. The generated PDF (in red below) is taken as the earliest evidence for the precocious appearance here of modern humans.
Unfortunately, what they calculate is an invalid probability distribution, one that is meaningless in terms of its statistical basis.
In a situation like this, at the start of a sequence, the Date command requires well-defined constraints for a PDF to be calculated. Bosch et al. omit these, and since there is no boundary to stop the resulting distribution from skewing backward in time, what they present as evidence for the early appearance of modern humans is a modelling artefact, one which, unfortunately, forms the main conclusion of their article. Had they included a boundary prior to the start of the IUP (whether dated or not), followed by the Date command, the model would have been able to find a proper estimate for the calculated distribution.
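The point about the missing older limit can be sketched numerically. The Python snippet below is a toy illustration only, not OxCal's algorithm and not the Ksar Akil or Cavallo data: it assumes a single hypothetical dated event (40,000 ± 300 BP, a made-up value), an undated event constrained only to be older than it, and flat priors on both. The function name and all parameters are our own assumptions. With nothing older to constrain it, the undated event's posterior simply fills whatever range the calculation happens to allow, so its median follows the arbitrary limit rather than the data. The sketch does not implement a boundary; it only shows what happens without one.

```python
import numpy as np

def undated_event_median(oldest_limit, dated_mean=40000.0, dated_sd=300.0, step=10.0):
    """Median age (years BP) of an undated event that is only required to be
    older than one dated event. All values are hypothetical.

    oldest_limit is an arbitrary cut-off on the calculation range; it stands in
    for the absence of any real older constraint.
    """
    # Grid of candidate ages in years BP (larger number = older).
    grid = np.arange(dated_mean - 5 * dated_sd, oldest_limit + step, step)
    # Normal likelihood for the age of the dated event.
    like = np.exp(-0.5 * ((grid - dated_mean) / dated_sd) ** 2)
    # Flat priors: the undated event at age t is compatible with every
    # dated-event age younger than t, so its posterior is the running sum.
    post = np.cumsum(like)
    post /= post.sum()
    cdf = np.cumsum(post)
    return grid[np.searchsorted(cdf, 0.5)]

# Same "data", different arbitrary limits on the calculation range.
median_50k = undated_event_median(oldest_limit=50000.0)
median_60k = undated_event_median(oldest_limit=60000.0)
print(median_50k, median_60k)
```

In this toy setting, moving the arbitrary limit from 50,000 to 60,000 BP drags the estimated age back by thousands of years even though the dated evidence is unchanged, which is the sense in which such a distribution reflects the model set-up rather than the data.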
To illustrate this better, and without getting into the nitty-gritty of the technical details, we have tried to reproduce what Bosch et al. did with a separate set of Palaeolithic data, from the site of Cavallo in Italy (because at this time we do not yet have the actual dates from Ksar Akil). The figure below shows the two models we generated for Cavallo. We use exactly the same dates in both cases, but come to completely different conclusions for the age of the undated phase (red probabilities). Why?