Sunday, March 4, 2018

Big Questions and Bigger Data: Solutions to the problem of data integration for addressing major questions in human evolution

Denne Reed
AAPA 2018, Meeting Program Abstracts
April 11-14, 2018


The future of paleoanthropology lies with our ability to address big-picture questions by integrating heterogeneous data from disparate sources. For example, simply cataloging early hominin fossils from Africa and their distribution in space and time is difficult because the necessary information is spread across numerous institutions.

This paper demonstrates data integration using PaleoCore, an open-source, geospatial data management infrastructure. Hominin fossil specimen data, taxonomic designations, locations, geological contexts, dates, anatomical descriptions, measurements, images, and bibliographic references were aggregated from publicly available online resources. These data were cleaned and aligned to the PaleoCore data standard and conceptual data model to produce a comprehensive digital catalog of over 2700 hominin fossils recovered from over 20 African sites, spanning the Late Miocene to the start of the Pleistocene (Messinian to Gelasian stages, ca. 7.25-1.8 Ma). This database was used to calculate and visualize the spatiotemporal distribution of hominin fossils across this interval.
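As a rough illustration of the align-then-summarize workflow described above, the following Python sketch maps differently named source fields onto one common record type and then counts specimens per site and time bin. It is not PaleoCore code: the field names, the alias table, the 0.5 Myr bin width, and the use of age-range midpoints are all assumptions made for the example. Only the 7.25-1.8 Ma window comes from the abstract.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical aliases: source-specific column names mapped to common names.
FIELD_ALIASES = {
    "specimen": "catalog_id",
    "locality": "site",
    "older_ma": "max_age_ma",
    "younger_ma": "min_age_ma",
}


@dataclass(frozen=True)
class FossilRecord:
    catalog_id: str
    site: str
    max_age_ma: float  # older age bound, in millions of years
    min_age_ma: float  # younger age bound, in millions of years


def align(raw: dict) -> FossilRecord:
    """Clean one raw source row and align it to the common record type."""
    row = {}
    for key, value in raw.items():
        name = key.strip().lower()
        row[FIELD_ALIASES.get(name, name)] = value
    return FossilRecord(
        catalog_id=str(row["catalog_id"]).strip(),
        site=str(row["site"]).strip(),
        max_age_ma=float(row["max_age_ma"]),
        min_age_ma=float(row["min_age_ma"]),
    )


def time_bin(age_ma: float, oldest_ma: float = 7.25, width_ma: float = 0.5):
    """Return the (older, younger) bounds of the bin containing age_ma."""
    index = int((oldest_ma - age_ma) // width_ma)
    older = oldest_ma - index * width_ma
    return (older, older - width_ma)


def distribution(records, oldest_ma: float = 7.25, youngest_ma: float = 1.8):
    """Count specimens per (site, time bin), using each age-range midpoint."""
    counts = Counter()
    for record in records:
        age = (record.max_age_ma + record.min_age_ma) / 2
        if youngest_ma <= age <= oldest_ma:
            counts[(record.site, time_bin(age, oldest_ma))] += 1
    return counts
```

Rows from two sources with different column names, such as `{"Specimen": "X-1", "Locality": "Site A", "older_ma": "3.4", "younger_ma": "3.0"}` and `{"catalog_id": "X-2", "site": "Site A", "max_age_ma": 3.3, "min_age_ma": 3.1}` (placeholder values, not real specimens), would both pass through `align` and land in the same site and time bin.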

The database marks the first phase of a broader initiative to document the entire hominin fossil record and to link hominin fossils to a wider range of archaeological, geological, climatic, and ecological data, using linked open data protocols and with the linking facilitated by machine learning algorithms. This digital infrastructure provides the foundation for the collaborative efforts of research consortia now coming together to address broad questions in human evolution and to fulfill the vision of developing comprehensive evolutionary explanations for the patterns we observe in the paleoanthropological record.
