Recruiting a skeleton crew: Methods for simulating and augmenting paleoanthropological data using Monte Carlo based algorithms

Am J Biol Anthropol. 2023 Jul;181(3):454-473. doi: 10.1002/ajpa.24754. Epub 2023 May 17.

Abstract

Objectives: Data collection is a major hindrance in many types of analyses in human evolutionary studies. The problem is especially acute given the scarcity and quality of fossil data. As a result, many research projects are impeded by the limited amount of data available for tasks such as classification and predictive modeling.

Materials and methods: Here we present the use of Monte Carlo based methods for the simulation of paleoanthropological data. Using two datasets containing cross-sectional biomechanical information and geometric morphometric 3D landmarks, we show how synthetic, yet realistic, data can be simulated to enhance each dataset and provide new information with which to perform complex tasks, in particular classification. We additionally present these algorithms in the form of an R library, AugmentationMC. We also use a geometric morphometric dataset to simulate 3D models, and emphasize the power of Machine Teaching, as opposed to Machine Learning.
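The general idea of Markov chain Monte Carlo augmentation can be illustrated with a minimal sketch. This is not the AugmentationMC API and not the authors' implementation: it is a univariate random-walk Metropolis sampler in Python, and it assumes that a normal density fitted to the sample is an acceptable target. The function name `metropolis_augment`, its parameters, and the measurement values are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def metropolis_augment(sample, n_new, step_scale=1.0, burn_in=500, thin=10, seed=0):
    """Draw n_new synthetic values with a random-walk Metropolis sampler.

    Assumption: the target is a normal density whose mean and standard
    deviation are estimated from `sample`. Real morphometric data are
    multivariate and may need a different target distribution.
    """
    rng = random.Random(seed)
    mu = statistics.mean(sample)
    sigma = statistics.stdev(sample)

    def log_density(x):
        # Log of the fitted normal density, up to an additive constant.
        return -0.5 * ((x - mu) / sigma) ** 2

    current = mu  # start the chain at the sample mean
    draws = []
    for step in range(burn_in + n_new * thin):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = current + rng.gauss(0.0, step_scale * sigma)
        log_ratio = log_density(proposal) - log_density(current)
        # Metropolis rule: accept with probability min(1, density ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        # Discard burn-in and keep every `thin`-th state to reduce autocorrelation.
        if step >= burn_in and (step - burn_in) % thin == 0:
            draws.append(current)
    return draws


# Hypothetical measurements (illustrative values only, not from the paper).
original = [412.0, 398.5, 430.2, 405.7, 421.3, 416.8, 409.1, 425.6]
synthetic = metropolis_augment(original, n_new=2000)
print(len(synthetic), statistics.mean(synthetic), statistics.stdev(synthetic))
```

The synthetic values follow the fitted distribution rather than repeating the observed measurements. Extending the sketch to landmark data would require a multivariate target and proposal.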

Results: Our results show how Monte Carlo based algorithms, such as Markov chain Monte Carlo, are useful for the simulation of morphometric data, providing synthetic yet highly realistic data that statistical tests show to be equivalent to the original data. We additionally provide a critical overview of bootstrapping techniques, showing how Monte Carlo based methods perform better than bootstrapping because the simulated data are not exact copies of the original sample.
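The contrast with bootstrapping can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the authors' comparison: the Monte Carlo side here is plain sampling from a fitted normal distribution rather than a Markov chain, and the function names and measurement values are hypothetical.

```python
import random
import statistics


def bootstrap_resample(sample, n_new, rng):
    # Bootstrap: draw with replacement from the observed values themselves.
    return rng.choices(sample, k=n_new)


def parametric_monte_carlo(sample, n_new, rng):
    # Assumption: a normal distribution fitted to the sample is a fair model.
    mu = statistics.mean(sample)
    sigma = statistics.stdev(sample)
    return [rng.gauss(mu, sigma) for _ in range(n_new)]


rng = random.Random(42)
# Hypothetical measurements (illustrative values only, not from the paper).
original = [412.0, 398.5, 430.2, 405.7, 421.3, 416.8, 409.1, 425.6]

boot = bootstrap_resample(original, 100, rng)
mc = parametric_monte_carlo(original, 100, rng)

# Count how many simulated values are exact copies of an observed value.
n_boot_copies = sum(v in original for v in boot)
n_mc_copies = sum(v in original for v in mc)
print("bootstrap copies:", n_boot_copies, "of", len(boot))
print("Monte Carlo copies:", n_mc_copies, "of", len(mc))
```

Every bootstrap value is a copy of an observed measurement, so a resample of any size contains at most as many distinct values as the original sample. The Monte Carlo draws are new values that follow the fitted distribution.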

Discussion: While synthetic datasets should never replace large and real datasets, the approach presented here can be considered an important advance in how paleoanthropological data can be handled.

Keywords: 3D model simulation; Markov chain Monte Carlo; data augmentation; geometric morphometrics; machine teaching.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Cross-Sectional Studies
  • Humans
  • Markov Chains
  • Monte Carlo Method
  • Skeleton*