Development and comparison of forensic interval age prediction models by statistical and machine learning methods based on the methylation rates of ELOVL2 in blood DNA

Forensic Sci Int Genet. 2024 Mar:69:103004. doi: 10.1016/j.fsigen.2023.103004. Epub 2023 Dec 25.

Abstract

Age estimation can help narrow down the candidate donors of unidentified samples in criminal investigations. Over the past decade, various age estimation models based on DNA methylation biomarkers have been developed for forensic use. However, many of these models rely on ordinary least squares regression, which cannot provide an appropriate estimate for older age groups because prediction error increases with age. To address this problem, we developed age estimation models that set an appropriate prediction interval for every age group using two approaches: a statistical method based on quantile regression (QR) and a machine learning method based on an artificial neural network (ANN). Methylation datasets (n = 1280, age 0-91 years) of the promoter of the gene encoding ELOVL fatty acid elongase 2 (ELOVL2) were used to develop the QR and ANN models. Validation with several test datasets showed that both models widen their prediction intervals with increasing age and achieve a high rate of correct prediction (>90%) for older age groups. Both models also generated accurate point predictions of age. For the test dataset (n = 549), the ANN model achieved a mean absolute error (MAE) of 5.3 years and a root mean square error (RMSE) of 7.3 years, comparable to the QR model (MAE = 5.6 years, RMSE = 7.8 years). Applicability to casework was confirmed using bloodstain samples stored for 1-14 years, indicating that the models remain stable for aged bloodstains. These results suggest that the proposed models can provide more useful and effective age estimation in forensic settings.
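
The accuracy and interval measures reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names and the toy ages are hypothetical. Treating a "correct prediction" as a true age that falls inside the predicted interval is an assumption made here for illustration. The pinball loss is included because it is the standard objective minimised when fitting a quantile regression; the abstract does not state how the QR model was actually fitted.

```python
import math


def mae(true_ages, predicted_ages):
    """Mean absolute error between true and predicted ages (years)."""
    n = len(true_ages)
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / n


def rmse(true_ages, predicted_ages):
    """Root mean square error between true and predicted ages (years)."""
    n = len(true_ages)
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(true_ages, predicted_ages)) / n
    )


def interval_coverage(true_ages, lower_bounds, upper_bounds):
    """Fraction of samples whose true age lies inside [lower, upper].

    Assumption: this is how a "correct prediction" rate is counted.
    """
    hits = sum(
        1
        for t, lo, hi in zip(true_ages, lower_bounds, upper_bounds)
        if lo <= t <= hi
    )
    return hits / len(true_ages)


def pinball_loss(true_ages, predicted_quantiles, tau):
    """Mean pinball (quantile) loss for quantile level tau in (0, 1).

    Under-prediction is weighted by tau, over-prediction by (1 - tau).
    """
    total = 0.0
    for t, q in zip(true_ages, predicted_quantiles):
        diff = t - q
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(true_ages)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    true_ages = [10, 30, 50, 70]
    point_predictions = [12, 27, 55, 62]
    # Interval widths grow with age, mimicking the behaviour described above.
    lower = [10.5, 22, 47, 50]
    upper = [13.5, 32, 63, 74]

    print("MAE:", mae(true_ages, point_predictions))
    print("RMSE:", rmse(true_ages, point_predictions))
    print("Coverage:", interval_coverage(true_ages, lower, upper))
    print("Pinball loss (tau=0.9):", pinball_loss(true_ages, upper, 0.9))
```

Reporting coverage separately for each age group, rather than one pooled value, would show whether the intervals remain appropriate for older donors, which is the problem the abstract describes for ordinary least squares models.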

Keywords: Artificial neural network; Blood; DNA methylation; ELOVL2; Forensic age estimation; Quantile regression.

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Aging* / genetics
  • Child
  • Child, Preschool
  • CpG Islands
  • DNA / genetics
  • DNA Methylation*
  • Forensic Genetics / methods
  • Genetic Markers
  • Humans
  • Infant
  • Infant, Newborn
  • Machine Learning
  • Middle Aged
  • Young Adult

Substances

  • Genetic Markers
  • DNA