Machine Learning Models for Prediction of Sex Based on Lumbar Vertebral Morphometry

Diagnostics (Basel). 2023 Dec 8;13(24):3630. doi: 10.3390/diagnostics13243630.

Abstract

Background: Identifying skeletal remains has been, and will remain, a challenge for forensic experts and forensic anthropologists, especially in disasters with multiple victims or when remains are in an advanced stage of decomposition. This study examined the performance of two machine learning (ML) algorithms in predicting a person's sex based only on the morphometry of the L1-L5 lumbar vertebrae, using data collected recently from Romanian individuals. Its purpose was to assess whether ML techniques can provide a reliable prediction of sex in forensic identification based only on parameters obtained from metric analysis of the lumbar spine.

Method: Predictive models were built and tuned with two of the most popular classification techniques, random forest (RF) and XGBoost (XGB). Both series of models used cross-validation and a grid search to find the best combination of hyperparameters. The best models were selected based on the area under the receiver operating characteristic curve (ROC AUC).
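The selection procedure described above (grid search over hyperparameters, cross-validation, ranking by ROC AUC) can be illustrated with a minimal sketch. The abstract does not state which software was used, so everything here is an assumption: the data are synthetic (two made-up measurements per individual, not the study's measurements), the function names are hypothetical, and a k-nearest-neighbours scorer stands in for RF and XGB so that the example needs only the Python standard library.

```python
import random
from itertools import product


def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_score(train_X, train_y, x, k, p):
    """Stand-in classifier (NOT RF/XGB): share of the k nearest neighbours labelled 1.

    Distances are ranked by sum(|a - b| ** p); the p-th root is skipped
    because it does not change the ordering of neighbours.
    """
    def dist(row):
        return sum(abs(a - b) ** p for a, b in zip(row, x))

    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i]))
    return sum(train_y[i] for i in order[:k]) / k


def stratified_folds(y, n_folds, rng):
    """Split row indices into folds, spreading each class evenly across folds."""
    folds = [[] for _ in range(n_folds)]
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % n_folds].append(i)
    return folds


def cv_auc(X, y, folds, k, p):
    """Mean ROC AUC over the folds for one hyperparameter combination."""
    aucs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        scores = [knn_score(train_X, train_y, X[i], k, p) for i in held_out]
        aucs.append(roc_auc([y[i] for i in held_out], scores))
    return sum(aucs) / len(aucs)


# Synthetic data only: 149 rows to mirror the sample size, with two invented
# measurements whose means differ between the two classes.
rng = random.Random(0)
y = [i % 2 for i in range(149)]
X = [[rng.gauss(30 + 3 * label, 2), rng.gauss(45 + 4 * label, 3)] for label in y]

folds = stratified_folds(y, n_folds=5, rng=rng)

# Grid search: evaluate every (k, p) combination with the same folds.
grid = list(product((1, 5, 15), (1, 2)))
results = {(k, p): cv_auc(X, y, folds, k, p) for k, p in grid}

best_params = max(results, key=results.get)
best_auc = results[best_params]
print("best (k, p):", best_params, "mean CV ROC AUC:", round(best_auc, 3))
```

Reusing the same folds for every grid point keeps the comparison between hyperparameter combinations fair. With a real RF or XGB model, only `knn_score` and the grid would change.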

Results: The L1-L5 lumbar vertebrae exhibit sexual dimorphism and can be used as predictors of sex. Of the eight significant predictors of sex, six were found to be particularly important for the RF model, while only three were determined to be important by the XGB model.

Conclusions: Although the data set was small (149 observations), both RF and XGB techniques reliably predicted a person's sex based only on the L1-L5 measurements. This can prove valuable, especially when only skeletal remains are available. With minor adjustments, the presented ML setup can be transformed into an interactive web service, freely accessible to forensic anthropologists, in which they enter the L1-L5 measurements of a body/cadaver and obtain a prediction of the person's sex.

Keywords: forensic identification; lumbar vertebral column; machine learning; sex identification.