Predicting osteoarthritis in adults using statistical data mining and machine learning

Ther Adv Musculoskelet Dis. 2022 Jul 16;14:1759720X221104935. doi: 10.1177/1759720X221104935. eCollection 2022.

Abstract

Background: Osteoarthritis (OA) has traditionally been considered a disease of older adults (⩾65 years old), but it can also occur in younger adults. The risk factors for OA in younger adults, however, need further evaluation.

Objectives: To develop a prediction model for identifying risk factors for OA in subjects aged 20-50 years and to compare the performance of different machine learning models.

Methods: We included data from 52,512 participants of the National Health and Nutrition Examination Survey; of those, we analyzed only subjects aged 20-50 years (n = 19,133), with or without OA. The supervised machine learning model 'Deep PredictMed', based on logistic regression, a deep neural network (DNN), and a support vector machine, was used to identify demographic and personal characteristics associated with OA. Finally, we compared the performance of the different models.
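
Logistic regression is one of the three model families named above. As a minimal sketch, assuming three hypothetical predictors (age and body mass index each scaled to 0-1, plus smoking status coded 0/1) and eight invented toy records, the snippet below fits a plain logistic regression by batch gradient descent on the mean log-loss. It is not the Deep PredictMed implementation, whose details are not given here, and it leaves out the DNN and support vector machine entirely.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function: avoids overflow in exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    # Predicted probability of the positive class (here: OA) for one record.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy records: [age scaled 0-1, BMI scaled 0-1, smoker 0/1].
X = [[0.0, 0.0, 0], [0.2, 0.1, 0], [0.3, 0.4, 0], [0.5, 0.2, 1],
     [0.6, 0.7, 0], [0.8, 0.6, 1], [0.9, 0.9, 1], [1.0, 0.8, 1]]
y = [0, 0, 1, 0, 0, 1, 1, 1]  # 1 = OA (hypothetical labels, not study data)

w, b = fit_logistic(X, y)
print(predict_proba(w, b, [1.0, 1.0, 1]), predict_proba(w, b, [0.0, 0.0, 0]))
```

In practice a library implementation with regularization and a better optimizer would normally replace the hand-written loop; the loop is shown only to make the gradient step (p - y)·x explicit.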

Results: Female sex (p < 0.001), older age (p < 0.001), smoking (p < 0.001), higher body mass index (p < 0.001), high blood pressure (p < 0.001), race/ethnicity (lowest risk among Mexican Americans, p = 0.01), and physical and mental limitations (p < 0.001) were associated with having OA. The best predictive performance was an area under the receiver operating characteristic curve of 75%.
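
The area under the receiver operating characteristic curve (AUC) can be read as the probability that a randomly chosen participant with OA receives a higher predicted risk than a randomly chosen participant without OA. A minimal sketch of that rank-based (Mann-Whitney) computation follows; the function name roc_auc and the four scores in the usage line are hypothetical and are not data from the study.

```python
from itertools import product


def roc_auc(labels, scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which the
    positive case (label 1) gets the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for two participants with OA (label 1)
# and two without (label 0); 3 of the 4 pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a value of 75% indicates that the model ranks a case above a non-case in about three of every four such pairs.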

Conclusion: Sex (female), age (older), smoking (yes), body mass index (higher), blood pressure (high), race/ethnicity, and physical and mental limitations are risk factors for having OA in adults aged 20-50 years. The best predictive performance was achieved using DNN algorithms.

Keywords: arthritis; machine learning; osteoarthritis; statistical data mining.