Detection and Grading of Radiographic Hand Osteoarthritis Using an Automated Machine Learning Platform

ACR Open Rheumatol. 2024 Apr 4. doi: 10.1002/acr2.11665. Online ahead of print.

Abstract

Objective: Automated machine learning (autoML) platforms allow health care professionals to play an active role in the development of machine learning (ML) algorithms according to scientific or clinical needs. The aim of this study was to develop and evaluate such a model for automated detection and grading of distal hand osteoarthritis (OA).

Methods: A total of 13,690 hand radiographs from 2,863 patients within the Swiss Cohort of Quality Management (SCQM) and an external control data set of 346 non-SCQM patients were collected and scored for distal interphalangeal OA (DIP-OA) using the modified Kellgren/Lawrence (K/L) score. Giotto (Learn to Forecast [L2F]) was used as an autoML platform for training two convolutional neural networks for DIP joint extraction and subsequent classification according to the K/L scores. A total of 48,892 DIP joints were extracted and then used to train the classification model. Heatmaps were generated independently of the platform. The user experience of a web application serving as a provisional user interface was evaluated with rheumatologists and radiologists.
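
The two-stage design described in the Methods (one network to extract DIP joints, a second to grade each extracted joint) can be sketched as a simple composition of two functions. This is a minimal illustration only and not the Giotto platform's API: the names `detect_joints`, `classify_joint`, `crop`, and `JointGrade`, the bounding-box format, and the four-class label encoding are all hypothetical, and the two models are replaced by trivial stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[float]]        # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); hypothetical format


@dataclass
class JointGrade:
    box: Box
    kl_class: int  # assumed encoding: 0 = no OA (K/L 0-1), 1 = K/L 2, 2 = K/L 3, 3 = K/L 4


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by `box` out of the image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def grade_radiograph(
    image: Image,
    detect_joints: Callable[[Image], List[Box]],
    classify_joint: Callable[[Image], int],
) -> List[JointGrade]:
    """Stage 1 finds the DIP joints; stage 2 grades each cropped joint."""
    return [
        JointGrade(box, classify_joint(crop(image, box)))
        for box in detect_joints(image)
    ]


# Stand-ins for the two trained networks (purely illustrative).
def dummy_detector(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 2, 4, 4)]


def dummy_classifier(joint: Image) -> int:
    pixels = [p for row in joint for p in row]
    return 1 if sum(pixels) / len(pixels) > 0.5 else 0


toy_image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
grades = grade_radiograph(toy_image, dummy_detector, dummy_classifier)
```

Keeping the stages as separate callables mirrors the description of two independently trained networks: the grader only ever sees cropped joints, so either stage could be retrained or replaced without touching the other.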

Results: The sensitivity and specificity of this model for detecting DIP-OA were 79% and 86%, respectively. The accuracy for grading the correct K/L score was 75%, with a κ score of 0.76. The accuracy per DIP-OA class differed, with 86% for no OA (defined as K/L scores 0 and 1), 71% for a K/L score of 2, 46% for a K/L score of 3, and 67% for a K/L score of 4. Similar values were obtained in an independent external test set. Qualitative and quantitative user experience testing of the web application revealed a moderate to high demand for automated DIP-OA scoring among rheumatologists. Conversely, radiologists expressed a low demand, except for the use of heatmaps.
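
For readers who want to see how the reported metric types relate to one another, the following sketch derives each of them from a single confusion matrix. It makes several assumptions that the abstract does not state: detection is treated as "no OA" versus any K/L score of 2 or higher, per-class accuracy is read as per-class recall, and κ is computed as Cohen's κ with optional linear or quadratic weighting (the abstract does not specify the variant). The labels below are toy values chosen for illustration and are not study data.

```python
from typing import List, Optional, Sequence

Matrix = List[List[int]]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> Matrix:
    """Rows are reference grades, columns are predicted grades."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def sensitivity_specificity(m: Matrix):
    """Binary detection: class 0 = no OA, classes >= 1 = OA (assumed grouping)."""
    tn = m[0][0]
    fp = sum(m[0][1:])
    fn = sum(row[0] for row in m[1:])
    tp = sum(sum(row[1:]) for row in m[1:])
    return tp / (tp + fn), tn / (tn + fp)


def accuracy(m: Matrix) -> float:
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def per_class_accuracy(m: Matrix) -> List[float]:
    """Fraction of joints of each reference grade that were graded correctly."""
    return [m[i][i] / sum(m[i]) for i in range(len(m))]


def cohen_kappa(m: Matrix, weighting: Optional[str] = None) -> float:
    """Cohen's kappa; weighting is None, "linear", or "quadratic"."""
    k = len(m)
    n = sum(sum(row) for row in m)
    row_tot = [sum(row) for row in m]
    col_tot = [sum(m[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, growing with grade distance.
        if weighting is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    observed = sum(weight(i, j) * m[i][j] for i in range(k) for j in range(k)) / n
    expected = sum(
        weight(i, j) * row_tot[i] * col_tot[j] for i in range(k) for j in range(k)
    ) / (n * n)
    return 1.0 - observed / expected


# Toy labels (not study data): 0 = no OA, 1 = K/L 2, 2 = K/L 3, 3 = K/L 4.
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 0, 1, 1, 0, 2, 3, 3, 3]
cm = confusion_matrix(y_true, y_pred, n_classes=4)
sens, spec = sensitivity_specificity(cm)
```

Because K/L grades are ordinal, a weighted κ penalizes a one-grade error less than a three-grade error, which is why the weighting scheme matters when comparing κ values across studies.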

Conclusion: AutoML platforms are an opportunity to develop clinical end-to-end ML algorithms. Here, automated radiographic DIP-OA detection is both feasible and usable, whereas grading among individual K/L scores (eg, for clinical trials) remains challenging.