Comparable performance of machine learning algorithms in predicting readmission and complications following total joint arthroplasty with external validation

Hashim J F Shaikh; Mina Botros; Gabriel Ramirez; Caroline P Thirukumaran; Benjamin Ricciardi; Thomas G Myers

doi:10.1186/s42836-023-00208-0

Arthroplasty. 2023 Nov 8;5(1):58. doi: 10.1186/s42836-023-00208-0.

Authors

Hashim J F Shaikh¹, Mina Botros², Gabriel Ramirez², Caroline P Thirukumaran², Benjamin Ricciardi², Thomas G Myers²

Affiliations

¹ Department of Orthopaedics and Physical Performance, University of Rochester Medical Center, 601 Elmwood Ave, Rochester, NY, 14642, USA. hashim_shaikh@urmc.rochester.edu.
² Department of Orthopaedics and Physical Performance, University of Rochester Medical Center, 601 Elmwood Ave, Rochester, NY, 14642, USA.

Abstract

Background: The purpose of this study was to use machine learning (ML) to construct a risk calculator for patients undergoing total joint arthroplasty (TJA), based on data from the New York State Statewide Planning and Research Cooperative System (SPARCS), and to externally validate the calculator using data from a single TJA center.

Methods: Seven ML algorithms (logistic regression, adaptive boosting, gradient boosting [XGBoost], random forest [RF] classifier, support vector machine, a single-layer neural network, and a five-layer neural network) were trained on the derivation cohort. Of the total data, 68% was used for training, 15% for validation, and 15% for testing; the remaining 2%, drawn from a single arthroplasty center, was used for external validation.
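The split described above can be illustrated with a minimal sketch. It assumes the external single-center records are held out entirely, so the statewide derivation cohort is divided 68:15:15 among training, validation, and test sets. The function name, record counts, and random seed are hypothetical and are not taken from the study.

```python
import random

def split_derivation_cohort(record_ids, weights=(68, 15, 15), seed=0):
    """Shuffle record IDs and partition them into train/validation/test
    sets in proportion to `weights` (hypothetical helper)."""
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    total = sum(weights)
    n_train = len(ids) * weights[0] // total
    n_val = len(ids) * weights[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Hypothetical sizes: 9,800 statewide records plus 200 single-center
# records, so the external set is 2% of all 10,000 records.
statewide_ids = range(9800)
external_ids = range(9800, 10000)  # never used for training or tuning

train, val, test = split_derivation_cohort(statewide_ids)
print(len(train), len(val), len(test), len(external_ids))
```

With these assumed counts, the four subsets make up 68%, 15%, 15%, and 2% of all records, matching the proportions reported in the Methods.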

Results: On the test set, the RF classifier performed best for 30-d mortality (area under the receiver operating characteristic curve [AUROC] 0.78), 30-d readmission (AUROC 0.61), and 90-d composite complications (AUROC 0.73). Additionally, XGBoost was found to be the best-predicting model for 90-d readmission and 90-d composite complications (AUROC 0.73). External validation demonstrated that the models achieved AUROCs similar to those on the test set, although the top-performing model for 90-d composite complications and readmissions varied between the test set and the external validation set.
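For readers unfamiliar with the metric, the AUROC can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auroc(labels, scores):
    """AUROC from its pairwise definition: the fraction of
    (positive, negative) pairs in which the positive case is scored
    higher, with ties counted as 0.5 (illustrative helper)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = outcome occurred (e.g., readmission), 0 = it did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical predicted risks
toy_auroc = auroc(toy_labels, toy_scores)
print(toy_auroc)  # 0.75: three of the four positive/negative pairs are ranked correctly
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of cases, so it suits small illustrations rather than statewide datasets.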

Conclusion: This was the first study to investigate the use of ML to create a predictive risk calculator from state-wide data and then externally validate it with data from a single arthroplasty center. Discrimination was comparable among the best-performing ML models and between the test set and the external validation set.

Level of evidence: III.

Keywords: Complications; Database; External validation; Machine learning; Total joint arthroplasty.