Identification of most important features based on a fuzzy ensemble technique: Evaluation on joint space narrowing progression in knee osteoarthritis patients

Int J Med Inform. 2021 Dec:156:104614. doi: 10.1016/j.ijmedinf.2021.104614. Epub 2021 Oct 11.

Abstract

Objective: Feature selection (FS) is a crucial and, at the same time, challenging processing step that aims to reduce the dimensionality of complex classification or regression problems. Various techniques have been proposed in the literature to address this challenge, with emphasis on medical applications. However, each of the existing FS algorithms comes with its own advantages and disadvantages, introducing a certain level of bias.

Materials and methods: To reduce this bias and mitigate the shortcomings of any single feature selection result, this paper proposes an ensemble FS methodology that aggregates the results of several FS algorithms (filter, wrapper, and embedded). Fuzzy logic is employed to combine multiple feature importance scores, leading to a more robust selection of informative features. The proposed fuzzy ensemble FS methodology was applied to the problem of knee osteoarthritis (KOA) prediction, with special emphasis on the progression of joint space narrowing (JSN). It was integrated into an end-to-end machine learning pipeline, and a thorough experimental evaluation was conducted using data from the Osteoarthritis Initiative (OAI) database. Several classifiers were investigated for their suitability for the task of JSN prediction, and the best-performing model was then analyzed post hoc using the SHAP method.
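
The abstract does not specify the membership functions or fuzzy rules used, so the following is only a minimal sketch of the general idea of fusing importance scores from several FS methods with fuzzy logic. Everything in it is an assumption made for illustration: the function names (`min_max`, `high_membership`, `fuzzy_ensemble`, `select_top_k`), the piecewise-linear "high importance" membership function with breakpoints 0.2 and 0.8, the aggregation as a convex combination of the minimum and the mean membership (weight `gamma`), and the toy scores, which are not data from the study.

```python
from typing import Dict, List


def min_max(values: List[float]) -> List[float]:
    """Rescale one method's importance scores to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def high_membership(x: float, a: float = 0.2, b: float = 0.8) -> float:
    """Degree to which a normalized score belongs to the fuzzy set
    'high importance' (assumed piecewise-linear shape, not from the paper)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def fuzzy_ensemble(scores: Dict[str, List[float]], gamma: float = 0.5) -> List[float]:
    """Fuse per-method scores into one fuzzy importance degree per feature.

    Aggregation (an assumption): gamma * min(memberships)
    + (1 - gamma) * mean(memberships), i.e. a compromise between
    requiring agreement of all methods and averaging them.
    """
    methods = list(scores)
    n_features = len(scores[methods[0]])
    memberships = [
        [high_membership(v) for v in min_max(scores[m])] for m in methods
    ]
    combined = []
    for j in range(n_features):
        mu = [memberships[i][j] for i in range(len(methods))]
        combined.append(gamma * min(mu) + (1 - gamma) * sum(mu) / len(mu))
    return combined


def select_top_k(combined: List[float], k: int) -> List[int]:
    """Return the indices of the k features with the highest fused degree."""
    order = sorted(range(len(combined)), key=lambda j: combined[j], reverse=True)
    return order[:k]


# Toy importance scores for four features from three hypothetical FS methods.
toy_scores = {
    "filter": [0.9, 0.1, 0.5, 0.3],
    "wrapper": [0.8, 0.2, 0.7, 0.1],
    "embedded": [0.95, 0.05, 0.4, 0.6],
}
combined = fuzzy_ensemble(toy_scores)
selected = select_top_k(combined, k=2)
print(selected)
```

In this toy setting, a feature that every method ranks highly receives a fused degree near 1, whereas a feature favored by only one method is penalized by the minimum term. In practice the per-method scores would come from actual filter, wrapper, and embedded selectors, and `k` would be tuned on validation data.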

Results: The proposed method showed better and more stable performance than competing feature selection methods, reaching an average accuracy of 78.14% using XGBoost with 31 selected features. The post-hoc explainability analysis highlighted the important features that contribute to the classification of patients with JSN progression.

Conclusions: Compared with popular feature selection methods, the proposed fuzzy feature selection approach improves the performance of the predictive models by selecting a small, optimal subset of features.

Keywords: Classification problem; Feature selection; Fuzzy logic; JSN progression; KOA; Prediction models.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Fuzzy Logic
  • Humans
  • Machine Learning
  • Osteoarthritis, Knee* / diagnostic imaging