Feature Selection for Health Care Costs Prediction Using Weighted Evidential Regression

Sensors (Basel). 2020 Aug 6;20(16):4392. doi: 10.3390/s20164392.

Abstract

Although many authors have highlighted the importance of predicting people's health costs to improve healthcare budget management, most do not address the frequent need to know the reasons behind a prediction, i.e., the factors that influence it. This knowledge helps avoid arbitrary decisions and discrimination against individuals. However, black box methods (that is, those that do not allow this analysis, e.g., methods based on deep learning techniques) are often more accurate than those that allow an interpretation of the results. For this reason, in this work we aim to develop a method that achieves performance comparable to that of black box methods for the problem of predicting health costs, while still allowing the results to be interpreted. This interpretable regression method is based on the Dempster-Shafer theory, using Evidential Regression (EVREG) and a discount function based on the contribution of each dimension. The method "learns" the optimal weight for each feature using a gradient descent technique. It also uses the k-nearest neighbors (k-NN) algorithm to accelerate calculations. With this approach and the transparency of the Evidential Regression model, it is possible to select the most relevant features for predicting a patient's health care costs, and the k-NN approach provides a reason for each prediction. We tested our method on Japanese health records from Tsuyama Chuo Hospital, which included medical examinations, test results, and billing information from 2013 to 2018. We compared our model to methods based on an Artificial Neural Network, Gradient Boosting, Regression Tree, and Weighted k-Nearest Neighbors. Our results showed that our transparent model performed comparably to the Artificial Neural Network and Gradient Boosting, with an R² of 0.44.
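The idea summarized above (a feature-weighted distance, a discount function that turns distance into evidence, Dempster's rule over the k nearest neighbors, and weights tuned by gradient descent) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the function names, the exponential discount `gamma * exp(-d^2)`, the use of squared weights in the distance, the leave-one-out mean squared error as the loss, the finite-difference gradient with step halving in place of an analytic gradient, and the toy data.

```python
import math

def weighted_sq_dist(a, b, w):
    # Squared Euclidean distance with one weight per feature (assumed form).
    return sum(wj * wj * (aj - bj) ** 2 for aj, bj, wj in zip(a, b, w))

def evreg_predict(x, X, y, w, k=3, gamma=0.9):
    # Keep the k training points closest to x under the weighted distance.
    nearest = sorted((weighted_sq_dist(x, xi, w), yi) for xi, yi in zip(X, y))[:k]
    # Discount function (assumed): evidence committed to each neighbor's target.
    phi = [gamma * math.exp(-d2) for d2, _ in nearest]
    # Dempster combination of k simple mass functions, before normalization.
    m_pts = [p * math.prod(1.0 - q for j, q in enumerate(phi) if j != i)
             for i, p in enumerate(phi)]
    m_unknown = math.prod(1.0 - q for q in phi)  # mass left on "don't know"
    total = sum(m_pts) + m_unknown
    # Point estimate: ignorance mass is placed on the midpoint of the target range.
    midpoint = (min(y) + max(y)) / 2.0
    est = sum(m * yi for m, (_, yi) in zip(m_pts, nearest)) + m_unknown * midpoint
    return est / total

def loo_mse(X, y, w, k=3):
    # Leave-one-out mean squared error for a given weight vector.
    err = 0.0
    for i in range(len(X)):
        pred = evreg_predict(X[i], X[:i] + X[i + 1:], y[:i] + y[i + 1:], w, k)
        err += (pred - y[i]) ** 2
    return err / len(X)

def fit_weights(X, y, k=3, lr=0.5, steps=50, eps=1e-4):
    # Gradient descent on the weights using a finite-difference gradient.
    # A step is accepted only if it lowers the loss; otherwise the step size is halved.
    w = [1.0] * len(X[0])
    best = loo_mse(X, y, w, k)
    for _ in range(steps):
        grad = [(loo_mse(X, y, w[:j] + [w[j] + eps] + w[j + 1:], k) - best) / eps
                for j in range(len(w))]
        cand = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad)]
        cand_loss = loo_mse(X, y, cand, k)
        if cand_loss < best:
            w, best = cand, cand_loss
        else:
            lr *= 0.5
    return w, best

# Toy data (hypothetical): the first feature tracks the target, the second does not.
X_demo = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.7], [0.9, 0.2], [1.0, 0.8]]
y_demo = [1.0, 1.1, 1.2, 3.0, 3.1, 3.2]

initial_loss = loo_mse(X_demo, y_demo, [1.0, 1.0])
weights, final_loss = fit_weights(X_demo, y_demo)
print("learned weights:", weights)
print("leave-one-out MSE before/after:", initial_loss, final_loss)
```

In a sketch like this, the learned weights are what makes the model readable: a feature whose weight shrinks toward zero stops influencing which neighbors are chosen and how much they count, and the neighbors with the largest masses can be shown as the reason for a given prediction.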

Keywords: Dempster–Shafer theory; evidential regression; feature selection; health care costs; interpretable prediction; regression; supervised learning.

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Female
  • Health Care Costs*
  • Humans
  • Male
  • Neural Networks, Computer*