FeSTwo, a two-step feature selection algorithm based on feature engineering and sampling for the chronological age regression problem

Comput Biol Med. 2020 Oct:125:104008. doi: 10.1016/j.compbiomed.2020.104008. Epub 2020 Sep 26.

Abstract

Accurately determining a sample's chronological age is an important forensic problem. Performance on this regression problem may be improved by selecting appropriate methylomic features. Most existing feature selection algorithms, however, optimize regression performance by considering only the original features. This study proposed four feature engineering strategies to transform the original methylomic features. The performance of the age regression model was improved by FeSTwo, the resampling-based feature selection algorithm proposed in this study. FeSTwo outperformed the parallel algorithms used in previous studies, even with the electronic health record data. The age prediction performance of the FeSTwo-detected features was also confirmed on another independent dataset. The results demonstrated that the proposed model, FeSTwo, achieved a more than 8% reduction in root-mean-square error (RMSE) on the test dataset with only 70 features.
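
For readers unfamiliar with the metric, the following minimal sketch shows how an RMSE value and a relative RMSE reduction of the kind reported above can be computed. It is not the FeSTwo implementation. The function names (rmse, relative_reduction) and all ages and predictions are hypothetical, chosen only so the arithmetic is easy to follow, and the sketch assumes the reported reduction is relative to a baseline RMSE, which the abstract does not state.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted ages."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical chronological ages (years) and two sets of predictions.
# These numbers are illustrative only and are not taken from the study.
ages = [20.0, 35.0, 50.0, 65.0]
baseline_pred = [24.0, 31.0, 54.0, 61.0]  # every prediction is off by 4 years
improved_pred = [23.0, 32.0, 53.0, 62.0]  # every prediction is off by 3 years

baseline_rmse = rmse(ages, baseline_pred)
improved_rmse = rmse(ages, improved_pred)
reduction = relative_reduction(baseline_rmse, improved_rmse)

print(f"baseline RMSE: {baseline_rmse:.2f} years")
print(f"improved RMSE: {improved_rmse:.2f} years")
print(f"relative reduction: {reduction:.0%}")
```

In this toy case the RMSE falls from 4 to 3 years, a 25% relative reduction. Under the same assumed definition, a reduction of more than 8% means the improved RMSE is below 92% of the baseline RMSE.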

Keywords: Age prediction; FeSTwo; Feature engineering; Feature selection; Linear regression; Methylomic biomarker.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*