A new feature selection method based on feature distinguishing ability and network influence

J Biomed Inform. 2022 Apr:128:104048. doi: 10.1016/j.jbi.2022.104048. Epub 2022 Mar 3.

Abstract

The occurrence and development of diseases are related to the dysfunction of biomolecules (genes, metabolites, etc.) and to changes in molecular interactions. Identifying the key molecules related to the physiological and pathological changes of organisms from omics data is of great significance for disease diagnosis, early warning and drug-target prediction. A novel feature selection algorithm based on feature individual distinguishing ability and feature influence in the biological network (FS-DANI) is proposed to identify important biomolecules (features) that discriminate between different disease conditions. The individual distinguishing ability of a feature is evaluated from the overlapping area of its effective ranges in the different classes. FS-DANI measures the network influence of a feature from the importance of its module in the correlation network and the centrality of the feature within that module. The comprehensive weight of a feature is obtained by combining its individual distinguishing ability and its network influence. The crucial feature subset is then determined by sequential forward search (SFS) on the feature list sorted by comprehensive weight. FS-DANI is compared with six efficient feature selection methods on ten public omics datasets, and an ablation experiment is also conducted. Experimental results show that, overall, FS-DANI outperforms the compared algorithms in accuracy, sensitivity and specificity. In an analysis of gastric cancer miRNA expression data, FS-DANI identified two miRNAs (hsa-miR-18a* and hsa-miR-381), whose AUCs for distinguishing gastric cancer samples from normal samples are 0.959 in the discovery set and 0.879 in an independent validation set. Hence, evaluating biomolecules at both the molecular level and the network level is helpful for identifying potential disease biomarkers of high performance.
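The final step described above, sequential forward search over a weight-sorted feature list, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the acceptance criterion or the classifier used, so the rule "keep a feature only if it strictly improves the score" is an assumption, and the names `ranked_forward_search`, `toy_score`, the feature labels and the weights are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def ranked_forward_search(
    weights: Dict[str, float],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Walk features from highest to lowest weight and keep each one
    only if adding it strictly improves the subset score (assumed rule)."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    selected: List[str] = []
    best = float("-inf")
    for feature in ranked:
        candidate = selected + [feature]
        candidate_score = score(candidate)
        if candidate_score > best:
            selected, best = candidate, candidate_score
    return selected, best


# Hypothetical comprehensive weights for four features.
weights = {"f1": 0.9, "f2": 0.7, "f3": 0.4, "f4": 0.1}


def toy_score(subset: List[str]) -> float:
    """Stand-in for a cross-validated classifier score: rewards the two
    'useful' features and charges a small penalty per selected feature."""
    useful = {"f1", "f3"}
    return len(useful & set(subset)) - 0.1 * len(subset)


selected, best = ranked_forward_search(weights, toy_score)
print(selected, round(best, 3))
```

In practice `score` would be replaced by a cross-validated performance estimate of a classifier trained on the candidate subset, and `weights` by the comprehensive weights computed from distinguishing ability and network influence.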

Keywords: Feature Individual Distinguishing Ability; Feature Network Influence; Feature Selection; Omics Data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Area Under Curve