Utility metric for unsupervised feature selection

PeerJ Comput Sci. 2021 Apr 21;7:e477. doi: 10.7717/peerj-cs.477. eCollection 2021.

Abstract

Feature selection techniques are useful approaches for dimensionality reduction in data analysis. They provide interpretable results by reducing the data to a subset of the original features. When the data lack annotations, unsupervised feature selectors are required. Several such algorithms exist in the literature, but despite their wide applicability they can be inaccessible or cumbersome to use, mainly because they require tuning non-intuitive parameters and have high computational demands. In this work, a publicly available, ready-to-use unsupervised feature selector is proposed, with results comparable to the state-of-the-art at a much lower computational cost. The suggested approach belongs to the family of spectral feature selectors. These methods generally consist of two stages: manifold learning and subset selection. In the first stage, the underlying structures in the high-dimensional data are extracted, while in the second stage a subset of the features is selected to replicate these structures. This paper makes two contributions to this field, one for each stage. In the manifold learning stage, the effect of non-linearities in the data is explored using a radial basis function (RBF) kernel, for which an alternative way of estimating the kernel parameter is presented for high-dimensional data. For the subset selection stage, a backwards greedy approach based on the least-squares utility metric is proposed. The combination of these new ingredients results in the utility metric for unsupervised feature selection (U2FS) algorithm. The proposed U2FS algorithm succeeds in selecting the correct features in a simulation environment. In addition, the performance of the method on benchmark datasets is comparable to the state-of-the-art, while requiring less computational time. Moreover, unlike the state-of-the-art, U2FS does not require any tuning of parameters.
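
The two-stage structure described in the abstract (manifold learning, then subset selection) can be illustrated with a minimal Python sketch. This is not the authors' U2FS implementation: the function names are hypothetical, the kernel width defaults to a median-distance heuristic rather than the estimation method proposed in the paper, and the utility of each feature is obtained by naively re-solving a least-squares problem without it, which is simple but slower than a dedicated utility formula would be.

```python
import numpy as np


def rbf_affinity(X, sigma=None):
    """RBF kernel matrix for the rows of X (n_samples x n_features)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if sigma is None:
        # Assumption: median pairwise distance as kernel width.
        # The paper proposes its own estimate; this is only a stand-in.
        sigma = np.sqrt(np.median(d2[np.triu_indices_from(d2, k=1)]))
    return np.exp(-d2 / (2.0 * sigma ** 2))


def spectral_embedding(W, k):
    """k leading eigenvectors of the symmetrically normalized affinity."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return vecs[:, -k:]


def ls_error(Xs, E):
    """Squared error of the best linear map from the columns of Xs to E."""
    coef, *_ = np.linalg.lstsq(Xs, E, rcond=None)
    return float(np.sum((E - Xs @ coef) ** 2))


def backward_greedy_select(X, E, n_keep):
    """Drop, one at a time, the feature whose removal costs the least."""
    selected = list(range(X.shape[1]))
    while len(selected) > n_keep:
        base = ls_error(X[:, selected], E)
        # Utility of feature j: increase in error when j is removed.
        utilities = [
            ls_error(X[:, [f for f in selected if f != j]], E) - base
            for j in selected
        ]
        selected.pop(int(np.argmin(utilities)))
    return selected
```

A call such as `backward_greedy_select(X, spectral_embedding(rbf_affinity(X), k), n_keep)` would return the indices of the retained features. The number of eigenvectors `k` and the number of features `n_keep` are inputs of this sketch only, not parameters taken from the paper.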

Keywords: Dimensionality reduction; Kernel methods; Manifold learning; Unsupervised feature selection.

Grants and funding

This work received funding from FWO project G0A4918N. This project received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 802895). This research received funding from the Flemish Government (AI Research Program). This work was supported by Bijzonder Onderzoeksfonds KU Leuven (BOF): The effect of perinatal stress on the later outcome in preterm babies: C24/15/036, Prevalentie van epilepsie en slaapstoornissen in de ziekte van Alzheimer: C24/18/097. Agentschap Innoveren en Ondernemen (VLAIO) 150466: OSA+ and O&O HBC 2016 0184 eWatch. KU Leuven Stadius acknowledges the financial support of imec, and EU H2020 MSCA-ITN-2018: INtegrating Magnetic Resonance SPectroscopy and Multimodal Imaging for Research and Education in MEDicine (INSPiRE-MED), funded by the European Commission under Grant Agreement no. 813120. EU H2020 MSCA-ITN-2018: ‘INtegrating Functional Assessment measures for Neonatal Safeguard (INFANS)’, funded by the European Commission under Grant Agreement no. 813483. EIT 19263–SeizeIT2: Discreet Personalized Epileptic Seizure Detection Device. The resources and services used in the experiments of this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government. There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.