A new method for handling heterogeneous data in bioinformatics

Comput Biol Med. 2024 Mar:170:107937. doi: 10.1016/j.compbiomed.2024.107937. Epub 2024 Jan 6.

Abstract

Heterogeneous data, especially mixtures of numerical and categorical attributes, are widespread in bioinformatics. Most existing work focuses on defining new distance metrics for mixed data rather than learning discriminative metrics from it. Here, we propose a support vector heterogeneous metric learning framework for mixed data. We define a heterogeneous sample pair kernel for mixed data and then cast metric learning as a sample pair classification problem. The resulting problem can be solved efficiently with standard support vector machine solvers. Experiments on mixed-data benchmarks and cancer datasets demonstrate the effectiveness of the proposed method.
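The abstract reduces metric learning to classifying sample pairs with a kernel. The following is a minimal pure-Python sketch of that reduction under stated assumptions. The abstract does not give the paper's heterogeneous kernel, so `pair_kernel` here is a hypothetical stand-in: it applies a degree-2 polynomial kernel separately to the numerical differences and to the categorical mismatch indicators of each pair, with a hypothetical weight `beta`. The pair labels follow the doublet-SVM convention (-1 for same-class pairs, +1 otherwise), which is also an assumption. All function names are illustrative.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# A mixed-data sample: (numerical attributes, categorical attributes).
Sample = Tuple[Sequence[float], Sequence[str]]
Pair = Tuple[Sample, Sample]


def pair_features(a: Sample, b: Sample) -> Tuple[List[float], List[float]]:
    """Numerical differences and categorical mismatch indicators of a pair."""
    num_diff = [float(x) - float(y) for x, y in zip(a[0], b[0])]
    cat_mismatch = [0.0 if u == v else 1.0 for u, v in zip(a[1], b[1])]
    return num_diff, cat_mismatch


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def pair_kernel(p: Pair, q: Pair, beta: float = 1.0) -> float:
    """Hypothetical heterogeneous sample pair kernel (not the paper's).

    Each part is (u . v)^2, i.e. the Frobenius inner product of the outer
    products u u^T and v v^T, so it is a valid kernel; beta >= 0 weights
    the categorical part. Swapping the order within a pair has no effect.
    """
    num_p, cat_p = pair_features(*p)
    num_q, cat_q = pair_features(*q)
    return _dot(num_p, num_q) ** 2 + beta * _dot(cat_p, cat_q) ** 2


def make_pairs(samples: Sequence[Sample], labels: Sequence[int]):
    """All sample pairs, labelled -1 if same class (similar), +1 otherwise."""
    pairs, y = [], []
    for i, j in combinations(range(len(samples)), 2):
        pairs.append((samples[i], samples[j]))
        y.append(-1 if labels[i] == labels[j] else 1)
    return pairs, y


def gram_matrix(pairs: Sequence[Pair], beta: float = 1.0) -> List[List[float]]:
    """Precomputed pair-kernel matrix to hand to an SVM solver."""
    return [[pair_kernel(p, q, beta) for q in pairs] for p in pairs]


def learned_distance(a: Sample, b: Sample, pairs: Sequence[Pair],
                     dual_coef: Sequence[float], beta: float = 1.0) -> float:
    """Distance implied by SVM dual coefficients (alpha_l * h_l), bias omitted.

    For the numerical part this equals d^T M d with
    M = sum_l alpha_l h_l d_l d_l^T, where d = difference vector of (a, b).
    """
    return sum(c * pair_kernel(p, (a, b), beta)
               for c, p in zip(dual_coef, pairs))
```

In practice, the Gram matrix and pair labels could be passed to a standard SVM solver that accepts a precomputed kernel, such as scikit-learn's `SVC(kernel="precomputed")`; its fitted dual coefficients for the support pairs would then play the role of `dual_coef`, up to the bias term. Because those coefficients mix signs, the implied distance in this sketch is not guaranteed to be non-negative without further constraints.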

Keywords: Cancer; Heterogeneous data; Metric learning; Support vector machine.

MeSH terms

  • Algorithms*
  • Computational Biology*
  • Support Vector Machine