Joint triplet loss with semi-hard constraint for data augmentation and disease prediction using gene expression data

Sci Rep. 2023 Oct 24;13(1):18178. doi: 10.1038/s41598-023-45467-8.

Abstract

Accurately identifying patients with complex diseases, such as Alzheimer's disease (AD), and predicting disease stages, such as early- and late-stage cancer, is challenging owing to substantial variability among patients and the limited availability of clinical data. Deep metric learning has emerged as a promising approach for addressing these challenges by improving data representation. In this study, we propose a joint triplet loss model with a semi-hard constraint (JTSC) to learn data representations from a small number of samples. JTSC strictly selects semi-hard samples by switching anchors and positive samples during the learning process in triplet embedding, and it combines a triplet loss function with an angular loss function. Our results indicate that JTSC significantly increases the number of appropriately represented samples during training when applied to gene expression data for AD and cancer stage prediction tasks. Furthermore, we demonstrate that using the embedding vector from JTSC as input to classifiers for AD and cancer stage prediction significantly improves classification performance by extracting more accurate features. In conclusion, we show that feature embedding through JTSC can aid classification when the number of samples is small relative to the number of features.
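
The ingredients named above (a triplet loss, an angular loss, and a semi-hard check repeated with anchor and positive switched) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The squared Euclidean distance, the margin of 1.0, the 45-degree angle, the unweighted sum of the two losses, and all function names are assumptions chosen for illustration; the abstract does not specify them. The angular term follows the commonly used form max(0, ||a - p||^2 - 4 tan^2(alpha) ||n - c||^2), with c the midpoint of anchor and positive.

```python
import numpy as np

def sq_dist(x, y):
    """Squared Euclidean distance between two embedding vectors."""
    return float(np.sum((x - y) ** 2))

def triplet_loss(a, p, n, margin=1.0):
    """Standard triplet loss: pull a and p together, push n away by a margin."""
    return max(0.0, sq_dist(a, p) - sq_dist(a, n) + margin)

def angular_loss(a, p, n, alpha_deg=45.0):
    """Angular loss in its common form; c is the midpoint of anchor and positive."""
    c = (a + p) / 2.0
    tan_sq = np.tan(np.radians(alpha_deg)) ** 2
    return max(0.0, sq_dist(a, p) - 4.0 * tan_sq * sq_dist(n, c))

def is_semi_hard(a, p, n, margin=1.0):
    """Semi-hard: the negative is farther than the positive, but within the margin."""
    d_ap, d_an = sq_dist(a, p), sq_dist(a, n)
    return d_ap < d_an < d_ap + margin

def is_semi_hard_switched(a, p, n, margin=1.0):
    """Assumed reading of the 'switching' constraint: the triplet must be
    semi-hard both as (a, p, n) and with anchor and positive swapped."""
    return is_semi_hard(a, p, n, margin) and is_semi_hard(p, a, n, margin)

def joint_loss(a, p, n, margin=1.0, alpha_deg=45.0, weight=1.0):
    """Assumed combination: triplet loss plus a weighted angular loss."""
    return triplet_loss(a, p, n, margin) + weight * angular_loss(a, p, n, alpha_deg)

# Toy 2-D embeddings (hypothetical values, not gene expression data).
anchor = np.array([0.0, 0.0])
positive = np.array([1.0, 0.0])
neg_one_sided = np.array([1.2, 0.0])  # semi-hard for the anchor only
neg_two_sided = np.array([0.5, 1.2])  # semi-hard for anchor and positive

for name, neg in [("one-sided", neg_one_sided), ("two-sided", neg_two_sided)]:
    print(name,
          is_semi_hard(anchor, positive, neg),
          is_semi_hard_switched(anchor, positive, neg),
          round(joint_loss(anchor, positive, neg), 4))
```

In this toy example the first negative passes the usual semi-hard test but fails once anchor and positive are switched, which is one way a stricter selection rule could discard triplets that a one-sided check would keep.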

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alzheimer Disease* / genetics
  • Deep Learning*
  • Gene Expression
  • Humans
  • Learning
  • Neoplasms* / genetics