A novel deep mining model for effective knowledge discovery from omics data

Artif Intell Med. 2020 Apr:104:101821. doi: 10.1016/j.artmed.2020.101821. Epub 2020 Feb 24.

Abstract

Knowledge discovery from omics data has become a common goal of current approaches to personalised cancer medicine and to understanding cancer genotype and phenotype. However, high-throughput biomedical datasets are characterised by high dimensionality, relatively small sample sizes and low signal-to-noise ratios. Extracting and interpreting relevant knowledge from such complex datasets therefore remains a significant challenge for the fields of machine learning and data mining. In this paper, we exploit recent advances in deep learning to mitigate these limitations by automatically capturing enough of the meaningful abstractions latent within the available biological samples. Our deep feature learning model is based on a set of non-linear sparse Auto-Encoders that are deliberately constructed in an under-complete manner to detect a small proportion of molecules that can recover a large proportion of the variation underlying the data. However, because multiple projections are applied to the input signals, it is hard to interpret which phenotypes were responsible for deriving such predictions. Therefore, we also introduce a novel weight interpretation technique that helps to deconstruct the internal state of such deep learning models to reveal the key determinants underlying their latent representations. The outcomes of our experiment provide strong evidence that the proposed deep mining model is able to discover robust biomarkers that are positively and negatively associated with cancers of interest. Since our deep mining model is problem-independent and data-driven, it offers potential for this research to extend beyond its cognate disciplines.
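The two ideas in the abstract, stacked under-complete sparse autoencoders and a weight-based interpretation step, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the objective (half mean squared reconstruction error plus an L1 penalty on sigmoid hidden activations), the layer sizes, the learning rate and the hypothetical `feature_scores` attribution (chaining absolute encoder weights through the stack and summing over the deepest latent units) are all assumptions, since the abstract does not specify the actual weight interpretation technique. "Under-complete" here simply means each encoder has fewer hidden units than inputs.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def encode(X, W1, b1):
    # Non-linear encoder: maps inputs to hidden activations in (0, 1).
    return sigmoid(X @ W1 + b1)


def train_sparse_ae(X, n_hidden, l1=1e-3, lr=0.05, epochs=500, seed=0):
    """Train one under-complete sparse autoencoder by full-batch gradient descent.

    Assumed objective: 0.5 * mean squared reconstruction error
    + l1 * mean L1 norm of the hidden activations.
    """
    n, d = X.shape
    if n_hidden >= d:
        raise ValueError("an under-complete AE needs n_hidden < n_features")
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = encode(X, W1, b1)          # hidden code
        X_hat = H @ W2 + b2            # linear decoder
        d_out = (X_hat - X) / n        # gradient of the reconstruction term
        # Sparsity gradient: sigmoid outputs are positive, so sign(H) is 1.
        dH = d_out @ W2.T + l1 * np.sign(H) / n
        dZ = dH * H * (1.0 - H)        # back through the sigmoid
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return W1, b1, W2, b2


def reconstruction_error(X, params):
    W1, b1, W2, b2 = params
    X_hat = encode(X, W1, b1) @ W2 + b2
    return float(np.mean(np.sum((X_hat - X) ** 2, axis=1)))


def feature_scores(encoder_weights):
    """Hypothetical weight-based attribution (not the paper's technique):
    multiply absolute encoder weights through the stack, then sum over
    the deepest latent units. Higher score = larger input influence."""
    M = np.abs(encoder_weights[0])
    for W in encoder_weights[1:]:
        M = M @ np.abs(W)
    return M.sum(axis=1)


# Toy low-rank "omics" matrix: 200 samples x 10 features, z-scored.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Greedy layer-wise stacking: 10 -> 6 -> 3.
ae1 = train_sparse_ae(X, n_hidden=6)
ae2 = train_sparse_ae(encode(X, ae1[0], ae1[1]), n_hidden=3)
scores = feature_scores([ae1[0], ae2[0]])
ranking = np.argsort(-scores)  # feature indices, most influential first
```

On real omics data, the top entries of `ranking` would be the candidate molecules to inspect; separating positive from negative associations, as the abstract describes, would require a signed attribution rather than the absolute-weight heuristic assumed here.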

Keywords: AI; Data mining; Deep learning; Knowledge discovery; Omics data analysis; Precision medicine; Predictive modelling.

MeSH terms

  • Data Mining
  • Humans
  • Knowledge Discovery*
  • Machine Learning
  • Neoplasms* / genetics
  • Signal-To-Noise Ratio