Multi-label Deep Learning for Gene Function Annotation in Cancer Pathways

Sci Rep. 2018 Jan 10;8(1):267. doi: 10.1038/s41598-017-17842-9.

Abstract

The war on cancer is progressing globally but slowly as researchers around the world continue to seek more innovative and effective ways of curing this catastrophic disease. Biocuration, the work of organizing biological information, representing it, and making it accessible, is an important aspect of biomedical research and discovery. However, because maintaining sophisticated biocuration is highly resource dependent, it continues to lag behind the biomedical data that are continually being generated. Another critical aspect of cancer research, pathway analysis, has proven to be an efficient method for gaining insight into the underlying biology associated with cancer. We propose a deep-learning-based model, Stacked Denoising Autoencoder Multi-Label Learning (SdaMLL), for facilitating gene multi-function discovery and pathway completion. SdaMLL can capture intermediate representations that are robust to partial corruption of the input pattern and can generate low-dimensional codes superior to those of conventional dimension reduction tools. Experimental results indicate that SdaMLL outperforms existing classical multi-label algorithms. Moreover, we identified new gene functions that can be used to complete the related pathways, such as Fused in Sarcoma (FUS), which may be part of transcriptional misregulation in cancer, and p27, which we expect will become a member of viral carcinogenesis. We provide a visual tool (https://www.keaml.cn/gpvisual) to view the new gene functions in cancer pathways.
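To make the idea of "representations robust to partial corruption of the input" concrete, the following is a minimal sketch of a single denoising autoencoder layer, not the authors' SdaMLL implementation. It assumes masking noise, tied encoder/decoder weights, sigmoid units, and a cross-entropy reconstruction loss; the toy binary data, the function names, and all hyperparameters (corruption rate, learning rate, code size) are hypothetical choices made for illustration. Stacking several such layers and adding a multi-label output layer, as the model name suggests, is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_denoising_autoencoder(X, n_hidden, corruption=0.3, lr=0.1,
                                epochs=300, seed=0):
    """Train one denoising autoencoder layer with tied weights.

    The input is randomly masked, but the reconstruction target is the
    clean input, so the hidden code must tolerate missing features.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.normal(0.0, 0.1, size=(n_features, n_hidden))
    b_hid = np.zeros(n_hidden)
    b_out = np.zeros(n_features)
    losses = []
    eps = 1e-9  # avoids log(0)
    for _ in range(epochs):
        # Masking noise: zero out roughly `corruption` of the entries.
        mask = rng.random(X.shape) >= corruption
        X_noisy = X * mask
        H = sigmoid(X_noisy @ W + b_hid)     # low-dimensional code
        X_hat = sigmoid(H @ W.T + b_out)     # reconstruction
        # Cross-entropy between the clean input and the reconstruction.
        loss = -np.mean(np.sum(X * np.log(X_hat + eps)
                               + (1 - X) * np.log(1 - X_hat + eps), axis=1))
        losses.append(loss)
        # Backpropagation (sigmoid output + cross-entropy loss).
        d_out = (X_hat - X) / n_samples
        d_hid = (d_out @ W) * H * (1 - H)
        grad_W = X_noisy.T @ d_hid + d_out.T @ H  # tied weights: two terms
        W -= lr * grad_W
        b_hid -= lr * d_hid.sum(axis=0)
        b_out -= lr * d_out.sum(axis=0)
    return W, b_hid, b_out, losses

def encode(X, W, b_hid):
    """Map clean inputs to their low-dimensional codes."""
    return sigmoid(X @ W + b_hid)

# Toy data (hypothetical): 200 sparse binary feature vectors of length 20.
data_rng = np.random.default_rng(1)
X = (data_rng.random((200, 20)) < 0.2).astype(float)

W, b_hid, b_out, losses = train_denoising_autoencoder(X, n_hidden=5)
codes = encode(X, W, b_hid)  # one 5-dimensional code per input row
```

In a stacked variant, the codes from one layer would serve as the input to the next layer, and the final codes would feed a classifier that predicts several pathway labels per gene.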

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Databases, Genetic
  • Genetic Association Studies* / methods
  • Genetic Predisposition to Disease*
  • Humans
  • Machine Learning*
  • Molecular Sequence Annotation*
  • Neoplasms / genetics*
  • Neoplasms / metabolism
  • Neoplasms / pathology