Robust biomarker discovery for microbiome-wide association studies

Methods. 2020 Feb 15:173:44-51. doi: 10.1016/j.ymeth.2019.06.012. Epub 2019 Jun 22.

Abstract

With advances in high-throughput sequencing technology, massive amounts of microbiome data have accumulated, from environmental investigations to human studies. Microbiome-wide association studies examine the relationship between the microbiome and human health or the environment. Recently, Deep Neural Networks (DNNs) have shown promise owing to their layer-wise representation learning. However, DNNs are regarded as black boxes and require large amounts of training data, which makes them impractical to apply directly to microbiome-wide association studies. Meanwhile, microbiome data are high-dimensional and noisy, and a single feature selection method applied to such data is often unstable. In this work, we introduce a deep learning model named Deep Forest to conduct microbiome-wide association studies, and we propose an ensemble feature selection method to guide the identification of microbial biomarkers. Experiments showed that our ensemble feature selection method based on Deep Forest had good stability and robustness. The feature selection results could guide the discovery of microbial biomarkers and help to diagnose microbiome-related diseases. The code is available at https://github.com/MicroAVA/MWAS-Biomarkers.git.
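
The general idea behind ensemble feature selection can be illustrated with a minimal sketch. This is not the authors' Deep Forest pipeline (see the repository above for that). It assumes a simple stand-in scorer (the standardized difference of class means), bootstrap resampling, and mean-rank aggregation, and it runs on synthetic data. The names `score_features` and `ensemble_rank` are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean, pstdev


def score_features(X, y):
    """Score each feature by the absolute standardized difference of class means.

    Stand-in scorer (an assumption); the paper derives importances from Deep Forest.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(pos + neg) or 1.0  # guard against zero spread
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores


def ensemble_rank(X, y, n_rounds=50, seed=0):
    """Rank features on bootstrap resamples and aggregate by mean rank."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    rank_sums = [0.0] * n_features
    done = 0
    while done < n_rounds:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # resample if one class is missing
        Xb = [X[i] for i in idx]
        scores = score_features(Xb, yb)
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sums[j] += rank
        done += 1
    mean_ranks = [s / n_rounds for s in rank_sums]
    # Lower mean rank means the feature was selected more consistently.
    return sorted(range(n_features), key=lambda j: mean_ranks[j])


# Synthetic data (an assumption): 40 samples, 6 features, only feature 2 informative.
data_rng = random.Random(42)
y = [i % 2 for i in range(40)]
X = []
for label in y:
    row = [data_rng.gauss(0.0, 1.0) for _ in range(6)]
    row[2] = 3.0 * label + data_rng.gauss(0.0, 0.5)
    X.append(row)

ranking = ensemble_rank(X, y)
print("Features ordered by aggregated rank:", ranking)
```

Aggregating ranks across resamples is one common way to reduce the instability of a single selection run; other aggregation rules, such as selection frequency, would fit the same structure.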

Keywords: Deep Forest; Deep learning; Ensemble feature selection; Microbial biomarkers; Microbiome-wide association studies.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers*
  • Biomedical Research / methods*
  • Genome-Wide Association Study / methods*
  • Humans
  • Microbiota / genetics*
  • Neural Networks, Computer

Substances

  • Biomarkers