Host Trait Prediction from High-Resolution Microbial Features

Methods Mol Biol. 2021;2242:185-202. doi: 10.1007/978-1-0716-1099-2_12.

Abstract

Predicting host traits from metagenomes presents new challenges that can be difficult to overcome for researchers without a strong background in bioinformatics and/or statistics. Profiling bacterial communities with shotgun metagenomics often generates large amounts of data that cannot be used directly to train a model. In this chapter, we provide a detailed description of how to build a working machine learning model based on taxonomic and functional features of the bacterial communities inhabiting the lungs of cystic fibrosis patients. Models are built in the R environment using several freely available machine learning algorithms.

Keywords: Community profiling; Functional profiling; Host trait prediction; Machine learning; Metagenomics; Next-generation sequencing; Taxonomic profiling.

MeSH terms

  • Bacteria / classification
  • Bacteria / genetics*
  • Bacteria / isolation & purification
  • Cystic Fibrosis / genetics
  • Cystic Fibrosis / microbiology
  • Cystic Fibrosis Transmembrane Conductance Regulator / genetics
  • DNA, Bacterial / metabolism*
  • Databases, Genetic
  • Gene Expression Profiling*
  • Genome, Bacterial*
  • Humans
  • Lung / microbiology
  • Machine Learning
  • Metagenome*
  • Metagenomics*
  • Mutation
  • Phylogeny
  • Research Design
  • Software
  • Transcriptome*
  • Workflow

Substances

  • CFTR protein, human
  • DNA, Bacterial
  • Cystic Fibrosis Transmembrane Conductance Regulator