Predicting host traits from metagenomes presents challenges that can be difficult to overcome for researchers without a strong background in bioinformatics or statistics. Profiling bacterial communities with shotgun metagenomics typically generates large amounts of data that cannot be used directly to train a model. In this chapter we describe in detail how to build a working machine learning model based on taxonomic and functional features of the bacterial communities inhabiting the lungs of cystic fibrosis patients. Models are built in the R environment using several freely available machine learning algorithms.
Keywords: Community profiling; Functional profiling; Host trait prediction; Machine learning; Metagenomics; Next-generation sequencing; Taxonomic profiling.