Quality Control in Metagenomics Data

Methods Mol Biol. 2023;2649:21-54. doi: 10.1007/978-1-0716-3072-3_2.

Abstract

Experiments involving metagenomics data are becoming increasingly commonplace. Processing such data requires a unique set of considerations, and quality control of metagenomics data is critical to extracting pertinent insights. In this chapter, we outline some considerations in terms of study design and other confounding factors that often become apparent only at the point of data analysis. We also outline some basic principles of quality control in metagenomics, including overall reproducibility and some good practices to follow. We then describe the general quality control of sequencing data and introduce ways to process these data using bash scripts and by developing pipelines in Snakemake (Python). A significant part of quality control in metagenomics lies in analyzing the data so that you can spot relationships between variables and identify when they might be confounded. This chapter provides a walkthrough of analyzing some microbiome data (in the R statistical language) and demonstrates a few ways to identify overall differences and similarities in microbiome data. The chapter concludes with remarks on considering taxonomic results in the context of the study and on interrogating sequence alignments using the command line.

Keywords: Metagenomics; Microbial bioinformatics contamination; Microbiome bacteria; Quality control data; Virus.

MeSH terms

  • Computational Biology / methods
  • Metagenomics* / methods
  • Microbiota*
  • Reproducibility of Results
  • Research Design