Quality Control in Metagenomics Data

Abraham Gihawi; Ryan Cardenas; Rachel Hurst; Daniel S Brewer

doi:10.1007/978-1-0716-3072-3_2

Methods Mol Biol. 2023:2649:21-54. doi: 10.1007/978-1-0716-3072-3_2.

Authors

Abraham Gihawi¹, Ryan Cardenas¹, Rachel Hurst¹, Daniel S Brewer²,³

Affiliations

¹ Bob Champion Research & Education Building, Norwich Medical School, University of East Anglia, Norwich, UK.
² Bob Champion Research & Education Building, Norwich Medical School, University of East Anglia, Norwich, UK. D.Brewer@uea.ac.uk.
³ Earlham Institute, Norwich Research Park, Norwich, UK. D.Brewer@uea.ac.uk.

PMID: 37258856

Abstract

Experiments involving metagenomics data are becoming increasingly commonplace. Processing such data requires a unique set of considerations. Quality control of metagenomics data is critical to extracting pertinent insights. In this chapter, we outline some considerations in terms of study design and other confounding factors that can often only be realized at the point of data analysis. We also outline some basic principles of quality control in metagenomics, including overall reproducibility and some good practices to follow. The general quality control of sequencing data is then outlined, and we introduce ways to process this data by using bash scripts and developing pipelines in Snakemake (Python). A significant part of quality control in metagenomics is in analyzing the data to ensure you can spot relationships between variables and to identify when they might be confounded. This chapter provides a walkthrough of analyzing some microbiome data (in the R statistical language) and demonstrates a few ways to identify overall differences and similarities in microbiome data. The chapter concludes with remarks on considering taxonomic results in the context of the study and interrogating sequence alignments using the command line.

Keywords: Metagenomics; Microbial bioinformatics contamination; Microbiome bacteria; Quality control data; Virus.

MeSH terms

Computational Biology / methods
Metagenomics* / methods
Microbiota*
Reproducibility of Results
Research Design