The study of microbial communities crucially relies on the comparison of metagenomic next-generation sequencing data sets, for which several methods have been designed in recent years. Here, we review three key challenges in the comparison of such data sets: species identification and quantification, the efficient computation of distances between metagenomic samples and the identification of metagenomic features associated with a phenotype such as disease status. We present current solutions for such challenges, considering both reference-based methods relying on a database of reference genomes and reference-free methods working directly on all sequencing reads from the samples.
Keywords: metagenomics; microbiome; next-generation sequencing.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.