Pitfalls in the statistical analysis of microbiome amplicon sequencing data

Hendriek C Boshuizen; Dennis E Te Beest

doi:10.1111/1755-0998.13730

Mol Ecol Resour. 2023 Apr;23(3):539-548. doi: 10.1111/1755-0998.13730. Epub 2022 Nov 27.

Authors

Hendriek C Boshuizen¹, Dennis E Te Beest¹

Affiliation

¹ Biometris, Wageningen University and Research, Wageningen, The Netherlands.

PMID: 36330663
DOI: 10.1111/1755-0998.13730

Abstract

Microbiome data are characterized by several aspects that make them challenging to analyse statistically: they are compositional, high dimensional and rich in zeros. A large array of statistical methods exists to analyse these data. Some are borrowed from other fields, such as ecology or RNA-sequencing, while others are custom-made for microbiome data. The large and continuously expanding range of available methods means that researchers have to invest considerable effort in choosing which method(s) to apply. In this paper we list 14 statistical methods or approaches that we think should generally be avoided. In several cases this is because we believe the assumptions behind the method are unlikely to be met for microbiome data. In other cases we see methods being used in ways for which they were not intended. We believe researchers would be helped by more critical evaluations of existing methods, as not all methods in use are suitable or have been sufficiently reviewed. We hope this paper contributes to a critical discussion on which methods are appropriate for the analysis of microbiome data.

Keywords: compositional data; microbiome; negative binomial regression; normalization; statistical methods.

MeSH terms

Base Sequence
Microbiota*
RNA, Ribosomal, 16S
Research Design
Sequence Analysis, RNA

Substances

RNA, Ribosomal, 16S

Grants and funding

Internal funding by Wageningen University and Research and Biometris