A pile of pipelines: An overview of the bioinformatics software for metabarcoding data analyses

Ali Hakimzadeh; Alejandro Abdala Asbun; Davide Albanese; Maria Bernard; Dominik Buchner; Benjamin Callahan; J Gregory Caporaso; Emily Curd; Christophe Djemiel; Mikael Brandström Durling; Vasco Elbrecht; Zachary Gold; Hyun S Gweon; Mehrdad Hajibabaei; Falk Hildebrand; Vladimir Mikryukov; Eric Normandeau; Ezgi Özkurt; Jonathan M Palmer; Géraldine Pascal; Teresita M Porter; Daniel Straub; Martti Vasar; Tomáš Větrovský; Haris Zafeiropoulos; Sten Anslan

doi:10.1111/1755-0998.13847

Mol Ecol Resour. 2023 Aug 7:10.1111/1755-0998.13847. doi: 10.1111/1755-0998.13847. Online ahead of print.

Authors

Ali Hakimzadeh¹, Alejandro Abdala Asbun², Davide Albanese³, Maria Bernard⁴,⁵, Dominik Buchner⁶, Benjamin Callahan⁷, J Gregory Caporaso⁸, Emily Curd⁹, Christophe Djemiel¹⁰, Mikael Brandström Durling¹¹, Vasco Elbrecht⁶, Zachary Gold¹², Hyun S Gweon¹³,¹⁴, Mehrdad Hajibabaei¹⁵, Falk Hildebrand¹⁶,¹⁷, Vladimir Mikryukov¹, Eric Normandeau¹⁸, Ezgi Özkurt¹⁶,¹⁷, Jonathan M Palmer¹⁹, Géraldine Pascal²⁰, Teresita M Porter¹⁵, Daniel Straub²¹, Martti Vasar¹, Tomáš Větrovský²², Haris Zafeiropoulos²³, Sten Anslan¹,²⁴

Affiliations

¹ Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia.
² Department of Marine Microbiology and Biogeochemistry, NIOZ Royal Netherlands Institute for Sea Research, Texel, Netherlands.
³ Unit of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach, Italy.
⁴ INRAE, AgroParisTech, GABI, Université Paris-Saclay, Jouy-en-Josas, France.
⁵ INRAE, SIGENAE, Jouy-en-Josas, France.
⁶ Aquatic Ecosystem Research, University of Duisburg-Essen, Essen, Germany.
⁷ Department of Population Health and Pathobiology, College of Veterinary Medicine and Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA.
⁸ Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, USA.
⁹ Vermont Biomedical Research Network, University of Vermont, Burlington, Vermont, USA.
¹⁰ Agroécologie, INRAE, Institut Agro, Univ. Bourgogne Franche-Comté, Dijon, France.
¹¹ Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden.
¹² NOAA Pacific Marine Environmental Laboratory, Seattle, Washington, USA.
¹³ UK Centre for Ecology & Hydrology, Oxfordshire, UK.
¹⁴ School of Biological Sciences, University of Reading, Reading, UK.
¹⁵ Department of Integrative Biology and Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada.
¹⁶ Gut Microbes & Health, Quadram Institute Bioscience, Norfolk, UK.
¹⁷ Earlham Institute, Norwich Research Park, Norfolk, UK.
¹⁸ Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, Québec, Canada.
¹⁹ Center for Forest Mycology Research, Northern Research Station, US Forest Service, Madison, Wisconsin, USA.
²⁰ GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France.
²¹ Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany.
²² Laboratory of Environmental Microbiology, Institute of Microbiology of the Czech Academy of Sciences, Praha, Czech Republic.
²³ KU Leuven, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, Leuven, Belgium.
²⁴ Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland.

PMID: 37548515
PMCID: PMC10847385 (available on 2025-02-07)
DOI: 10.1111/1755-0998.13847

Abstract

Environmental DNA (eDNA) metabarcoding has gained growing attention as a strategy for monitoring biodiversity in ecology. However, taxa identifications produced through metabarcoding require sophisticated processing of high-throughput sequencing data from taxonomically informative DNA barcodes. Various sets of universal and taxon-specific primers have been developed, extending the usability of metabarcoding across archaea, bacteria and eukaryotes. Accordingly, a multitude of metabarcoding data analysis tools and pipelines have also been developed. Often, several developed workflows are designed to process the same amplicon sequencing data, making it somewhat puzzling to choose one among the plethora of existing pipelines. However, each pipeline has its own specific philosophy, strengths and limitations, which should be considered depending on the aims of any specific study, as well as the bioinformatics expertise of the user. In this review, we outline the input data requirements, supported operating systems and particular attributes of thirty-two amplicon processing pipelines with the goal of helping users to select a pipeline for their metabarcoding projects.

Keywords: amplicon data analysis; bioinformatics; environmental DNA; metabarcoding; pipeline; review.
