Merging and concatenation of sequencing reads: a bioinformatics workflow for the comprehensive profiling of microbiome from amplicon data

FEMS Microbiol Lett. 2024 Jan 9:371:fnae009. doi: 10.1093/femsle/fnae009.

Abstract

Comprehensive profiling of microbial diversity is essential for understanding ecosystem functions. Universal primer sets such as 515Y/926R can amplify part of both the 16S and 18S rRNA genes and can thus be used to infer the diversity of both prokaryotes and eukaryotes. However, analysing the mixed sequencing data poses a bioinformatics challenge: because amplicon length varies, the 16S and 18S rRNA sequences must first be separated and then analysed independently. This study describes an alternative strategy, a merging and concatenation workflow, that analyses the mixed amplicon data without separating the 16S and 18S rRNA sequences. The workflow was tested with 24 mock community (MC) samples, and the analyses resolved the composition of prokaryotes and eukaryotes adequately. In addition, the observed and expected abundances in the MC samples were strongly correlated (cor = 0.950; P-value = 4.754e-10), which suggests that the computational approach can infer microbial proportions accurately. Further, 18 samples collected from the Sundarbans mangrove region were analysed as a case study. The analyses identified Proteobacteria, Bacteroidota, Actinobacteriota, Cyanobacteria, and Crenarchaeota as the dominant prokaryotic phyla, and Metazoa, Gyrista, Cryptophyta, Chlorophyta, and Dinoflagellata as the dominant eukaryotic divisions in the samples. Thus, the results support the applicability of the method in environmental microbiome research. The merging and concatenation workflow presented here requires considerably fewer computational resources and relies on widely used bioinformatics packages. Compared with the conventional approach, and for an equivalent number of samples, it reduces the analysis time required to infer the diversity of major microbial domains from mixed amplicon data at comparable accuracy.
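
The merge-or-concatenate idea named in the title can be pictured with a toy sketch. This is not the authors' pipeline, which the abstract does not detail. It assumes that shorter amplicons yield read pairs that overlap and can be merged, while longer amplicons may yield pairs that do not overlap and are instead joined end to end. The function names, the exact-match overlap test, the `min_overlap` threshold, and the run of `N` used as a spacer are all hypothetical choices made for illustration.

```python
# Toy illustration only: real tools score mismatches and base qualities.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_length(forward: str, reverse_rc: str, min_overlap: int) -> int:
    """Longest exact overlap between the end of `forward` and the start
    of `reverse_rc`; returns 0 if it is shorter than `min_overlap`."""
    longest = min(len(forward), len(reverse_rc))
    for size in range(longest, min_overlap - 1, -1):
        if forward[-size:] == reverse_rc[:size]:
            return size
    return 0


def merge_or_concatenate(forward: str, reverse: str,
                         min_overlap: int = 8,
                         spacer: str = "N" * 10) -> tuple[str, str]:
    """Merge a read pair if the reads overlap, otherwise concatenate them.

    `reverse` is given as sequenced (opposite strand), so it is
    reverse-complemented before being compared with `forward`.
    """
    reverse_rc = reverse_complement(reverse)
    size = overlap_length(forward, reverse_rc, min_overlap)
    if size:
        # Overlapping pair: keep the shared region once.
        return forward + reverse_rc[size:], "merged"
    # Non-overlapping pair: join the reads with a spacer (assumed here).
    return forward + spacer + reverse_rc, "concatenated"


if __name__ == "__main__":
    amplicon = "ACGTTGCAAGGCTTACCGATGGCATCGA"  # hypothetical sequence
    # Reads that overlap by 12 bases.
    print(merge_or_concatenate(amplicon[:20], reverse_complement(amplicon[8:])))
    # Reads that leave an unsequenced gap in the middle.
    print(merge_or_concatenate(amplicon[:10], reverse_complement(amplicon[18:])))
```

Handling both kinds of pair in one pass is what would let mixed 16S and 18S reads be processed together rather than being sorted by amplicon length first.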

Keywords: bioinformatics pipeline; environmental microbiome; eukaryotes; microbial ecology; mixed amplicon data; prokaryotes.

MeSH terms

  • Bacteria / genetics
  • Computational Biology
  • High-Throughput Nucleotide Sequencing / methods
  • Microbiota* / genetics
  • RNA, Ribosomal, 16S / genetics
  • RNA, Ribosomal, 18S / genetics
  • Workflow

Substances

  • RNA, Ribosomal, 18S
  • RNA, Ribosomal, 16S