simpiTB - a pipeline designed to extract meaningful information from whole genome sequencing data of Mycobacterium tuberculosis complex, allows to combine genomic, phylogenetic and clustering analyses in existing SITVIT databases

Infect Genet Evol. 2023 Sep:113:105466. doi: 10.1016/j.meegid.2023.105466. Epub 2023 Jun 16.

Abstract

Data obtained from new sequencing technologies are evolving rapidly, driving the development of dedicated bioinformatic tools, pipelines and software. Several algorithms and tools are now available that allow better identification and description of Mycobacterium tuberculosis complex (MTBC) isolates worldwide. Our approach consists of applying existing methods to analyze DNA sequencing data (from FASTA or FASTQ files) and tentatively extracting meaningful information that would facilitate the identification, understanding and management of MTBC isolates, taking into account both whole genome sequencing (WGS) and classical genotyping data. The aim of this study is to propose an analysis pipeline that could simplify MTBC data analysis by providing different ways to interpret genomic or genotyping information based on existing tools. Furthermore, we propose a "reconciledTB" list that links results obtained directly from WGS data with results obtained from classical genotyping analysis (data inferred from SpoTyping and MIRUReader). The data visualization graphics and trees generated provide additional elements to better understand and infer associations where information from the different analyses overlaps. Additionally, comparison between data entered in an international genotyping database (SITVITEXTEND) and the corresponding data obtained from the pipeline not only provides meaningful information, but further suggests that simpiTB could also be suitable for integrating new data into specific TB genotyping databases.

Keywords: Bioinformatics; Drug resistance; MIRU-VNTRs; Mycobacterium tuberculosis; Spoligotypes; Whole genome sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genomics
  • Humans
  • Mycobacterium tuberculosis*
  • Phylogeny
  • Tuberculosis* / microbiology
  • Whole Genome Sequencing / methods