In-depth comparative analysis of Illumina® MiSeq run metrics: Development of a wet-lab quality assessment tool

Mol Ecol Resour. 2019 Mar;19(2):377-387. doi: 10.1111/1755-0998.12973. Epub 2019 Jan 17.

Abstract

Whole genome sequencing of bacterial isolates has become a daily task in many laboratories, generating very large amounts of data. However, data acquisition is not an end in itself; the goal is to acquire high-quality data useful for understanding genetic relationships. A method that could rapidly determine which of the many available run metrics are the most important indicators of overall run quality, together with a way to monitor those metrics during a given sequencing run, would be extremely helpful to this end. We therefore compared various run metrics across 486 MiSeq runs from five different machines. By analyzing the metrics statistically with principal components analysis and a K-means clustering algorithm, we were able to validate metric comparisons among instruments, allowing for the development of a predictive algorithm that indicates whether a given MiSeq run has performed adequately. This algorithm is available as an Excel spreadsheet, the MiSeq Instrument & Run (In-Run) Forecast. Our tool can help verify that the quantity and quality of the generated sequencing data consistently meet or exceed the manufacturer's recommended expectations. Patterns of deviation from those expectations can be used to assess potential run problems and plan preventative maintenance, which can save valuable time and funding.
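The general idea of combining principal components analysis with K-means clustering on run metrics can be illustrated with a minimal sketch. This is not the authors' algorithm, which is distributed as an Excel spreadsheet and is not reproduced here. The metric names, the eight example runs, and every numeric value below are hypothetical and were invented purely for illustration. To stay dependency-free, the sketch computes principal components by power iteration with deflation and seeds K-means with a deterministic farthest-point rule; both are simplifications chosen for this example.

```python
import math

# Hypothetical per-run metrics (all values invented for illustration only):
# cluster density (K/mm^2), % clusters passing filter, % bases >= Q30, yield (Gb)
runs = [
    [900, 92, 85, 7.5], [950, 90, 83, 7.8], [1000, 89, 82, 8.0], [880, 93, 86, 7.2],
    [1500, 65, 60, 4.0], [1600, 60, 55, 3.5], [1450, 68, 62, 4.2], [1550, 63, 58, 3.8],
]

def standardize(rows):
    """Scale every metric (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def matvec(mat, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]

def principal_components(rows, k, iters=500):
    """Leading k eigenvectors of the covariance matrix (power iteration + deflation)."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    comps = []
    for _ in range(k):
        start = [float(i + 1) for i in range(d)]
        start_norm = math.sqrt(sum(x * x for x in start))
        v = [x / start_norm for x in start]
        for _step in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, matvec(cov, v)))
        comps.append(v)
        # Deflate: remove the component just found from the covariance matrix.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return comps

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([sum(x) / len(members) for x in zip(*members)]
                           if members else centers[c])
        if updated == centers:  # converged
            break
        centers = updated
    return assign(points, centers)

scaled = standardize(runs)
components = principal_components(scaled, k=2)
# Project each run onto the first two principal components.
scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
          for row in scaled]
labels = kmeans(scores, k=2)
print(labels)
```

With the invented values above, the first four runs and the last four runs fall into separate clusters. In practice, the choice of metrics, the number of components, and the number of clusters would all need to be justified from real run data.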

Keywords: Forecast; In-Run; MiSeq; sequencing; tool.

MeSH terms

  • Algorithms
  • Bacteria / genetics*
  • Genome, Bacterial*
  • High-Throughput Nucleotide Sequencing / methods*
  • High-Throughput Nucleotide Sequencing / standards*
  • Models, Statistical
  • Quality Control*
  • Whole Genome Sequencing / methods*
  • Whole Genome Sequencing / standards*