In-depth comparative analysis of Illumina® MiSeq run metrics: Development of a wet-lab quality assessment tool

George John Kastanis; Luis V Santana-Quintero; Maria Sanchez-Leon; Sara Lomonaco; Eric W Brown; Marc W Allard

doi:10.1111/1755-0998.12973


Mol Ecol Resour. 2019 Mar;19(2):377-387. doi: 10.1111/1755-0998.12973. Epub 2019 Jan 17.

Authors

George John Kastanis¹, Luis V Santana-Quintero², Maria Sanchez-Leon¹, Sara Lomonaco¹,³, Eric W Brown¹, Marc W Allard¹

Affiliations

¹ Department of Microbiology, Center for Food Safety and Applied Nutrition, US Food and Drug Administration, College Park, Maryland.
² Office of Hematology and Oncology Products, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland.
³ Department of Veterinary Sciences, Università degli Studi di Torino, Grugliasco, Turin, Italy.

Abstract

Whole genome sequencing of bacterial isolates has become a daily task in many laboratories, generating large amounts of data. However, data acquisition is not an end in itself; the goal is to acquire high-quality data useful for understanding genetic relationships. A method that could rapidly identify which of the many available run metrics are the most important indicators of overall run quality, together with a way to monitor those metrics during a sequencing run, would therefore be extremely helpful. To this end, we compared various run metrics across 486 MiSeq runs from five different machines. By performing a statistical analysis of the metrics using principal components analysis and a K-means clustering algorithm, we were able to validate metric comparisons among instruments. This allowed us to develop a predictive algorithm that shows whether a given MiSeq run has performed adequately. The algorithm is available as an Excel spreadsheet, the MiSeq Instrument & Run (In-Run) Forecast. Our tool can help verify that the quantity and quality of the generated sequencing data consistently meet or exceed the manufacturer's expectations. Patterns of deviation from those expectations can be used to assess potential run problems and plan preventive maintenance, which can save valuable time and funding.
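The abstract describes standardizing run metrics, reducing them with principal components analysis, and grouping runs with K-means. The sketch below illustrates that general pipeline in plain Python (standard library only) on a handful of made-up runs. The metric choice, the values, the two-component projection, k = 2, power iteration for the eigenvectors, and the farthest-first K-means initialization are all assumptions made for illustration. This is not the authors' In-Run Forecast spreadsheet or their exact statistical procedure.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each metric (column) so metrics on different scales are comparable."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def covariance(z):
    """Covariance of z-scored data (population form), i.e. the correlation matrix."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]


def top_components(cov, k, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    a = [row[:] for row in cov]
    d = len(a)
    pairs = []
    for _comp in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]  # random positive start vector
        for _step in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((lam, v))
        # Deflate: subtract the component just found so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


def dist2(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _step in range(iters):
        new_labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels, centers


# Hypothetical run metrics, one row per run (NOT data from the study):
# [cluster density (K/mm^2), % bases >= Q30, yield (Gb), % clusters passing filter]
runs = [
    [1000, 85, 7.8, 92], [1050, 84, 8.0, 91], [980, 86, 7.6, 93], [1020, 83, 7.9, 90],
    [1600, 60, 5.1, 70], [1550, 62, 5.4, 72], [1650, 58, 4.9, 68],
]
z = standardize(runs)
cov = covariance(z)
pcs = top_components(cov, k=2)
explained = pcs[0][0] / sum(cov[i][i] for i in range(len(cov)))
scores = [[sum(x * y for x, y in zip(r, v)) for _lam, v in pcs] for r in z]
labels, _centers = kmeans(scores, k=2)
print(f"PC1 explains {explained:.0%} of variance; cluster labels: {labels}")
```

With these invented values the first four runs and the last three fall into separate clusters. In a real monitoring setting, a new run could be projected onto the same components and checked against the cluster of runs already judged acceptable. Standardizing first matters because raw cluster density (hundreds to thousands) would otherwise dominate percentage-scale metrics in both PCA and the K-means distances.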

Keywords: Forecast; In-Run; MiSeq; sequencing; tool.

MeSH terms

Algorithms
Bacteria / genetics*
Genome, Bacterial*
High-Throughput Nucleotide Sequencing / methods*
High-Throughput Nucleotide Sequencing / standards*
Models, Statistical
Quality Control*
Whole Genome Sequencing / methods*
Whole Genome Sequencing / standards*
