PredictION: a predictive model to establish the performance of Oxford sequencing reads of SARS-CoV-2

PeerJ. 2022 Nov 30:10:e14425. doi: 10.7717/peerj.14425. eCollection 2022.

Abstract

Limited research resources in developing countries make it necessary to consider wet-lab strategies that allow molecular biology reagents to be reused in order to reduce costs. In this study, we used linear regression to model coverage depth as a function of the number of MinION reads sequenced, so as to define the optimum number of reads needed to obtain >200X coverage depth with a good lineage-clade assignment of SARS-CoV-2 genomes. The research aimed to create and implement a model based on machine learning algorithms that predicts different variables (e.g., coverage depth) from the number of MinION reads produced by Nanopore sequencing, in order to maximize the yield of high-quality SARS-CoV-2 genomes, determine the best sequencing runtime, and allow the flow cell to be reused in a new run with the remaining available nanopores. The best-performing model reached an R squared of approximately 0.98. A demo version is available at https://genomicdashboard.herokuapp.com/.
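
As an illustration of the general idea only, and not the authors' implementation, the following minimal Python sketch fits a straight line to coverage depth versus read count and inverts it to estimate how many reads would be needed to reach 200X. The data points, function names, and the choice of NumPy's least-squares fit are assumptions made for this example; none of the values come from the study.

```python
import numpy as np

# Hypothetical (read count, mean coverage depth) pairs, invented for illustration.
# They are NOT data from the study.
reads = np.array([5_000, 10_000, 20_000, 40_000, 80_000], dtype=float)
depth = np.array([52.0, 101.0, 198.0, 405.0, 797.0])


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line on (x, y)."""
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def reads_for_depth(target_depth, slope, intercept):
    """Invert the fitted line to estimate the reads needed for a target depth."""
    return (target_depth - intercept) / slope


slope, intercept = fit_line(reads, depth)
r2 = r_squared(reads, depth, slope, intercept)
needed = reads_for_depth(200.0, slope, intercept)

print(f"depth = {slope:.5f} * reads + {intercept:.2f}")
print(f"R squared on the toy data: {r2:.4f}")
print(f"Estimated reads for 200X: {needed:.0f}")
```

In a sketch like this, the estimated read count could serve as a stopping criterion for a run, leaving the remaining nanopores available for a later run; a real model would need to be fitted and validated on actual sequencing data.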

Keywords: Genomes; Linear models; Machine learning; Oxford nanopore technologies; Sequences.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19*
  • Genome
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • SARS-CoV-2* / genetics
  • Sequence Analysis, DNA / methods