In silico screening of ssDNA aptamer against Escherichia coli O157:H7: A machine learning and the Pseudo K-tuple nucleotide composition based approach

Comput Biol Chem. 2021 Dec:95:107568. doi: 10.1016/j.compbiolchem.2021.107568. Epub 2021 Aug 27.

Abstract

This study aimed to screen ssDNA aptamers against Escherichia coli O157:H7 in silico by combining machine learning with the pseudo K-tuple nucleotide composition (PseKNC) approach. First, a total of 47 validated ssDNA aptamers and 498 random DNA sequences were used as positive and negative training data, respectively. The sequences were then converted to numerical vectors with the PseKNC method through the Pse-in-One 2.0 web server. The numerical vectors were then classified with the support vector machine (SVM), artificial neural network (ANN) and random forest (RF) algorithms available in the Orange 3.2.0 software. The performance of the tested models was evaluated using cross-validation, random sampling and receiver operating characteristic (ROC) curve analyses. The preliminary results showed that the ANN and RF algorithms performed well in classifying the data. To improve the performance of these classifiers, the positive training data were triplicated and the models were retrained. The results confirmed that enlarging the training set significantly improved classification accuracy, especially for the RF model. The RF model, with an accuracy of 98%, was therefore selected for aptamer screening. As a final evaluation, the thermodynamic details of the folding process and the secondary structures of the screened aptamers were also examined. The results confirmed that the aptamers selected by the proposed method had suitable structural properties and that there was no thermodynamic limitation on their folding.
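To make the encoding step more concrete, the following is a minimal sketch of the general PseKNC form: each sequence becomes 4^k normalized k-tuple frequencies followed by λ sequence-order correlation factors θ_j (computed from dinucleotide physicochemical properties and scaled by a weight w), all divided by a common normalizer so the vector sums to 1. The abstract does not report the k, λ or w values or the physicochemical indices chosen in Pse-in-One 2.0, so the defaults below are assumptions, and the two property tables (GC and purine content per dinucleotide) are hypothetical stand-ins for the published indices the server actually uses. The function names are illustrative only, not the Pse-in-One API.

```python
from itertools import product
from statistics import mean, pstdev

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]


def _standardize(raw):
    """Z-score a dinucleotide property across the 16 dinucleotides."""
    vals = list(raw.values())
    mu, sd = mean(vals), pstdev(vals)
    return {d: (v - mu) / sd for d, v in raw.items()}


# HYPOTHETICAL stand-in properties (GC content, purine content of each dinucleotide).
# Real PseKNC uses published physicochemical indices selected in Pse-in-One.
PROPS = [
    _standardize({d: sum(b in "GC" for b in d) for d in DINUCS}),
    _standardize({d: sum(b in "AG" for b in d) for d in DINUCS}),
]


def kmer_freqs(seq, k):
    """Normalized occurrence frequencies of all 4**k k-tuples, in ACGT order."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[m] / total for m in kmers]


def theta(seq, j):
    """j-tier correlation factor between dinucleotides j positions apart."""
    pairs = len(seq) - j - 1
    acc = 0.0
    for i in range(pairs):
        a, b = seq[i:i + 2], seq[i + j:i + j + 2]
        acc += sum((p[a] - p[b]) ** 2 for p in PROPS) / len(PROPS)
    return acc / pairs


def pseknc(seq, k=2, lam=2, w=0.5):
    """PseKNC-style vector of length 4**k + lam (parameter values are assumptions)."""
    seq = seq.upper()
    if set(seq) - set(BASES):
        raise ValueError("sequence must contain only A, C, G, T")
    if len(seq) < max(k, lam + 2):
        raise ValueError("sequence too short for the chosen k and lambda")
    f = kmer_freqs(seq, k)
    th = [theta(seq, j) for j in range(1, lam + 1)]
    denom = sum(f) + w * sum(th)
    return [x / denom for x in f] + [w * t / denom for t in th]
```

For example, `pseknc("ACGTACGTAC")` returns an 18-element vector (16 dinucleotide frequencies plus 2 correlation factors). Vectors built this way for the positive aptamers and the random negative sequences would form the labeled table that a classifier such as RF is trained on; in the study itself this step was performed in Orange rather than in code.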

Keywords: Escherichia coli O157:H7; Machine learning; PseKNC; ssDNA aptamer.

MeSH terms

  • Aptamers, Nucleotide / chemistry
  • Aptamers, Nucleotide / pharmacology*
  • Computational Biology
  • DNA, Single-Stranded / chemistry
  • DNA, Single-Stranded / pharmacology*
  • Drug Evaluation, Preclinical
  • Escherichia coli O157 / drug effects*
  • Machine Learning*
  • Thermodynamics

Substances

  • Aptamers, Nucleotide
  • DNA, Single-Stranded