sefOri: selecting the best-engineered sequence features to predict DNA replication origins

Bioinformatics. 2020 Jan 1;36(1):49-55. doi: 10.1093/bioinformatics/btz506.

Abstract

Motivation: Cell division begins with replication of the double-stranded DNA, a process that must be precisely regulated both spatially and temporally. Replication starts at the DNA replication origins. A few successful prediction models have been built on the assumption that replication origin regions have sequence-level features, such as physicochemical properties, that differ significantly from those of other DNA regions.
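To make the idea of a sequence-level feature concrete, the following minimal sketch turns a DNA sequence into a fixed-length numeric vector. It uses normalized k-mer frequencies as a stand-in; this encoding and the function name `kmer_frequencies` are assumptions chosen for illustration, not the physicochemical features used by the models referred to above.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies of a DNA sequence.

    Hypothetical helper for illustration only. The output has 4**k entries,
    ordered lexicographically over the alphabet A, C, G, T.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {km: 0 for km in kmers}
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other ambiguous bases
            counts[window] += 1
            total += 1
    if total == 0:
        # Sequence shorter than k, or no unambiguous windows at all.
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]
```

For example, `kmer_frequencies("ACGT", k=2)` yields a 16-element vector in which the entries for AC, CG and GT each equal one third and all others are zero. Vectors like this, computed for origin and non-origin regions, could serve as classifier input.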

Results: This study proposed a feature selection procedure to further refine the classification model of DNA replication origins. The experimental data demonstrated that prediction accuracy can be improved by as much as 26% on the yeast Saccharomyces cerevisiae. Moreover, prediction accuracy improved for all four yeast genomes investigated in this study.
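The abstract does not describe the selection procedure itself. As a rough illustration of what feature selection for a two-class problem (origin versus non-origin) can look like, here is a generic filter-style sketch that ranks features by a t-like separation score and keeps the top n. The scoring rule and the names `rank_features` and `select_top` are assumptions for this example and should not be read as the sefOri algorithm.

```python
from math import sqrt
from statistics import mean, pvariance


def rank_features(X, y):
    """Rank feature indices by a two-class separation score, best first.

    X is a list of feature rows and y a list of 0/1 labels (1 = origin).
    Illustrative only: assumes both classes have at least one sample.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row in pos]
        b = [row[j] for row in neg]
        diff = abs(mean(a) - mean(b))
        # Spread of the difference in class means (population variances).
        spread = sqrt(pvariance(a) / len(a) + pvariance(b) / len(b))
        if spread > 0:
            scores.append(diff / spread)
        else:
            # Zero spread: perfectly separating if the means differ, useless otherwise.
            scores.append(float("inf") if diff > 0 else 0.0)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select_top(X, y, n):
    """Keep the n best-ranked feature columns; return (indices, reduced X)."""
    keep = sorted(rank_features(X, y)[:n])
    return keep, [[row[j] for j in keep] for row in X]
```

In practice the reduced matrix returned by `select_top` would be passed to a classifier, and the number of retained features would be chosen by cross-validation rather than fixed in advance.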

Availability and implementation: The software sefOri version 1.0 was available at http://www.healthinformaticslab.org/supp/resources.php. An online server was also provided for the convenience of users; a link to it can be found on the same web page.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA / chemistry
  • DNA Replication* / genetics
  • Models, Genetic*
  • Replication Origin* / genetics
  • Saccharomyces cerevisiae / genetics
  • Sequence Analysis, DNA* / methods

Substances

  • DNA