Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction

Comput Struct Biotechnol J. 2016 Jul 27:14:298-303. doi: 10.1016/j.csbj.2016.07.002. eCollection 2016.

Abstract

In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly facilitated our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. Ab initio gene prediction has two main aspects: the computed values that describe sequence features, and the algorithm used to train the discriminant function. Various bioinformatic tools employ different combinations of the two. Herein, we briefly review the well-characterized sequence features of eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.
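
As a rough illustration of what "computed values for describing sequence features" can look like in practice, the sketch below computes two simple compositional properties of a DNA sequence: GC content and k-mer frequencies. The function names `gc_content` and `kmer_frequencies`, the default `k = 3`, and the choice of these two features are assumptions made for illustration only; they are not taken from the article, and real gene predictors combine many more features with a trained discriminant function.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq: str, k: int = 3) -> dict:
    """Relative frequency of each overlapping k-mer observed in the sequence."""
    seq = seq.upper()
    total = len(seq) - k + 1  # number of overlapping windows of length k
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    example = "ATGATG"  # hypothetical toy sequence
    print(gc_content(example))
    print(kmer_frequencies(example, k=3))
```

Feature values of this kind could then be passed, as a numeric vector, to whichever classification algorithm a tool uses to separate coding from non-coding regions.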

Keywords: Ab initio gene prediction; Compositional properties; Eukaryotes; Functional signals; Sequence features.

Publication types

  • Review