A new signal characterization and signal-based Chou's PseAAC representation of protein sequences

J Bioinform Comput Biol. 2015 Oct;13(5):1550024. doi: 10.1142/S0219720015500249. Epub 2015 Aug 21.

Abstract

Most of the algorithms used for information extraction and for processing the amino acid chains that make up proteins treat them as symbolic chains. Fewer algorithms exploit signal processing techniques, which require a numerical representation of amino acid chains. However, these algorithms are very powerful for extracting regularities that cannot be detected when working with a symbolic chain, and such regularities may be important for understanding the biological meaning of a sequence or in classification tasks. In this study, a new mathematical representation of amino acid chains is proposed. It is derived using a similarity measure based on the PAM250 amino acid substitution matrix and generates 20 signals for each protein sequence. Using this representation, 20 consensus spectra for a protein family are determined and the relevance of the frequency peaks is established, yielding a group of significant frequency peaks that manifest common periodicities of the amino acid sequences that belong to a protein family. We also show that the proposed 20-signal representation can be integrated into Chou's pseudo amino acid composition (PseAAC) and constitutes a useful alternative to amino acid physicochemical properties in Chou's PseAAC.
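To make the idea of 20 signals per sequence concrete, the following is a minimal sketch and not the authors' implementation. It assumes that signal k at position n is the similarity between the residue at position n and the k-th reference amino acid. The abstract does not give the exact similarity measure, so the sketch uses a placeholder match/mismatch function (`identity_similarity`, a hypothetical name) where a PAM250-based measure would be plugged in. The function names, the toy sequence, and the naive DFT are all illustrative assumptions.

```python
import cmath

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def identity_similarity(a, b):
    """Placeholder similarity: 1.0 on a match, 0.0 otherwise.

    ASSUMPTION: this is NOT PAM250. A PAM250-based similarity
    would replace this function in a real analysis.
    """
    return 1.0 if a == b else 0.0


def sequence_to_signals(seq, similarity=identity_similarity):
    """Map a protein sequence to 20 numerical signals.

    Signal for amino acid `aa` holds, at each position, the
    similarity between the residue there and `aa`.
    """
    return {aa: [similarity(res, aa) for res in seq] for aa in AMINO_ACIDS}


def power_spectrum(signal):
    """Naive DFT power spectrum of the mean-removed signal.

    Returns the power at frequency bins 0 .. N//2.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


# Hypothetical toy sequence with an exact period of 3 residues.
seq = "ALGALGALGALG"
signals = sequence_to_signals(seq)
spec_a = power_spectrum(signals["A"])

# Index of the strongest frequency bin in the "A" signal.
peak_bin = max(range(len(spec_a)), key=spec_a.__getitem__)
print(peak_bin, len(seq) / peak_bin)
```

For this toy sequence the strongest bin corresponds to a period of 3 residues, which is the kind of regularity a symbolic view of the chain does not expose directly. Combining such spectra across a family into consensus spectra would require further choices (for example, how to handle sequences of different lengths) that the abstract does not specify.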

Keywords: Amino acid chain; mathematical representation; protein family; pseudo amino acid composition; signal processing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Amino Acid Substitution
  • Amino Acids / chemistry
  • Computational Biology / methods
  • Consensus Sequence
  • Databases, Protein / statistics & numerical data
  • Markov Chains
  • Proteins / chemistry*

Substances

  • Amino Acids
  • Proteins