Predicting reliable regions in protein alignments from sequence profiles

Michael L Tress; David Jones; Alfonso Valencia

doi:10.1016/s0022-2836(03)00622-3


J Mol Biol. 2003 Jul 18;330(4):705-18. doi: 10.1016/s0022-2836(03)00622-3.

Authors

Michael L Tress¹, David Jones, Alfonso Valencia

Affiliation

¹ Protein Design Group, Centro Nacional de Biotecnología, CNB-CSIC, Cantoblanco, 28049 Madrid, Spain. mtress@cnb.uam.es

PMID: 12850141

Abstract

For applications such as comparative modelling, one major issue is the reliability of sequence alignments. Reliable regions in alignments can be predicted using sub-optimal alignments of the same pair of sequences. Here we show that reliable regions can also be predicted from multiple sequence profile information alone. Alignments were created for a set of remotely related pairs of proteins using five different test methods. Structural alignments were used to assess the quality of these alignments, and each aligned position was scored using the observed frequencies of amino acid residues in sequence profiles pre-generated for each template structure. High-scoring regions of these profile-derived alignment scores were a good predictor of reliably aligned regions. The scores are easy to obtain and are applicable to any alignment method. They can be used to detect the regions of an alignment that are reliably aligned and to help predict the quality of an alignment. For residues within secondary structure elements, the regions predicted as reliably aligned agreed with the structural alignments for between 92% and 97.4% of the residues. In loop regions, just under 92% of the residues predicted to be reliable agreed with the structural alignments. The percentage of residues predicted as reliable ranged from 32.1% for helix residues to 52.8% for strand residues. This information could also help predict conserved binding sites from sequence alignments. Template residues that were identified as binding sites, that aligned to an identical amino acid residue, and at which the sequence alignment agreed with the structural alignment lay in highly conserved, high-scoring regions over 80% of the time. This suggests that many binding sites present in both target and template sequences lie in sequence-conserved regions, and that reliability prediction could be translated into binding site prediction.
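The abstract does not state the exact scoring function, window size or cutoff, so the following is only a minimal sketch under assumed choices, not the authors' method. Each target residue is scored by the observed frequency of that residue in the template profile column it is aligned to; scores are averaged over a sliding window; positions whose smoothed score reaches a threshold are flagged as reliable; and precision is then measured as the fraction of flagged residues whose sequence-alignment partner matches the structural-alignment partner. The function names (`profile_scores`, `smooth`, `predict_reliable`, `precision_vs_structure`), the alignment encoding (template index or `None` per target residue), the window of 5 and the threshold of 0.5 are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical template profile: one dict per template position mapping
# an amino-acid letter to its observed frequency in that profile column.
Profile = List[Dict[str, float]]
# Hypothetical alignment encoding: alignment[i] is the template index aligned
# to target residue i, or None if residue i is aligned to a gap.
Alignment = Sequence[Optional[int]]


def profile_scores(target: str, alignment: Alignment, profile: Profile) -> List[float]:
    """Score each target residue by its frequency in the aligned template column."""
    scores = []
    for i, aa in enumerate(target):
        j = alignment[i]
        scores.append(profile[j].get(aa, 0.0) if j is not None else 0.0)
    return scores


def smooth(scores: List[float], window: int = 5) -> List[float]:
    """Centred moving average; the window is truncated at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def predict_reliable(target: str, alignment: Alignment, profile: Profile,
                     window: int = 5, threshold: float = 0.5) -> List[bool]:
    """Flag aligned residues whose smoothed profile score reaches the threshold."""
    smoothed = smooth(profile_scores(target, alignment, profile), window)
    return [s >= threshold and alignment[i] is not None
            for i, s in enumerate(smoothed)]


def precision_vs_structure(reliable: List[bool], seq_aln: Alignment,
                           struct_aln: Alignment) -> float:
    """Fraction of residues predicted reliable that the structural alignment confirms."""
    picked = [i for i, r in enumerate(reliable) if r]
    if not picked:
        return 0.0
    agree = sum(1 for i in picked if seq_aln[i] == struct_aln[i])
    return agree / len(picked)


# Toy example: the first three template columns strongly favour the target
# residues, the last three do not.
target = "ACDEFG"
seq_aln = [0, 1, 2, 3, 4, 5]
struct_aln = [0, 1, 2, 4, 3, None]
profile = [{aa: 0.9} for aa in "ACD"] + [{aa: 0.1} for aa in "EFG"]

reliable = predict_reliable(target, seq_aln, profile, window=3, threshold=0.5)
prec = precision_vs_structure(reliable, seq_aln, struct_aln)
print(reliable)  # [True, True, True, False, False, False]
print(prec)      # 1.0
```

In this toy case the high-frequency N-terminal block is flagged as reliable and agrees with the structural alignment, while the low-scoring C-terminal block, where the sequence and structural alignments disagree, is not flagged. Windowed averaging is used here only as one simple way to turn per-residue scores into "high-scoring regions".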

MeSH terms

Algorithms
Amino Acid Sequence
Amino Acids / chemistry
Binding Sites
Conserved Sequence
Models, Molecular
Molecular Sequence Data
Protein Structure, Secondary
Proteins / chemistry*
Sequence Homology, Amino Acid
Software

Substances

Amino Acids
Proteins