Limited utility of residue masking for positive-selection inference

Stephanie J Spielman; Eric T Dawson; Claus O Wilke

doi:10.1093/molbev/msu183

Mol Biol Evol. 2014 Sep;31(9):2496-500. doi: 10.1093/molbev/msu183. Epub 2014 Jun 3.

Authors

Stephanie J Spielman¹, Eric T Dawson², Claus O Wilke²

Affiliations

¹ Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin. stephanie.spielman@gmail.com
² Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin.

Abstract

Errors in multiple sequence alignments (MSAs) can reduce accuracy in positive-selection inference. Therefore, it has been suggested to filter MSAs before conducting further analyses. One widely used filter, Guidance, allows users to remove MSA positions aligned with low confidence. However, Guidance's utility in positive-selection inference has been disputed in the literature. We have conducted an extensive simulation-based study to characterize fully how Guidance impacts positive-selection inference, specifically for protein-coding sequences of realistic divergence levels. We also investigated whether novel scoring algorithms, which phylogenetically corrected confidence scores, and a new gap-penalization score-normalization scheme improved Guidance's performance. We found that no filter, including original Guidance, consistently benefitted positive-selection inferences. Moreover, all improvements detected were exceedingly minimal, and in certain circumstances, Guidance-based filters worsened inferences.
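
As a rough illustration of the kind of residue masking discussed above, the following minimal sketch replaces low-confidence residues in an alignment with a placeholder character. It is not the Guidance implementation: the function name `mask_residues`, the `?` mask character, the 0.5 cutoff, and the toy sequences and scores are all assumptions chosen for illustration, and the per-residue confidence scores are taken as given rather than computed.

```python
def mask_residues(alignment, scores, cutoff, mask_char="?"):
    """Mask residues whose confidence score falls below `cutoff`.

    alignment: dict mapping sequence name -> aligned sequence string
    scores:    dict mapping sequence name -> list of per-position scores
    Gap characters ("-") are left untouched; only residues are masked.
    """
    masked = {}
    for name, seq in alignment.items():
        row_scores = scores[name]
        if len(row_scores) != len(seq):
            raise ValueError(f"score/sequence length mismatch for {name}")
        masked[name] = "".join(
            res if res == "-" or score >= cutoff else mask_char
            for res, score in zip(seq, row_scores)
        )
    return masked


# Toy input (hypothetical values, not data from the study).
alignment = {"seqA": "MK-LV", "seqB": "MKTLV"}
scores = {
    "seqA": [0.9, 0.4, 0.0, 0.95, 0.2],
    "seqB": [0.9, 0.8, 0.3, 0.95, 0.9],
}

masked = mask_residues(alignment, scores, cutoff=0.5)
for name, seq in masked.items():
    print(name, seq)
```

Masking individual residues, rather than deleting whole columns, keeps the alignment dimensions unchanged, so downstream tools still see every site while treating the masked positions as unknown.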

Keywords: alignment filters; multiple sequence alignment; positive-selection inference; sequence simulation.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Computational Biology / methods*
Computer Simulation
Proteins / genetics
Selection, Genetic
Sequence Alignment / methods*
Software

Substances

Proteins

Grants and funding

R01 GM088344/GM/NIGMS NIH HHS/United States