SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models

BMC Bioinformatics. 2023 Jul 18;24(1):287. doi: 10.1186/s12859-023-05407-9.

Abstract

Background: Next-generation sequencing (NGS) technologies yield large numbers of genetic alterations, a subset of which are missense variants that alter an amino acid in the protein product. These variants can have a destabilizing effect, leading to an increased risk of misfolding and aggregation. Multiple software tools exist to predict the effect of single-nucleotide variants on proteins; however, a pipeline that integrates these tools and starts from an NGS output list of variants is lacking.

Results: The previous version, SNPeffect 4.0 (De Baets in Nucleic Acids Res 40(D1):D935-D939, 2011), provided an online database containing pre-calculated variant effects and low-throughput custom variant analysis. Here, we built an automated and parallelized pipeline that analyzes the impact of missense variants on the aggregation propensity and structural stability of proteins, taking Variant Call Format (VCF) files as input. The pipeline incorporates the AlphaFold Protein Structure Database to achieve high coverage for structural stability analyses using the FoldX force field. The effect on aggregation propensity is analyzed using the established predictors TANGO and WALTZ. The pipeline focuses solely on the human proteome and can be used to analyze proteome stability/damage in a given sample based on sequencing results.
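
As an illustration of the first and last steps of such a workflow, the following minimal Python sketch selects single-nucleotide substitutions from VCF text and formats an amino acid change as a FoldX-style mutation string. It is not taken from the SNPeffect code base: the names `Snv`, `iter_snvs`, and `foldx_mutation` are hypothetical, the use of chain `A` for AlphaFold models is an assumption, and the annotation step that decides whether a substitution is actually missense (mapping it to a protein position and residue change) is deliberately left out.

```python
from typing import Iterable, Iterator, NamedTuple


class Snv(NamedTuple):
    """A single-nucleotide substitution read from one VCF record."""
    chrom: str
    pos: int
    ref: str
    alt: str


def iter_snvs(vcf_lines: Iterable[str]) -> Iterator[Snv]:
    """Yield single-base substitutions; skip headers, indels, and other alleles."""
    bases = set("ACGT")
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # meta-information (##...) and the #CHROM header line
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF data line
        chrom, pos, _id, ref, alts = fields[:5]
        ref = ref.upper()
        # The ALT column may list several comma-separated alleles.
        for alt in alts.upper().split(","):
            if ref in bases and alt in bases and ref != alt:
                yield Snv(chrom, int(pos), ref, alt)


def foldx_mutation(wild_type: str, chain: str, position: int, mutant: str) -> str:
    """Format one substitution as wild-type residue, chain, position, mutant residue.

    This mirrors the one-letter notation used in FoldX mutation lists,
    for example "RA123G;" for Arg123 -> Gly on chain A.
    """
    return f"{wild_type}{chain}{position}{mutant};"


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\t.",     # substitution: kept
    "1\t200\t.\tAT\tA\t.\tPASS\t.",    # deletion: skipped
    "2\t300\t.\tC\tT,CA\t.\tPASS\t.",  # only the C>T allele is kept
]
snvs = list(iter_snvs(example))
mutation = foldx_mutation("R", "A", 123, "G")
```

In a real run, each retained substitution would first need to be annotated against a transcript to obtain the protein accession, residue position, and amino acid change before a structure could be selected and a stability calculation started.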

Conclusion: We provide a bioinformatics pipeline that allows structural phenotyping from sequencing data using the established stability and aggregation predictors FoldX, TANGO, and WALTZ, with structural proteome coverage provided by the AlphaFold database. The pipeline and installation guide are freely available for academic users at https://github.com/vibbits/snpeffect. Running the pipeline requires a computer cluster.

Keywords: AlphaFold; Coding missense variants; Protein aggregation; Protein stability; SNPeffect; Single nucleotide variants.

MeSH terms

  • Databases, Protein
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Mutant Proteins
  • Mutation
  • Proteome*
  • Software*

Substances

  • Proteome
  • Mutant Proteins