Learning epistatic interactions from sequence-activity data to predict enantioselectivity

J Comput Aided Mol Des. 2017 Dec;31(12):1085-1096. doi: 10.1007/s10822-017-0090-x. Epub 2017 Dec 12.

Abstract

Enzymes with high selectivity are desirable for improving the economics of chemical synthesis of enantiopure compounds. To improve enzyme selectivity, mutations are often introduced near the catalytic active site. In this compact environment, epistatic interactions between residues, where contributions to selectivity are non-additive, play a significant role in determining the degree of selectivity. Using support vector machine regression models, we map mutations to the experimentally characterised enantioselectivities for a set of 136 variants of the epoxide hydrolase from the fungus Aspergillus niger (AnEH). We investigate whether the influence a mutation has on enzyme selectivity can be accurately predicted through linear models, and whether prediction accuracy can be improved using higher-order counterparts. Comparing linear and polynomial degree = 2 models, mean Pearson coefficients (r) from [Formula: see text]-fold cross-validation increase from 0.84 to 0.91, respectively. Equivalent models tested on interaction-minimised sequences achieve values of [Formula: see text] and [Formula: see text]. As expected, testing on a simulated control data set with no interactions results in no significant improvement from higher-order models. Additional experimentally derived AnEH mutants are tested with linear and polynomial degree = 2 models, with values increasing from [Formula: see text] to [Formula: see text], respectively. The study demonstrates that linear models perform well; however, representing epistatic interactions in predictive models improves the identification of selectivity-enhancing mutations. The improvement is attributed to higher-order kernel functions that represent epistatic interactions between residues.
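The link between a degree = 2 polynomial kernel and pairwise (epistatic) terms can be illustrated with a minimal sketch. It is not taken from the study: the binary mutation encoding, the two example variants, the offset `c = 1`, and the function names `poly2_kernel` and `poly2_features` are assumptions made for illustration. The sketch shows that the kernel value (x · z + c)^2 equals an ordinary dot product in an expanded feature space containing a constant, the single-site terms, and every pairwise product x_i x_j. The pairwise products are the terms that let a kernel model assign weight to combinations of mutations rather than only to individual ones.

```python
import math
from itertools import product


def poly2_kernel(x, z, c=1.0):
    """Degree-2 polynomial kernel: (x . z + c)^2."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + c) ** 2


def poly2_features(x, c=1.0):
    """Explicit feature map whose dot product reproduces poly2_kernel.

    Layout: [constant] + [scaled single-site terms] + [all pairwise products].
    """
    n = len(x)
    constant = [c]
    linear = [math.sqrt(2.0 * c) * xi for xi in x]
    # Pairwise products x_i * x_j: nonzero only when both sites are mutated.
    pairwise = [x[i] * x[j] for i, j in product(range(n), repeat=2)]
    return constant + linear + pairwise


# Hypothetical encoding (an assumption): 1 = mutation present at a site.
variant_a = [1, 0, 1, 1]
variant_b = [1, 1, 0, 1]

k_implicit = poly2_kernel(variant_a, variant_b)
k_explicit = sum(
    p * q for p, q in zip(poly2_features(variant_a), poly2_features(variant_b))
)

# The two values agree up to floating-point rounding.
print(k_implicit, k_explicit)
```

A linear kernel corresponds to keeping only the single-site terms, so under this encoding it can only represent additive contributions; the degree = 2 kernel adds the pairwise block without ever constructing it explicitly.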

Keywords: Aspergillus niger; Bioinformatics; Epoxide hydrolase; Fitness; Machine learning; Non-additive; Support vector machine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aspergillus niger / enzymology
  • Catalytic Domain*
  • Epoxide Hydrolases*
  • Fungal Proteins
  • Models, Molecular*
  • Mutation
  • Structure-Activity Relationship
  • Substrate Specificity

Substances

  • Fungal Proteins
  • Epoxide Hydrolases