Integrated knowledge mining, genome-scale modeling, and machine learning for predicting Yarrowia lipolytica bioproduction

Metab Eng. 2021 Sep:67:227-236. doi: 10.1016/j.ymben.2021.07.003. Epub 2021 Jul 7.

Abstract

Predicting bioproduction titers from microbial hosts has been challenging due to complex interactions between microbial regulatory networks, stress responses, and suboptimal cultivation conditions. This study integrated knowledge mining, feature extraction, genome-scale modeling (GSM), and machine learning (ML) to develop a model for predicting Yarrowia lipolytica chemical titers (e.g., organic acids and terpenoids). First, Y. lipolytica production data, including cultivation conditions, genetic engineering strategies, and product information, were manually collected from the literature (~100 papers) and stored as either numerical (e.g., substrate concentrations) or categorical (e.g., bioreactor modes) variables. For each case recorded, central pathway fluxes were estimated using GSMs and flux balance analysis (FBA) to provide metabolic features. Second, an ML ensemble learner was trained to predict strain production titers. Accurate predictions on the test data were obtained for instances with production titers >1 g/L (R² = 0.87). However, the model was less predictive for low-performance strains (0.01-1 g/L, R² = 0.29), potentially due to biosynthesis bottlenecks not captured in the features. Feature ranking indicated that the FBA fluxes, the number of enzyme steps, the substrate inputs, and thermodynamic barriers (i.e., Gibbs free energy of reaction) were the most influential factors. Third, the model was evaluated on other oleaginous yeasts and indicated that some hosts share conserved features that could potentially be exploited by transfer learning. The platform was also designed to assist computational strain design tools (such as OptKnock) in screening genetic targets for improved microbial production in light of experimental conditions.

Keywords: Computational strain design; FBA; Machine learning; Pathway bottlenecks; Yarrowia lipolytica.
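
The abstract reports R² separately for high-titer (>1 g/L) and low-titer (0.01-1 g/L) instances. As a minimal sketch of what such a stratified evaluation might look like, the Python below computes the coefficient of determination within each titer range. The function names, the threshold handling, the use of raw g/L values (rather than, say, log-transformed titers), and the toy numbers are all assumptions made for illustration; none of them come from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


def stratified_r2(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    threshold: float = 1.0,  # g/L; assumed cut-off between "low" and "high"
) -> Dict[str, float]:
    """Compute R^2 separately for titers above and at-or-below the threshold."""
    pairs: List[Tuple[float, float]] = list(zip(y_true, y_pred))
    high = [(t, p) for t, p in pairs if t > threshold]
    low = [(t, p) for t, p in pairs if t <= threshold]
    return {
        "high": r_squared([t for t, _ in high], [p for _, p in high]),
        "low": r_squared([t for t, _ in low], [p for _, p in low]),
    }


# Hypothetical measured and predicted titers in g/L (not data from the study).
measured_titers = [2.0, 5.0, 10.0, 0.1, 0.5, 0.9]
predicted_titers = [2.0, 5.0, 10.0, 0.5, 0.5, 0.5]

scores = stratified_r2(measured_titers, predicted_titers)
```

Reporting the two ranges separately matters because a single pooled R² can be dominated by the high-titer instances, which have the largest variance, and so can hide weak performance on low-titer strains.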

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Machine Learning
  • Metabolic Engineering
  • Terpenes
  • Yarrowia* / genetics

Substances

  • Terpenes