Developing high-dimensional machine learning models to improve generalization ability and overcome data insufficiency for mixed sugar fermentation simulation

Bioresour Technol. 2023 Oct:385:129375. doi: 10.1016/j.biortech.2023.129375. Epub 2023 Jun 22.

Abstract

Accurate machine learning models can promote biorefinery development. This work proposes a strategy to enhance model generalization and to overcome data insufficiency in mixed sugar fermentation simulation. Multiple-input single-output models, which take initial glucose, initial xylose, and time together as inputs, generalized better than single-input single-output models, which take time as the sole input, when predicting glucose, xylose, ethanol, or biomass separately. Multiple-input multiple-output models, which integrate the outputs, improved model accuracy and reached an average R² of 0.99. To overcome data insufficiency, a consensus yeast (CY) model built by consolidating data from 4 yeasts obtained an R² of 0.90. Adjusting the pretrained CY model saved more than 50% of the data and gave R² values of 0.95 and 0.93 for yeast and bacterial fermentation simulation, respectively. The strategy can expand the application range of artificial neural network (ANN) models and reduce their data curation costs.
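The three model layouts named in the abstract differ only in which columns serve as inputs and outputs. The following is a minimal sketch of that difference, not the authors' implementation: the `records` values are invented for illustration, the column order is an assumption, and the helper names (`siso_arrays`, `miso_arrays`, `mimo_arrays`, `r_squared`) are hypothetical. The R² helper uses the standard coefficient of determination, computed per output so that it can be averaged as the abstract describes.

```python
import numpy as np

# Hypothetical fermentation records (values are made up, not from the paper).
# Assumed column order: initial glucose, initial xylose, time,
# then measured glucose, xylose, ethanol, biomass.
records = np.array([
    [20.0, 10.0,  0.0, 20.0, 10.0,  0.0, 0.5],
    [20.0, 10.0, 12.0,  5.0,  8.0,  7.0, 1.5],
    [20.0, 10.0, 24.0,  0.0,  3.0, 11.0, 2.0],
    [40.0, 20.0,  0.0, 40.0, 20.0,  0.0, 0.5],
    [40.0, 20.0, 12.0, 18.0, 17.0, 10.0, 1.8],
    [40.0, 20.0, 24.0,  2.0,  9.0, 20.0, 2.6],
])

def siso_arrays(data, target_col):
    """Single input (time), single output (one measured variable)."""
    return data[:, 2:3], data[:, target_col:target_col + 1]

def miso_arrays(data, target_col):
    """Three inputs (initial glucose, initial xylose, time), one output."""
    return data[:, 0:3], data[:, target_col:target_col + 1]

def mimo_arrays(data):
    """Three inputs, four outputs (glucose, xylose, ethanol, biomass)."""
    return data[:, 0:3], data[:, 3:7]

def r_squared(y_true, y_pred):
    """Coefficient of determination, one value per output column."""
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot

X_mimo, Y_mimo = mimo_arrays(records)
X_siso, y_siso = siso_arrays(records, target_col=5)  # ethanol as the target
```

Any regressor that accepts a 2-D input array and a 2-D target array could then be fitted to `X_mimo` and `Y_mimo`, and `r_squared(Y_mimo, predictions).mean()` would give the average R² across the four outputs.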

Keywords: Artificial neural network; Dimensionality; Insufficient data; Kinetic model; Machine learning; Mixed sugar fermentation.

MeSH terms

  • Fermentation
  • Glucose
  • Machine Learning
  • Saccharomyces cerevisiae*
  • Xylose*

Substances

  • Xylose
  • Glucose