The potential of random forest and neural networks for biomass and recombinant protein modeling in Escherichia coli fed-batch fermentations

Biotechnol J. 2015 Sep;10(11):1770-82. doi: 10.1002/biot.201400790. Epub 2015 Aug 11.

Abstract

Product quality assurance strategies in the production of biopharmaceuticals are currently undergoing a transformation from empirical "quality by testing" to rational, knowledge-based "quality by design" approaches. The major challenges in this context are the fragmentary understanding of bioprocesses and the severely limited real-time access to process variables related to product quality and quantity. Data-driven modeling of process variables, combined with model predictive process control concepts, represents a potential solution to these problems. A key criterion is the selection of the statistical techniques best suited to bioprocess data analysis and modeling. In this work, a series of recombinant Escherichia coli fed-batch production processes with varying cultivation conditions was conducted, employing a comprehensive on- and offline process monitoring platform. The applicability of two machine learning methods, random forest and neural networks, for predicting cell dry mass and recombinant protein concentration from online-available process parameters and two-dimensional multi-wavelength fluorescence spectroscopy was investigated. Models based solely on routinely measured process variables give a satisfactory prediction accuracy of about ± 4% for the cell dry mass, while additional spectroscopic information allows the protein concentration to be estimated within ± 12%. The results clearly argue for a combined approach: neural networks as the modeling technique and random forest as the variable selection tool.
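The combined approach recommended in the last sentence of the abstract (random forest for variable selection, a neural network for the actual model) can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes scikit-learn and NumPy are available, uses impurity-based random-forest importances as the selection criterion, and runs on synthetic data in which only two of eight hypothetical input columns drive the target. The choice of k = 2, the network size, and all other hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_by_rf_importance(X, y, k, seed=0):
    """Rank columns by random-forest impurity importance and keep the top k (sorted)."""
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]  # most important column first
    return sorted(order[:k].tolist())


def fit_nn(X, y, seed=0):
    """Fit a small feed-forward network (one hidden layer) on standardized inputs."""
    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                     max_iter=5000, random_state=seed),
    )
    return model.fit(X, y)


# Hypothetical stand-in for process data: 8 candidate input variables, of which
# only columns 0 and 3 influence the target. All values are synthetic.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(400, 8))
y = 2.0 * X[:, 0] + 2.0 * np.sin(np.pi * X[:, 3]) + rng.normal(0.0, 0.1, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Variable selection uses the training split only, to avoid leaking test data.
selected = select_by_rf_importance(X_train, y_train, k=2)
model = fit_nn(X_train[:, selected], y_train)
r2 = model.score(X_test[:, selected], y_test)  # coefficient of determination
print("selected columns:", selected, "test R^2:", round(r2, 3))
```

Splitting the roles this way lets the forest discard uninformative inputs cheaply, while the network only has to learn a smooth mapping from the few retained variables. With real spectroscopic data, the candidate inputs would number in the hundreds, which is where such a pre-selection step matters most.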

Keywords: Artificial neural networks; Process analytical technology; Quality by design; Random forest; Recombinant protein production.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomass*
  • Bioreactors
  • Decision Trees
  • Escherichia coli / genetics
  • Escherichia coli / metabolism*
  • Fermentation
  • Models, Statistical*
  • Neural Networks, Computer*
  • Protein Engineering / methods*
  • Recombinant Proteins / metabolism*

Substances

  • Recombinant Proteins