Comparison of data science workflows for root cause analysis of bioprocesses

Bioprocess Biosyst Eng. 2019 Feb;42(2):245-256. doi: 10.1007/s00449-018-2029-6. Epub 2018 Oct 31.

Abstract

Root cause analysis (RCA) is one of the most prominent tools used to comprehensively evaluate a biopharmaceutical production process. Despite its widespread use in industry, the Food and Drug Administration has observed many unsuitable approaches to RCA in recent years. These unsuitable approaches stem from the use of incorrect variables during the analysis and from a lack of process understanding, which impede correct model interpretation. Two major approaches to performing RCA currently dominate the chemical and pharmaceutical industry: raw data analysis and the feature-based approach. Both techniques have been shown to identify the significant variables causing the variance of the response. Although they differ in how the data are unfolded, both use the same tools, such as principal component analysis and partial least squares regression. In this article we demonstrate the strengths and weaknesses of both approaches. We show that a fusion of the two results in a comprehensive and effective workflow that improves process understanding. We demonstrate this workflow with an example. The presented workflow saves analysis time and reduces the effort of data mining by making it easy to detect the most important variables in the given dataset. The process knowledge obtained can then be translated into new hypotheses, which can be tested experimentally and thereby lead to effective improvements in process robustness.
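The difference in data unfolding between the two approaches can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synthetic data, the function names (`unfold_raw`, `extract_features`, `pca_scores`), and the choice of features (mean, maximum, linear slope) are assumptions made purely for illustration. Only principal component analysis is shown, computed via a singular value decomposition; partial least squares regression is omitted.

```python
import numpy as np


def unfold_raw(batches):
    """Raw data approach: unfold (batches, time, variables) batch-wise
    into one row per batch with time * variables columns."""
    return batches.reshape(batches.shape[0], -1)


def extract_features(batches):
    """Feature-based approach: summarise each variable's trajectory per
    batch by mean, maximum, and least-squares slope (illustrative features)."""
    n_time = batches.shape[1]
    t_centered = np.arange(n_time) - (n_time - 1) / 2.0
    mean = batches.mean(axis=1)
    peak = batches.max(axis=1)
    # Least-squares slope over time for every batch and variable.
    slope = np.einsum("t,btv->bv", t_centered, batches) / (t_centered ** 2).sum()
    return np.hstack([mean, peak, slope])


def pca_scores(X, n_components=2):
    """PCA on autoscaled data via SVD; returns scores and explained variance ratios."""
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    U, S, _ = np.linalg.svd(Xc / std, full_matrices=False)
    explained = S ** 2 / (S ** 2).sum()
    return U[:, :n_components] * S[:n_components], explained[:n_components]


# Synthetic stand-in for process data: 12 batches, 50 time points, 4 variables.
rng = np.random.default_rng(0)
batches = rng.normal(size=(12, 50, 4))

raw_matrix = unfold_raw(batches)            # one column per time point and variable
feature_matrix = extract_features(batches)  # one column per feature and variable

raw_scores, raw_explained = pca_scores(raw_matrix)
feature_scores, feature_explained = pca_scores(feature_matrix)
```

Both matrices have one row per batch, so the same multivariate tools apply to either; only the columns differ. The raw matrix keeps every time point, while the feature matrix is far narrower and its columns map directly to interpretable process descriptors.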

Keywords: Data analysis; Data science; Feature-based analysis; Raw data analysis; Root cause analysis.

Publication types

  • Comparative Study

MeSH terms

  • Animals
  • Bioreactors
  • Chlorocebus aethiops
  • Data Science / methods*
  • Drug Industry / trends*
  • Fermentation
  • Multivariate Analysis
  • Poliovirus
  • Principal Component Analysis
  • Regression Analysis
  • Root Cause Analysis*
  • Software
  • Vero Cells
  • Workflow*