Machine-learning model selection and parameter estimation from kinetic data of complex first-order reaction systems

PLoS One. 2021 Aug 9;16(8):e0255675. doi: 10.1371/journal.pone.0255675. eCollection 2021.

Abstract

Dealing with a system of first-order reactions is a recurrent issue in chemometrics, especially in the analysis of data obtained by spectroscopic methods applied to complex biological systems. We argue that global multiexponential fitting, still the common way to solve such problems, has serious weaknesses compared with contemporary methods of sparse modeling. Combining the advantages of group lasso and elastic net (statistical methods proven to be very powerful in other areas), we created an optimization problem tunable from a very sparse to a very dense distribution over a large predefined grid of time constants, and used it to fit both simulated and experimental multiwavelength spectroscopic data with high computational efficiency. We found that the optimal values of the tuning hyperparameters can be selected by a machine-learning algorithm based on a Bayesian optimization procedure, using either widely used or novel versions of cross-validation. The derived algorithm accurately recovered the true sparse kinetic parameters of an extremely complex simulated model of the bacteriorhodopsin photocycle, as well as the wide peak of hypothetical distributed kinetics at different noise levels. It also performed well in the analysis of experimental ultrafast fluorescence kinetics data recorded on the coenzyme FAD over a very wide logarithmic time window. We conclude that the primary application of the presented algorithms, which are implemented in available software, covers a wide range of studies on light-induced physical, chemical, and biological processes carried out with different spectroscopic methods. Demand for this kind of analysis is expected to soar with the emergence of ultrafast multidimensional infrared and electronic spectroscopic techniques, which produce very large and complex datasets. In addition, simulations based on our methods could help in designing the technical parameters of future experiments aimed at verifying particular hypothetical models.
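The central idea above, fitting multiwavelength kinetics over a large predefined grid of time constants with a penalty that combines group lasso and elastic net, can be illustrated with a minimal NumPy sketch. It assumes the data D form a times × wavelengths matrix modeled as X A, where X[i, k] = exp(-t_i / τ_k) and row k of A holds the amplitude spectrum belonging to time constant τ_k. Each row is treated as one group, so the group-lasso term switches a time constant on or off across all wavelengths at once, while a ridge (L2) term pushes toward denser distributions. The exact objective and its normalization, the FISTA solver, and the names `decay_matrix`, `group_shrink` and `fit_group_elastic_net` are assumptions made for illustration. They are not the authors' published formulation or software. The hyperparameters alpha and rho are also fixed by hand here, whereas the paper selects its tuning hyperparameters by Bayesian optimization over cross-validation scores.

```python
import numpy as np

def decay_matrix(t, taus):
    """Design matrix: column k is exp(-t / tau_k), one exponential per grid time constant."""
    return np.exp(-np.outer(t, 1.0 / taus))

def group_shrink(A, thresh):
    """Proximal operator of thresh * sum_k ||A[k, :]||_2 (row-wise group soft-thresholding)."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return A * scale

def fit_group_elastic_net(X, D, alpha, rho, n_iter=5000, tol=1e-10):
    """Hypothetical objective, minimized with FISTA (accelerated proximal gradient):
    1/(2n) ||D - X A||_F^2 + alpha * (rho * sum_k ||A_k||_2 + (1 - rho)/2 * ||A||_F^2)."""
    n = X.shape[0]
    # Lipschitz constant of the gradient of the smooth part (least squares + ridge)
    L = np.linalg.norm(X, 2) ** 2 / n + alpha * (1.0 - rho)
    step = 1.0 / L
    A = np.zeros((X.shape[1], D.shape[1]))
    Y, t_k = A.copy(), 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ Y - D) / n + alpha * (1.0 - rho) * Y
        A_new = group_shrink(Y - step * grad, step * alpha * rho)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t_k ** 2)) / 2.0
        Y = A_new + ((t_k - 1.0) / t_next) * (A_new - A)  # momentum step
        if np.linalg.norm(A_new - A) <= tol * max(1.0, np.linalg.norm(A)):
            A = A_new
            break
        A, t_k = A_new, t_next
    return A

# --- Demo on simulated noise-free data (hypothetical two-component system) ---
t = np.logspace(-1, 3, 200)            # sampling times on a logarithmic window
taus = np.logspace(-1, 3, 41)          # grid of 41 candidate time constants
X = decay_matrix(t, taus)
wl = np.arange(30)                     # 30 hypothetical wavelength channels
true_idx = [13, 27]                    # taus[13] ~ 2.0, taus[27] ~ 50.1
S = np.vstack([np.exp(-((wl - 10) / 3.0) ** 2),   # amplitude spectrum of component 1
               np.exp(-((wl - 20) / 3.0) ** 2)])  # amplitude spectrum of component 2
D = X[:, true_idx] @ S

rho = 0.9
# Smallest alpha for which A = 0 satisfies the optimality (KKT) condition
alpha_max = np.max(np.linalg.norm(X.T @ D, axis=1)) / (X.shape[0] * rho)
A_hat = fit_group_elastic_net(X, D, alpha=0.02 * alpha_max, rho=rho)
row_norms = np.linalg.norm(A_hat, axis=1)
print("active time constants:", np.round(taus[row_norms > 1e-8], 2))
```

In this sketch, moving rho toward 1 gives a pure group lasso and a sparser set of active time constants. Lowering rho strengthens the ridge term, which spreads amplitude over neighboring grid points, the kind of behavior that suits distributed kinetics. In a real analysis, alpha and rho would be selected by cross-validation rather than set by hand.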

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteriorhodopsins / metabolism
  • Bayes Theorem
  • Computational Biology / methods*
  • Computer Simulation
  • Data Analysis*
  • Flavin-Adenine Dinucleotide / metabolism
  • Kinetics
  • Machine Learning*
  • Models, Biological*
  • Software

Substances

  • Flavin-Adenine Dinucleotide
  • Bacteriorhodopsins

Grants and funding

  • LZ, FS, NZ, GG; GINOP-2.3.2-15-2016-00001; Economic Development and Innovation Operative Programme of Hungary; https://www.palyazat.gov.hu/node/56577
  • LZ; K-124922; National Research, Development and Innovation Office of Hungary; https://nkfih.gov.hu
  • AS; NKFIH PD-121170; National Research, Development and Innovation Office of Hungary; https://nkfih.gov.hu
  • LZ, GG; 2018-1.2.1-NKP-2018-00009; National Research, Development and Innovation Office of Hungary; https://nkfih.gov.hu

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.