Etemadi regression in chemometrics: Reliability-based procedures for modeling and forecasting

Heliyon. 2024 Feb 15;10(5):e26399. doi: 10.1016/j.heliyon.2024.e26399. eCollection 2024 Mar 15.

Abstract

Building predictive models that generalize well is of paramount importance in chemical analysis and process optimization. Nonetheless, formulating a prediction model from chemical measurement data that maximizes quantitative generalizability remains a challenging task for chemometrics experts. To tackle this challenge, a range of forecasting models with varying characteristics, structures, and capabilities has been developed, using either accuracy-based or reliability-based modeling strategies. While the majority of models follow the accuracy-based approach, a recently proposed reliability-based approach, known as the Etemadi approach, has shown impressive performance across various scientific fields. Etemadi models are constructed through a reliability-based parameter estimation process that maximizes the models' reliability. However, modeling procedures in chemometrics rest on the assumption that high generalizability on inaccessible (test) data is achieved through accuracy-based training, in which errors on the available historical (training) data are minimized. A thorough review of the current literature shows that none of the forecasting models used for chemometrics purposes incorporates reliability into its modeling procedure. Given the dynamic and highly sensitive nature of chemistry experiments and processes, implementing a reliable model that controls the variation of performance criteria is a promising strategy for achieving stable and robust forecasts. To address this research gap, this paper introduces several key innovations: (1) Proposing a general design structure based on a new optimal reliability-based parameter estimation process. (2) Introducing a novel risk-based modeling strategy that minimizes the performance variation of models implemented under different conditions in chemical laboratory experiments, in order to generate a more generalizable model for diverse applications in chemometrics. (3) Specifying the degree of influence that reliability and accuracy each have on enhancing the generalizability and uncertainty modeling of chemometric models. Empirical evidence confirms the effectiveness of reliability-based models, which outperform accuracy-based models in 78.95% of the cases across various fields, including Pharmacology, Biochemistry, Agrochemical, Geochemical, Biological, Pollutants, Physicochemical Properties, and Gases Experiment. Furthermore, the study's findings demonstrate that the reliability-based modeling approach outperforms the accuracy-based strategy in terms of MAE, MSE, ARV, and RMSE by an average of 4.697%, 5.646%, 5.646%, and 4.342%, respectively. It is also shown statistically that reliability has a more significant impact than accuracy on improving the generalizability of chemometric models. This emphasizes the importance of including reliability as a crucial factor in chemometrics modeling, a consideration that has been overlooked in traditional modeling processes. Consequently, reliability-based modeling approaches can be regarded as a viable alternative to conventional accuracy-based modeling methods for chemical modeling purposes.
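
The four performance criteria reported above are not defined in the abstract. As an illustrative sketch only, the Python snippet below computes them under their commonly used definitions, taking ARV (average relative variance) to be the MSE divided by the variance of the observed values. The function names `forecast_metrics` and `improvement_pct` are hypothetical and are not taken from the paper.

```python
import numpy as np


def forecast_metrics(y_true, y_pred):
    """Return MAE, MSE, ARV, and RMSE for one set of forecasts.

    Assumption: ARV = sum((y - y_hat)^2) / sum((y - mean(y))^2),
    i.e. MSE divided by the population variance of the observations.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MSE": mse,
        "ARV": mse / float(np.var(y_true)),  # np.var defaults to ddof=0
        "RMSE": float(np.sqrt(mse)),
    }


def improvement_pct(baseline, candidate):
    """Percentage by which `candidate` error is lower than `baseline` error."""
    return 100.0 * (baseline - candidate) / baseline
```

Under this assumed definition, ARV is the MSE scaled by a constant that depends only on the test observations, so on a given test set the percentage improvement in ARV equals the percentage improvement in MSE. This is consistent with the identical 5.646% figures reported for the two criteria.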

Keywords: Accuracy and reliability-based modeling strategies; Chemometrics; Forecasting and modeling processes; Generalization capability; Multiple linear regression.