Ensemble machine learning prediction of anaerobic co-digestion of manure and thermally pretreated harvest residues

Bioresour Technol. 2024 May 3:402:130793. doi: 10.1016/j.biortech.2024.130793. Online ahead of print.

Abstract

This study aimed to clarify the statistical accuracy assessment approaches used in recent biogas prediction studies by applying a state-of-the-art ensemble machine learning approach evaluated with 10-fold cross-validation in 100 repetitions. Three thermally pretreated harvest residue types (maize stover, sunflower stalk and soybean straw) were anaerobically co-digested with manure, and biogas and methane yields were measured alongside eight thermal preprocessing and biomass covariates. These covariates served as inputs to an ensemble machine learning approach for biogas and methane yield prediction, which employed three feature selection approaches. Support Vector Machine prediction with Recursive Feature Elimination achieved the highest prediction accuracy, with coefficients of determination of 0.820 and 0.823 for biogas and methane yield prediction, respectively. This study demonstrated an extreme dependency of prediction accuracy on input dataset properties, which could only be mitigated with ensemble machine learning, and strongly suggests that the split-sample approach often used in previous studies should be avoided.
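To illustrate the evaluation contrast described above, the following is a minimal sketch of repeated k-fold cross-validation (10 folds, 100 repetitions) next to a single split-sample evaluation, using only the Python standard library. Several assumptions apply and are not taken from the study: `fit_predict_linear` is a hypothetical univariate least-squares stand-in for the Support Vector Machine with Recursive Feature Elimination; the data are synthetic, not the measured yields; the 70/30 split fraction is arbitrary; and the coefficient of determination is computed here once per repetition from pooled out-of-fold predictions, which may differ from how the authors aggregated fold scores.

```python
import random
import statistics


def repeated_kfold(n_samples, n_splits=10, n_repeats=100, seed=0):
    """Yield, for each repetition, a list of (train_idx, test_idx) folds.

    Each repetition reshuffles the samples; every sample lands in exactly
    one test fold per repetition.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::n_splits] for i in range(n_splits)]  # near-equal sizes
        yield [
            ([j for k, f in enumerate(folds) if k != i for j in f], folds[i])
            for i in range(n_splits)
        ]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_predict_linear(x_train, y_train, x_test):
    """Hypothetical stand-in model (NOT the study's SVM + RFE): a fitted line."""
    mx, my = statistics.fmean(x_train), statistics.fmean(y_train)
    sxx = sum((x - mx) ** 2 for x in x_train)
    sxy = sum((x - mx) * (y - my) for x, y in zip(x_train, y_train))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * x for x in x_test]


def repeated_cv_r2(x, y, n_splits=10, n_repeats=100, seed=0):
    """Return one out-of-fold R^2 per repetition."""
    scores = []
    for folds in repeated_kfold(len(y), n_splits, n_repeats, seed):
        pred = [0.0] * len(y)
        for train_idx, test_idx in folds:
            fitted = fit_predict_linear(
                [x[i] for i in train_idx],
                [y[i] for i in train_idx],
                [x[i] for i in test_idx],
            )
            for i, p in zip(test_idx, fitted):
                pred[i] = p
        scores.append(r_squared(y, pred))
    return scores


def split_sample_r2(x, y, test_fraction=0.3, seed=0):
    """Single random train/test split (the split-sample approach), for contrast."""
    rng = random.Random(seed)
    order = list(range(len(y)))
    rng.shuffle(order)
    n_test = max(2, int(len(y) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    pred = fit_predict_linear(
        [x[i] for i in train_idx], [y[i] for i in train_idx], [x[i] for i in test_idx]
    )
    return r_squared([y[i] for i in test_idx], pred)


if __name__ == "__main__":
    gen = random.Random(42)  # synthetic data only, not the study's measurements
    x = [gen.uniform(0, 10) for _ in range(60)]
    y = [2.0 * xi + gen.gauss(0, 3) for xi in x]
    cv = repeated_cv_r2(x, y)
    splits = [split_sample_r2(x, y, seed=s) for s in range(20)]
    print(f"repeated CV: mean R^2 = {statistics.fmean(cv):.3f}, sd = {statistics.stdev(cv):.3f}")
    print(f"split-sample: R^2 ranges from {min(splits):.3f} to {max(splits):.3f}")
```

Running the script prints the mean and spread of R² over the 100 repetitions alongside the range obtained from 20 differently seeded single splits. The wider range from single splits is the kind of dataset dependency the abstract attributes to the split-sample approach, although the magnitude here depends entirely on the synthetic data.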

Keywords: Biogas yield; Feature selection; Lignocellulose pretreatment; Methane yield; k-fold cross-validation.