A data-driven binary-classification framework for oil fingerprinting analysis

Environ Res. 2021 Oct:201:111454. doi: 10.1016/j.envres.2021.111454. Epub 2021 Jun 8.

Abstract

A marine oil spill is one of the most challenging environmental issues, causing severe long-term impacts on ecosystems and human society. Oil dispersants are widely applied as treating agents in oil spill response operations. The use of dispersants significantly changes the behavior of dispersed oil and consequently complicates oil fingerprinting analysis. In this study, machine learning (ML) was introduced to oil fingerprinting analysis for the first time by developing a data-driven binary classification framework. The modeling integrated dimensionality reduction algorithms (e.g., principal component analysis, PCA) to distinguish between the two oil classes, WCO and CDO. Five groups of biomarkers, including terpanes, steranes, triaromatic steranes (TA-steranes), monoaromatic steranes (MA-steranes), and diamantanes, were selected. Different feature spaces were created from the diagnostic indices of the biomarkers, and six ML algorithms were applied for comparative analysis and to optimize the modeling process: k-nearest neighbor (KNN), support vector classifier (SVC), random forest classifier (RFC), decision tree classifier (DTC), logistic regression classifier (LRC), and ensemble vote classifier (EVC). Hyperparameter optimization and cross-validation through GridSearchCV were applied to prevent overfitting and increase model accuracy. Model performance was evaluated by model score and by F-score derived from confusion matrices. The results indicated that the RFC algorithm on the diamantanes dataset performed best. It delivered the highest F-score (0.871), versus the lowest F-score (0.792) from the EVC algorithm on the TA-steranes dataset reduced by PCA with 95% of the variance retained. Therefore, diamantanes were recommended as the most suitable biomarker for distinguishing WCO and CDO to aid oil fingerprinting under the conditions in this study. The results demonstrated the proposed method as a potential analysis tool for oil spill source identification through ML-aided oil fingerprinting. The study also showed the value of ML methods in oil spill response research and practice.
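
The workflow summarized in the abstract (PCA retaining 95% of the variance, a classifier tuned with GridSearchCV, and evaluation by model score and F-score from a confusion matrix) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code or data: the feature table is synthetic, a random forest stands in for just one of the six algorithms, and the feature scaling step, train/test split, and hyperparameter grid are assumptions made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a table of biomarker diagnostic indices
# (assumption: NOT the study's data; labels 0/1 play the role of the two oil classes).
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("scale", StandardScaler()),        # assumption: standardize before PCA
    ("pca", PCA(n_components=0.95)),    # keep enough components for 95% of the variance
    ("clf", RandomForestClassifier(random_state=0)),
])

# Hypothetical hyperparameter grid; the values are illustrative only.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [3, None],
}

# 5-fold cross-validated grid search, selecting on F-score.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out samples.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
model_score = accuracy_score(y_test, y_pred)
f_score = f1_score(y_test, y_pred)

# The same F-score computed directly from the confusion matrix entries.
tn, fp, fn, tp = cm.ravel()
f_score_from_cm = 2 * tp / (2 * tp + fp + fn)

print("best parameters:", search.best_params_)
print("confusion matrix:\n", cm)
print("model score (accuracy):", model_score)
print("F-score:", f_score)
```

Repeating the same search for each classifier and each biomarker feature space, then comparing the held-out F-scores, would give the kind of comparison the abstract describes.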

Keywords: Biomarker; Machine learning algorithms; Oil fingerprinting; Oil spill; Weathering.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Ecosystem*
  • Machine Learning
  • Petroleum Pollution*
  • Principal Component Analysis