Expectation-Maximization Model for Substitution of Missing Values Characterizing Greenness of Organic Solvents

Molecules. 2018 May 28;23(6):1292. doi: 10.3390/molecules23061292.

Abstract

Organic solvents are ubiquitous in chemical laboratories, and the Green Chemistry trend demands their detailed assessment in terms of greenness. Unfortunately, some solvents are not fully characterized, especially with respect to toxicological endpoints, which are time-consuming and expensive to determine. Missing values in the datasets are serious obstacles because they prevent a full greenness characterization of chemicals. One method for dealing with this problem is the Expectation-Maximization (EM) algorithm. In this study, a dataset of 155 solvents characterized by 13 variables is treated with the EM algorithm to predict missing data for toxicological endpoints, bioavailability, and biodegradability. The approach may be particularly useful for substituting missing values of environmental, health, and safety parameters of new solvents, and it has high potential for handling missing values when assessing the environmental, health, and safety parameters of other chemicals.
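The abstract does not state which statistical model the EM algorithm was run under. As a minimal sketch only, the code below assumes the common textbook setting: the variables (columns) follow a multivariate normal distribution, and values are missing at random. The E-step replaces each row's missing entries with their conditional expectation given the observed entries, and the M-step re-estimates the mean vector and covariance matrix from the completed data. The function name `em_impute`, its parameters, and the toy matrix are hypothetical illustrations, not the authors' code or the 155 × 13 solvent dataset.

```python
import numpy as np


def em_impute(X, max_iter=100, tol=1e-6):
    """Hypothetical helper: fill NaN entries of X (n_samples x n_features)
    by EM, ASSUMING a multivariate normal model for the columns.
    Returns (completed X, mean vector, covariance matrix)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)

    # Start from column means and the covariance of the mean-filled matrix.
    mu = np.nanmean(X, axis=0)
    X_hat = np.where(miss, mu, X)
    sigma = np.atleast_2d(np.cov(X_hat, rowvar=False, bias=True))

    for _ in range(max_iter):
        X_new = X_hat.copy()
        C = np.zeros((p, p))  # summed conditional covariance of missing entries

        # E-step: conditional mean (and covariance) of each row's missing
        # entries given its observed entries and the current mu, sigma.
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            S_mm = sigma[np.ix_(m, m)]
            if o.any():
                S_mo = sigma[np.ix_(m, o)]
                # B = S_mo @ inv(S_oo), computed with a linear solve.
                B = np.linalg.solve(sigma[np.ix_(o, o)], S_mo.T).T
                X_new[i, m] = mu[m] + B @ (X[i, o] - mu[o])
                C[np.ix_(m, m)] += S_mm - B @ S_mo.T
            else:  # nothing observed in this row: fall back to the mean
                X_new[i, m] = mu[m]
                C[np.ix_(m, m)] += S_mm

        # M-step: re-estimate mean and covariance from the completed data,
        # adding C so the covariance is not underestimated.
        mu = X_new.mean(axis=0)
        D = X_new - mu
        sigma = (D.T @ D + C) / n

        # Stop once the imputed values no longer change appreciably.
        delta = np.max(np.abs(X_new - X_hat))
        X_hat = X_new
        if delta < tol:
            break
    return X_hat, mu, sigma


# Toy data (not from the paper): column 2 is exactly twice column 1,
# and the last row is missing its second value.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, np.nan]])
X_filled, mu, sigma = em_impute(X)
print(round(float(X_filled[4, 1]), 3))  # → 10.0
```

Because EM alternates between filling the gaps and re-estimating the covariance, the imputed value converges to the regression-based prediction (10) instead of staying at the observed column mean (5). Observed entries are never modified; only the missing cells are updated.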

Keywords: E-M algorithm; green analytical chemistry; missing data prediction; solvents; sustainability assessment.

MeSH terms

  • Algorithms
  • Green Chemistry Technology / methods*
  • Molecular Structure
  • Solvents / chemistry*

Substances

  • Solvents