Visualization and Analysis of the REACH-chemical Space with Generative Topographic Mapping

Mol Inform. 2021 Apr;40(4):e2000232. doi: 10.1002/minf.202000232. Epub 2020 Nov 24.

Abstract

In the framework of the REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals) regulation, industries have generated and reported a huge amount of (eco)toxicological data on substances produced or imported in Europe. The registration procedure initiated the creation of a large REACH database of well-defined (eco)toxicological properties. Here, the data distribution in the REACH chemical space was analyzed with the help of the Generative Topographic Mapping (GTM) approach. GTM generates 2-dimensional maps on which each compound is represented as a data point. The 3rd dimension can be used to display the distribution of a given (eco)toxicological property, which can further be used for property assessment of new compounds projected onto the map. We report the "Universal REACH map", which accommodates 11 endpoints covering environmental fate and (eco)toxicological properties. This map demonstrates acceptable predictive performance: in cross-validation, balanced accuracy ranges from 0.60 to 0.78. The 11-endpoint profile has been computed for each REACH-registered substance. Some concerns related to acute aquatic toxicity have been identified, whereas for environmental fate and human health endpoints the number of compounds predicted to be of concern was much smaller. It has been demonstrated that superposition of several class landscapes allows the selection of zones in the chemical space populated by compounds with a given (eco)toxicological profile.
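
For readers unfamiliar with the metric quoted above, balanced accuracy can be illustrated with a minimal Python sketch. It assumes a binary labelling per endpoint (1 = "of concern", 0 = "not of concern"); the abstract does not specify the class definitions or the cross-validation protocol, so the function name, the labels, and the example data are purely illustrative and are not taken from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels.

    Assumption (not from the paper): 1 = "of concern", 0 = "not of concern".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Hypothetical labels for ten compounds on one endpoint.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
score = balanced_accuracy(y_true, y_pred)
print(round(score, 3))
```

Because the score averages the per-class recalls, a model that assigns every compound to the majority class scores 0.5 regardless of class imbalance, which is the baseline against which the reported 0.60 to 0.78 range can be read.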

Keywords: Generative Topographic Mapping (GTM); REACH chemical space; ecotoxicology; environmental fate; visualization.

MeSH terms

  • Algorithms
  • Animals
  • Databases, Factual
  • Humans
  • Models, Molecular
  • Molecular Structure
  • Organic Chemicals / analysis*
  • Organic Chemicals / toxicity
  • Rats

Substances

  • Organic Chemicals