Addressing Noise and Estimating Uncertainty in Biomedical Data through the Exploration of Chemical Space

Int J Mol Sci. 2022 Oct 26;23(21):12975. doi: 10.3390/ijms232112975.

Abstract

Noise is a basic ingredient in data, since observed data are always contaminated by unwanted deviations, i.e., noise, which, in the case of overdetermined systems (with more data than model parameters), cause the corresponding linear system of equations to have an imperfect solution. In addition, in the case of highly underdetermined parameterization, noise can be absorbed by the model, generating spurious solutions. This is a very undesirable situation that might lead to incorrect conclusions. We presented mathematical formalism based on the inverse problem theory combined with artificial intelligence methodologies to perform an enhanced sampling of noisy biomedical data to improve the finding of meaningful solutions. Random sampling methods fail for high-dimensional biomedical problems. Sampling methods such as smart model parameterizations, forward surrogates, and parallel computing are better suited for such problems. We applied these methods to several important biomedical problems, such as phenotype prediction and a problem related to predicting the effects of protein mutations, i.e., if a given single residue mutation is neutral or deleterious, causing a disease. We also applied these methods to de novo drug discovery and drug repositioning (repurposing) through the enhanced exploration of huge chemical space. The purpose of these novel methods that address the problem of noise and uncertainty in biomedical data is to find new therapeutic solutions, perform drug repurposing, and accelerate and optimize drug discovery, thus reestablishing homeostasis. Finding the right target, the right compound, and the right patient are the three bottlenecks to running successful clinical trials from the correct analysis of preclinical models. Artificial intelligence can provide a solution to these problems, considering that the character of the data restricts the quality of the prediction, as in any modeling procedure in data analysis. The use of simple and plain methodologies is crucial to tackling these important and challenging problems, particularly drug repositioning/repurposing in rare diseases.

Keywords: artificial intelligence; drug design; drug discovery; noise and uncertainty; phenotype prediction.

Publication types

  • Review

MeSH terms

  • Artificial Intelligence*
  • Drug Discovery / methods
  • Drug Repositioning* / methods
  • Phenotype
  • Uncertainty