ChemFLuo: a web-server for structure analysis and identification of fluorescent compounds

Brief Bioinform. 2021 Jul 20;22(4):bbaa282. doi: 10.1093/bib/bbaa282.

Abstract

Background: Fluorescent detection methods are indispensable tools for chemical biology. However, the frequent appearance of potentially fluorescent compounds has greatly interfered with the recognition of compounds with genuine activity. Such fluorescence interference is especially difficult to identify because it is reproducible and concentration-dependent. Therefore, a credible screening tool to detect fluorescent compounds in chemical libraries is urgently needed in the early stages of drug discovery.

Results: In this study, we developed a webserver, ChemFLuo, for fluorescent compound detection, based on two large and high-quality training datasets containing 4906 blue and 8632 green fluorescent compounds. These molecules were used to construct a group of prediction models based on combinations of three machine learning algorithms and seven types of molecular representations. The best blue fluorescence prediction model achieved balanced accuracy (BA) = 0.858 and area under the receiver operating characteristic curve (AUC) = 0.931 on the validation set, and BA = 0.823 and AUC = 0.903 on the test set. The best green fluorescence prediction model achieved BA = 0.810 and AUC = 0.887 on the validation set, and BA = 0.771 and AUC = 0.852 on the test set. In addition to the prediction models, 22 blue and 16 green representative fluorescent substructures were summarized for the screening of potential fluorescent compounds. Comparison with other fluorescence detection tools, and application to external validation sets and large molecule libraries, demonstrated the reliability of the prediction models for fluorescent compound detection.
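For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how BA and AUC can be computed for a binary fluorescent/non-fluorescent classifier. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. BA is taken as the mean of sensitivity and specificity, and AUC is computed with the rank-based (Mann-Whitney) formulation, in which ties count as half.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = fluorescent)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 fluorescent and 5 non-fluorescent compounds.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

ba = balanced_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"BA = {ba:.3f}, AUC = {auc:.3f}")
```

Because BA averages the per-class recall, it is less inflated than plain accuracy when non-fluorescent compounds greatly outnumber fluorescent ones, which is a common situation in screening libraries.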

Conclusion: ChemFLuo is a public webserver for filtering out compounds with undesirable fluorescent properties, which will benefit the design of high-quality chemical libraries for drug discovery. It is freely available at http://admet.scbdd.com/chemfluo/index/.

Keywords: false positives; fluorescent compounds; frequent hitters; machine learning; public webserver; substructure screening.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Drug Discovery*
  • Fluorescence
  • Fluorescent Dyes / chemistry*
  • Machine Learning*
  • Models, Chemical*
  • Small Molecule Libraries*

Substances

  • Fluorescent Dyes
  • Small Molecule Libraries