A comparison of classifiers for predicting the class color of fluorescent proteins

Comput Biol Chem. 2019 Dec:83:107089. doi: 10.1016/j.compbiolchem.2019.107089. Epub 2019 Jul 9.

Abstract

Fluorescent proteins have been applied in a wide variety of fields, ranging from basic science to industrial applications. Apart from the naturally occurring fluorescent proteins, there is growing interest in genetically modified variants that emit light at a specific wavelength. Genetically modifying a protein is not an easy task, especially because substituting one residue for another has to achieve the desired property while maintaining protein stability. To help in choosing residue substitutions, computational methods are applied to predict the function and stability of proteins. In this work we prepared a dataset composed of 109 fluorescent proteins and tested four classical supervised classification algorithms: artificial neural networks (ANNs), decision trees (DTs), support vector machines (SVMs) and random forests (RFs). This is the first time that algorithms have been compared on this task. A comparison of the algorithms' performance shows that DTs, SVMs and RFs were significantly better than ANNs, and that RF was the best method in all scenarios. However, the interpretability of DTs is highly relevant and can provide important clues about the mechanisms involved in protein color emission. The results are promising and indicate that the use of in silico methods can greatly reduce the time and cost of in vitro experiments.
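
The point about decision-tree interpretability can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic and deliberately separable, the two numeric descriptors (feature_0, feature_1) and the two color classes are hypothetical, and the tree is reduced to a single split (a decision stump) evaluated with stratified k-fold cross-validation. The abstract does not state which features, tree depth, or validation scheme were used, so all of those choices are assumptions.

```python
import random
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure list)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def fit_stump(X, y):
    """Find the single split (feature, threshold) with the lowest weighted Gini."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]  # (feature index, threshold, left label, right label)


def predict(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label


def stratified_folds(y, k, rng):
    """Deal the indices of each class round-robin into k folds."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def cross_validate(X, y, k=5, seed=0):
    """Mean held-out accuracy of the stump over k stratified folds."""
    rng = random.Random(seed)
    accuracies = []
    for test_idx in stratified_folds(y, k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        stump = fit_stump([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(stump, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k


# Synthetic, hypothetical data: feature_0 separates the classes, feature_1 is noise.
data_rng = random.Random(42)
X = ([[data_rng.uniform(0.2, 0.3), data_rng.random()] for _ in range(20)]
     + [[data_rng.uniform(0.7, 0.8), data_rng.random()] for _ in range(20)])
y = ["green"] * 20 + ["red"] * 20

mean_accuracy = cross_validate(X, y)
j, t, left_label, right_label = fit_stump(X, y)
rule = f"if feature_{j} <= {t:.2f}: {left_label}, else: {right_label}"
print(rule)
print(f"mean cross-validated accuracy: {mean_accuracy:.2f}")
```

The printed rule is the interpretable part: the fitted model is a threshold on a named descriptor that a structural biologist can inspect directly, which is not possible in the same way for the weights of a neural network or the votes of a forest.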

Keywords: Classification; Data mining; Fluorescent proteins; Structural biology.

Publication types

  • Comparative Study

MeSH terms

  • Algorithms*
  • Color*
  • Decision Trees
  • Luminescent Proteins / chemistry*
  • Luminescent Proteins / metabolism
  • Neural Networks, Computer
  • Protein Stability
  • Support Vector Machine

Substances

  • Luminescent Proteins