A comparison of classifiers for predicting the class color of fluorescent proteins

Roger Sá da Silva; Luis Fernando Marins; Daniela Volcan Almeida; Karina Dos Santos Machado; Adriano V Werhli

doi:10.1016/j.compbiolchem.2019.107089

Comput Biol Chem. 2019 Dec:83:107089. doi: 10.1016/j.compbiolchem.2019.107089. Epub 2019 Jul 9.

Authors

Roger Sá da Silva¹, Luis Fernando Marins², Daniela Volcan Almeida³, Karina Dos Santos Machado⁴, Adriano V Werhli⁵

Affiliations

¹ Universidade Federal do Rio Grande - FURG, Centro de Ciências Computacionais, PPGComp - Programa de Pós-Graduação em Computação, Av. Itália, km 08, Rio Grande, RS, Brazil. Electronic address: roger.silva@veranopolis.ifrs.edu.br.
² Universidade Federal do Rio Grande - FURG, Centro de Ciências Computacionais, PPGComp - Programa de Pós-Graduação em Computação, Av. Itália, km 08, Rio Grande, RS, Brazil. Electronic address: dqmluf@furg.br.
³ Universidade Federal do Rio Grande - FURG, Centro de Ciências Computacionais, PPGComp - Programa de Pós-Graduação em Computação, Av. Itália, km 08, Rio Grande, RS, Brazil. Electronic address: danivolcan@furg.br.
⁴ Universidade Federal do Rio Grande - FURG, Centro de Ciências Computacionais, PPGComp - Programa de Pós-Graduação em Computação, Av. Itália, km 08, Rio Grande, RS, Brazil. Electronic address: karina.machado@furg.br.
⁵ Universidade Federal do Rio Grande - FURG, Centro de Ciências Computacionais, PPGComp - Programa de Pós-Graduação em Computação, Av. Itália, km 08, Rio Grande, RS, Brazil. Electronic address: werhli@furg.br.

PMID: 31323386
DOI: 10.1016/j.compbiolchem.2019.107089

Abstract

Fluorescent proteins have been applied in a wide variety of fields ranging from basic science to industrial applications. Apart from the naturally occurring fluorescent proteins, there is a growing interest in genetically modified variants that emit light at a specific wavelength. Genetically modifying a protein is not an easy task, especially because replacing one residue with another has to achieve the desired property while maintaining protein stability. To help in the choice of residue substitutions, computational methods are applied to predict the function and stability of proteins. In this work we prepared a dataset composed of 109 fluorescent proteins and tested four classical supervised classification algorithms: artificial neural networks (ANNs), decision trees (DTs), support vector machines (SVMs) and random forests (RFs). This is the first time that these algorithms have been compared on this task. A comparison of the algorithms' performance shows that DTs, SVMs and RFs were significantly better than ANNs, and that RF was the best method in all scenarios. However, the interpretability of DTs is highly relevant and can provide important clues about the mechanisms involved in protein color emission. The results are promising and indicate that the use of in silico methods can greatly reduce the time and cost of in vitro experiments.
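The kind of comparison described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. The record does not give the features, the number of color classes, the hyperparameters, or the validation scheme, so everything below is an assumption: the data are synthetic (only the sample count of 109 is taken from the abstract), the feature names are hypothetical, and 5-fold stratified cross-validation with accuracy is just one plausible protocol. It is not the authors' pipeline and does not reproduce their results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the dataset: 109 samples matches the abstract;
# 20 features and 4 color classes are assumptions for illustration only.
X, y = make_classification(
    n_samples=109, n_features=20, n_informative=8,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# The four classifier families named in the abstract, with assumed settings.
# ANN and SVM are wrapped in a scaler because both are sensitive to feature scale.
classifiers = {
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Assumed protocol: the same stratified 5-fold split is used for every model.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in classifiers.items()
}

for name, acc in sorted(mean_accuracy.items(), key=lambda item: -item[1]):
    print(f"{name}: mean accuracy = {acc:.3f}")

# Interpretability of decision trees: a shallow tree can be printed as rules.
# Feature names are hypothetical placeholders.
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
rules = export_text(shallow_tree, feature_names=feature_names)
print(rules)
```

On real data, the printed rules would name the descriptors on which the tree splits, which is the sort of clue about color emission that the abstract attributes to DTs. Claiming that one method is significantly better than another would additionally require a statistical test over repeated runs, which this sketch omits.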

Keywords: Classification; Data mining; Fluorescent proteins; Structural biology.

Publication types

Comparative Study

MeSH terms

Algorithms*
Color*
Decision Trees
Luminescent Proteins / chemistry*
Luminescent Proteins / metabolism
Neural Networks, Computer
Protein Stability
Support Vector Machine

Substances

Luminescent Proteins