RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using Chou's five-step rule

Lei Zheng; Shenghui Huang; Nengjiang Mu; Haoyue Zhang; Jiayu Zhang; Yu Chang; Lei Yang; Yongchun Zuo

doi:10.1093/database/baz131


Database (Oxford). 2019 Jan 1:2019:baz131. doi: 10.1093/database/baz131.

Authors

Lei Zheng¹, Shenghui Huang¹, Nengjiang Mu¹, Haoyue Zhang¹, Jiayu Zhang¹, Yu Chang¹, Lei Yang², Yongchun Zuo¹

Affiliations

¹ State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China.
² College of Bioinformatics Science and Technology, Harbin Medical University, Baojian Road No.157, Harbin 150081, China.

Abstract

By reducing the amino acid alphabet, protein complexity can be significantly simplified, which can improve computational efficiency, decrease information redundancy and reduce the chance of overfitting. Although several reduced alphabets have been proposed, different classification rules can produce distinctive results in protein sequence analysis. Thus, a systematic framework for reduced alphabets is urgently needed. In this work, we constructed a comprehensive web server called RAACBook for protein sequence analysis and machine learning applications by integrating reduced alphabets. The web server contains three parts: (i) 74 types of reduced amino acid alphabets were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with specific protein problems, and users can easily select the desired RAACs from a multilayer browser tool. (ii) An online tool was developed to analyze the primary sequences of proteins. The tool produces K-tuple reduced amino acid compositions defined by three correlation parameters (K-tuple, g-gap, λ-correlation). The results are visualized as sequence alignments, merged RAA compositions, feature distributions and logos of reduced sequences. (iii) A machine learning server is provided to train protein classification models based on K-tuple RAACs. The optimal model can be selected according to evaluation indexes (ROC, AUC, MCC, etc.). In conclusion, RAACBook provides a powerful and user-friendly service for protein sequence analysis and computational proteomics. RAACBook is freely available at http://bioinfor.imu.edu.cn/raacbook. Database URL: http://bioinfor.imu.edu.cn/raacbook.
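The idea behind part (ii) can be sketched as follows: each residue is replaced by the label of its cluster, and the reduced sequence is then summarized as normalized K-tuple frequencies. This is a minimal Python illustration under stated assumptions, not RAACBook's implementation. The six-cluster alphabet `CLUSTERS` is hypothetical and is not one of the 673 RAACs. The helper names are also hypothetical. The `gap` parameter is an assumed reading in which the residues of one tuple are spaced `gap + 1` positions apart, and RAACBook's own g-gap and λ-correlation definitions may differ.

```python
from itertools import product

# Hypothetical 6-cluster reduced alphabet, for illustration only;
# NOT taken from RAACBook's catalogue of 673 RAACs.
CLUSTERS = ["AVLIMC", "FWY", "STNQ", "KRH", "DE", "GP"]


def build_mapping(clusters):
    """Map each amino acid to a one-letter label for its cluster ('a', 'b', ...)."""
    mapping = {}
    for idx, members in enumerate(clusters):
        label = chr(ord("a") + idx)
        for aa in members:
            mapping[aa] = label
    return mapping


def reduce_sequence(seq, mapping):
    """Rewrite a protein sequence in the reduced alphabet; residues not in any cluster are dropped."""
    return "".join(mapping[aa] for aa in seq.upper() if aa in mapping)


def ktuple_composition(reduced, labels, k=2, gap=0):
    """Normalized frequency of every possible K-tuple over the reduced alphabet.

    `gap` is an ASSUMED spacing parameter: the residues forming one tuple are
    taken gap + 1 positions apart (gap=0 gives contiguous K-mers).
    """
    step = gap + 1
    span = (k - 1) * step  # distance from the first to the last residue of a tuple
    counts = {"".join(t): 0 for t in product(labels, repeat=k)}
    total = 0
    for i in range(len(reduced) - span):
        tup = reduced[i : i + span + 1 : step]  # k residues, `step` apart
        counts[tup] += 1
        total += 1
    if total == 0:  # sequence too short for even one tuple
        return {key: 0.0 for key in counts}
    return {key: c / total for key, c in counts.items()}


# Usage on a short example peptide.
mapping = build_mapping(CLUSTERS)
labels = sorted(set(mapping.values()))
reduced = reduce_sequence("MKTAYIAKQR", mapping)
features = ktuple_composition(reduced, labels, k=2, gap=0)
features_gap = ktuple_composition(reduced, labels, k=2, gap=1)
```

Each protein thus becomes a fixed-length vector of 6^K frequencies (36 for K = 2 with this hypothetical alphabet), which is the kind of input a classifier such as the one described in part (iii) can be trained on. A coarser alphabet shortens this vector, which is the efficiency and overfitting benefit the abstract refers to.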

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Amino Acid Sequence
Amino Acids / chemistry*
Databases, Protein
Internet*
Machine Learning
Models, Biological*
Protein Structure, Secondary

Substances

Amino Acids