Computational advances of tumor marker selection and sample classification in cancer proteomics

Jing Tang; Yunxia Wang; Yongchao Luo; Jianbo Fu; Yang Zhang; Yi Li; Ziyu Xiao; Yan Lou; Yunqing Qiu; Feng Zhu

doi:10.1016/j.csbj.2020.07.009

Comput Struct Biotechnol J. 2020 Jul 17;18:2012-2025. doi: 10.1016/j.csbj.2020.07.009. eCollection 2020.

Authors

Jing Tang¹,², Yunxia Wang², Yongchao Luo², Jianbo Fu², Yang Zhang²,³, Yi Li², Ziyu Xiao², Yan Lou⁴, Yunqing Qiu⁴, Feng Zhu¹,²

Affiliations

¹ Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, China.
² College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
³ School of Pharmaceutical Sciences and Innovative Drug Research Centre, Chongqing University, Chongqing 401331, China.
⁴ Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China.

Abstract

Cancer proteomics has become a powerful technique for characterizing the protein markers that drive malignant transformation, tracing proteome variation triggered by therapeutics, and discovering novel targets and drugs for the treatment of oncologic diseases. To facilitate cancer diagnosis and prognosis and to accelerate drug target discovery, a variety of methods for tumor marker identification and sample classification have been developed and successfully applied in cancer proteomic studies. This review describes the most recent advances in these approaches together with their current applications in cancer-related studies. First, a number of popular feature selection methods are reviewed, with an objective evaluation of their advantages and disadvantages. Second, these methods are grouped into three major classes based on their underlying algorithms. Finally, a variety of sample separation algorithms are discussed. The review thus provides a comprehensive overview of advances in tumor marker identification and in the separation of patients, samples, and tissues, which could serve as guidance for researchers in cancer proteomics.

Keywords: ANN, Artificial Neural Network; ANOVA, Analysis of Variance; CFS, Correlation-based Feature Selection; Cancer proteomics; Computational methods; DAPC, Discriminant Analysis of Principal Components; DT, Decision Trees; EDA, Estimation of Distribution Algorithm; FC, Fold Change; GA, Genetic Algorithms; GR, Gain Ratio; HC, Hill Climbing; HCA, Hierarchical Cluster Analysis; IG, Information Gain; LDA, Linear Discriminant Analysis; LIMMA, Linear Models for Microarray Data; MBF, Markov Blanket Filter; MWW, Mann–Whitney–Wilcoxon test; OPLS-DA, Orthogonal Partial Least Squares Discriminant Analysis; PCA, Principal Component Analysis; PLS-DA, Partial Least Squares Discriminant Analysis; RF, Random Forest; RF-RFE, Random Forest with Recursive Feature Elimination; SA, Simulated Annealing; SAM, Significance Analysis of Microarrays; SBE, Sequential Backward Elimination; SFS, Sequential Forward Selection; SOM, Self-organizing Map; SU, Symmetrical Uncertainty; SVM, Support Vector Machine; SVM-RFE, Support Vector Machine with Recursive Feature Elimination; Sample classification; Tumor marker selection; sPLSDA, Sparse Partial Least Squares Discriminant Analysis; t-SNE, t-Distributed Stochastic Neighbor Embedding; χ², Chi-square.

Publication types

Review