A new semi-supervised learning model combined with Cox and SP-AFT models in cancer survival analysis

Hua Chai; Zi-Na Li; De-Yu Meng; Liang-Yong Xia; Yong Liang

doi:10.1038/s41598-017-13133-5

Sci Rep. 2017 Oct 12;7(1):13053. doi: 10.1038/s41598-017-13133-5.

Authors

Hua Chai¹, Zi-Na Li², De-Yu Meng², Liang-Yong Xia¹, Yong Liang³

Affiliations

¹ Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau, 999078, China.
² Institute for Information and System Sciences and Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, Shaan'xi, 710049, China.
³ Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau, 999078, China. yliang@must.edu.mo.

Abstract

Gene selection is an attractive and important task in cancer survival analysis. Most existing supervised learning methods can use only the labeled biological data, while the censored data (weakly labeled data), which far outnumber the labeled data, are ignored in model building. To exploit the information in the censored data, a semi-supervised learning framework (the Cox-AFT model), which combines the Cox proportional hazards (Cox) model with the accelerated failure time (AFT) model, has been used in cancer research and performs better than either the Cox or the AFT model alone. This method, however, is easily affected by noise. To alleviate this problem, in this paper we combine the Cox-AFT model with the self-paced learning (SPL) method to employ the information in the censored data more effectively in a self-learning way. SPL is a reliable and stable learning mechanism, recently proposed to simulate the human learning process, that helps the AFT model automatically identify high-confidence samples and include them in training, minimizing interference from highly noisy samples. Using the SPL method yields two direct advantages: (1) the censored data are utilized more fully; (2) the noise passed to the model is greatly reduced. The experimental results demonstrate the effectiveness of the proposed model compared with the traditional Cox-AFT model.
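
The self-paced selection idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's SP-AFT model: it assumes a plain least-squares regression in place of the AFT model, ignores censoring entirely, and uses the common hard-threshold SPL rule (train only on samples whose current loss is below a threshold that grows each round). The function name `self_paced_fit`, the parameters `lam` and `growth`, and the synthetic data are all hypothetical and chosen for illustration only.

```python
import numpy as np


def self_paced_fit(X, y, lam=1.0, growth=1.5, n_rounds=10):
    """Hard-threshold self-paced learning around ordinary least squares.

    Illustrative assumption: least squares stands in for the AFT model.
    """
    n, d = X.shape
    selected = np.ones(n, dtype=bool)  # first round trains on every sample
    w = np.zeros(d)
    for _ in range(n_rounds):
        # Fit the model on the currently selected ("easy") samples only.
        w, *_ = np.linalg.lstsq(X[selected], y[selected], rcond=None)
        # Per-sample squared loss under the current model, for all samples.
        loss = (X @ w - y) ** 2
        # Keep the samples the model is confident about (loss below lam).
        new_selected = loss < lam
        if new_selected.sum() < d:
            # Safeguard: always keep at least d samples (the d easiest).
            new_selected = np.zeros(n, dtype=bool)
            new_selected[np.argsort(loss)[:d]] = True
        selected = new_selected
        # Raise the threshold so harder samples can enter in later rounds.
        lam *= growth
    return w, selected


# Synthetic demonstration: 200 samples, the first 20 heavily corrupted.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(200, 3))
y = X @ true_w + rng.normal(scale=0.1, size=200)
y[:20] += 20.0  # simulated high-noise samples

w_hat, selected = self_paced_fit(X, y)
w_plain, *_ = np.linalg.lstsq(X, y, rcond=None)  # baseline using all samples
print("self-paced estimate:", w_hat)
print("plain least squares:", w_plain)
print("corrupted samples still selected:", int(selected[:20].sum()))
```

In this toy setting the corrupted samples keep a large loss and stay excluded while the threshold grows, which mirrors the abstract's claim that SPL limits the noise passed to the model. How the paper actually weights censored samples inside the AFT model is not stated in the abstract and is not reproduced here.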

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Humans
Neoplasms / mortality*
Proportional Hazards Models
Supervised Machine Learning*
Survival Analysis*