This study aims to diagnose the stage of renal cell carcinoma and to predict the prognosis of breast cancer using RNA sequencing and microarray data, two representative types of gene expression data. To identify biomarkers for prediction, a collaborative filtering method based on three gene similarity coefficients recommends the top-N genes for each class (cancer or noncancer). We then construct a machine learning classification model that uses the union of the recommended genes as the final feature set. The optimal genetic markers were taken to be the gene set that yielded the highest classification performance in the model. In experiments, the proposed method outperformed a machine learning model trained on all gene features without feature selection. It also outperformed other studies that used existing correlation-based feature selection.
Keywords: cancer diagnosis; collaborative filtering; feature selection; machine learning.
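
The top-N recommendation and union step summarized in the abstract can be illustrated with a minimal sketch. This is a simplified stand-in, not the study's collaborative filtering procedure: it ranks each gene by the similarity between its expression vector and a class indicator vector. The abstract does not name the three similarity coefficients, so cosine, Pearson, and Jaccard are assumed here purely for illustration, and the function name `recommend_genes` and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def pearson(u, v):
    # Pearson correlation is the cosine of the mean-centered vectors.
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine([a - mu for a in u], [b - mv for b in v])

def jaccard(u, v):
    # Assumption: binarize each vector at its own mean, then compare
    # the sets of samples in which the value is above that mean.
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    a = {i for i, x in enumerate(u) if x > mu}
    b = {i for i, x in enumerate(v) if x > mv}
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend_genes(X, y, top_n):
    """Return the sorted union of top-N gene indices over all classes
    and all three (assumed) similarity coefficients.

    X: list of samples, each a list of expression values per gene.
    y: list of class labels, one per sample.
    """
    n_genes = len(X[0])
    columns = [[row[g] for row in X] for g in range(n_genes)]
    selected = set()
    for label in sorted(set(y)):
        indicator = [1.0 if lab == label else 0.0 for lab in y]
        for sim in (cosine, pearson, jaccard):
            ranked = sorted(range(n_genes),
                            key=lambda g: sim(columns[g], indicator),
                            reverse=True)
            selected.update(ranked[:top_n])
    return sorted(selected)

# Toy data (hypothetical): 6 samples x 4 genes.
# Gene 0 is high in "cancer", gene 1 is high in "normal",
# gene 2 is constant, gene 3 is unrelated noise.
X = [
    [9, 1, 5, 4],
    [8, 2, 5, 6],
    [9, 1, 5, 5],
    [1, 8, 5, 5],
    [2, 9, 5, 4],
    [1, 9, 5, 6],
]
y = ["cancer", "cancer", "cancer", "normal", "normal", "normal"]

features = recommend_genes(X, y, top_n=1)
print(features)  # [0, 1]
```

The resulting index list would then serve as the feature set for whichever classifier is trained downstream; varying `top_n` and keeping the value with the best validation performance mirrors the selection of the optimal marker set described above.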