κ-NN for the classification of human cancer samples using the gene expression profiles

Manuel Martín-Merino

doi:10.1007/978-1-4419-5913-3_18

Adv Exp Med Biol. 2010;680:157-64. doi: 10.1007/978-1-4419-5913-3_18.

Author

Manuel Martín-Merino¹

Affiliation

¹ Computer Science Department, Universidad Pontificia de Salamanca, C/Compañía 5, 37002 Salamanca, Spain. mmartinmac@upsa.es

PMID: 20865497
DOI: 10.1007/978-1-4419-5913-3_18

Abstract

The k-Nearest Neighbor (k-NN) classifier has been applied to the identification of cancer samples using gene expression profiles with encouraging results. However, the performance of k-NN depends strongly on the distance used to evaluate sample proximities. Moreover, choosing a good dissimilarity is difficult and depends on the problem at hand. In this chapter, we introduce a method that learns the metric from the data to improve the k-NN classifier. To this end, we consider a regularized version of the kernel alignment algorithm that incorporates a term penalizing the complexity of the family of distances, which helps avoid overfitting. The error function is optimized using semidefinite programming (SDP). The proposed method has been applied to the challenging problem of cancer identification using gene expression profiles. Kernel alignment k-NN outperforms other metric learning strategies and improves on the classical k-NN algorithm.
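The idea of learning a distance through kernel alignment and then running k-NN with it can be illustrated with a minimal sketch. This is not the chapter's method: the regularized SDP is replaced here by a simple heuristic that weights each base kernel by its empirical alignment with the labels. The choice of one rank-one linear kernel per feature, the toy data, and all function names (`feature_kernel`, `alignment`, `kernel_sq_distances`, `knn_predict`) are assumptions made for illustration only.

```python
import numpy as np

def feature_kernel(A, B, j):
    # Assumed base kernel: linear kernel restricted to feature j.
    return np.outer(A[:, j], B[:, j])

def alignment(K, y):
    # Empirical kernel alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F),
    # with labels y in {-1, +1}.
    Y = np.outer(y, y)
    return float(np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y)))

def combined_kernel(A, B, weights):
    # Non-negative combination of the per-feature base kernels.
    return sum(w * feature_kernel(A, B, j) for j, w in enumerate(weights))

def kernel_sq_distances(X_test, X_train, weights):
    # Kernel-induced squared distance: d^2(x, z) = k(x, x) + k(z, z) - 2 k(x, z).
    d_test = np.diag(combined_kernel(X_test, X_test, weights))
    d_train = np.diag(combined_kernel(X_train, X_train, weights))
    cross = combined_kernel(X_test, X_train, weights)
    return d_test[:, None] + d_train[None, :] - 2 * cross

def knn_predict(D, y_train, k=3):
    # Majority vote among the k nearest training samples (k odd, labels +/-1).
    nearest = np.argsort(D, axis=1)[:, :k]
    return np.where(y_train[nearest].sum(axis=1) >= 0, 1, -1)

def make_data(rng, n_per_class):
    # Toy stand-in for expression profiles: features 0-1 separate the
    # classes, features 2-4 are pure noise with a larger variance.
    y = np.array([-1] * n_per_class + [1] * n_per_class)
    informative = rng.normal(0.0, 1.0, (2 * n_per_class, 2)) + 2.0 * y[:, None]
    noise = rng.normal(0.0, 3.0, (2 * n_per_class, 3))
    return np.hstack([informative, noise]), y

rng = np.random.default_rng(0)
X_train, y_train = make_data(rng, 20)
X_test, y_test = make_data(rng, 10)

# Heuristic weights (not the SDP solution): alignment of each base kernel
# with the labels, clipped at zero and normalized to sum to one.
n_features = X_train.shape[1]
scores = np.array([
    alignment(feature_kernel(X_train, X_train, j), y_train)
    for j in range(n_features)
])
weights = np.clip(scores, 0.0, None)
weights = weights / weights.sum()

D = kernel_sq_distances(X_test, X_train, weights)
y_pred = knn_predict(D, y_train, k=3)
accuracy = float((y_pred == y_test).mean())
print("weights:", np.round(weights, 3))
print("accuracy:", accuracy)
```

With per-feature linear kernels, the combined kernel induces a weighted Euclidean distance, so the alignment scores act as feature weights that downplay uninformative dimensions. The approach described in the abstract instead optimizes the combination with a complexity penalty via SDP, which this sketch does not attempt.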

Publication types

Evaluation Study

MeSH terms

Algorithms
Artificial Intelligence
Computational Biology
Gene Expression Profiling / statistics & numerical data*
Humans
Lymphoma, Large B-Cell, Diffuse / classification
Lymphoma, Large B-Cell, Diffuse / genetics
Neoplasms / classification*
Neoplasms / genetics*
Oligonucleotide Array Sequence Analysis / statistics & numerical data