Classification by integrating plant stress response gene expression data with biological knowledge

Math Biosci. 2015 Aug:266:65-72. doi: 10.1016/j.mbs.2015.06.005. Epub 2015 Jun 17.

Abstract

Classifying microarray data is challenging because of the enormous number of genes involved. This study presents a clustering method that integrates plant stress response gene expression data with biological knowledge. Clustering is a promising tool for attribute reduction, but gene clusters built from expression data alone are biologically uninformative. We therefore integrated biological knowledge into the genomic analysis to improve the interpretation of the results. Biological similarity, based on gene ontology (GO) semantic similarity, was combined with gene expression data to find biologically meaningful clusters. The affinity propagation clustering algorithm was chosen to analyze the impact of the biological similarity on the results. Based on the clustering result, a neighborhood rough set was used to select representative genes for each cluster. The prediction accuracy of classifiers built on the reduced gene subsets indicated that our approach outperformed other classical methods. Quantitative analysis showed the information fusion to be effective, as it could select significant genes and gene subsets with high biological significance.
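
The fusion and clustering steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not say how the two similarities were combined. The weighted sum with weight `alpha`, the rescaled Pearson correlation used as expression similarity, the toy expression profiles, and the hand-written GO similarity matrix are all assumptions made for illustration. The clustering function is a plain textbook version of affinity propagation message passing, with the preference set to roughly the median similarity. The neighborhood rough set selection step is not shown.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def fused_similarity(expr, go_sim, alpha=0.5):
    """ASSUMED fusion rule: alpha * expression similarity + (1 - alpha) * GO similarity."""
    n = len(expr)
    fused = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Rescale Pearson from [-1, 1] to [0, 1] so both terms share a range.
            expr_sim = (pearson(expr[i], expr[j]) + 1.0) / 2.0
            fused[i][j] = alpha * expr_sim + (1.0 - alpha) * go_sim[i][j]
    return fused

def affinity_propagation(sim, damping=0.7, iters=200):
    """Textbook affinity propagation; returns the exemplar index chosen by each item."""
    n = len(sim)
    s = [row[:] for row in sim]
    # Preference (diagonal) = approximate median of the off-diagonal similarities.
    off_diag = sorted(s[i][j] for i in range(n) for j in range(n) if i != j)
    pref = off_diag[len(off_diag) // 2]
    for i in range(n):
        s[i][i] = pref
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            for k in range(n):
                best_other = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k});
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]

# Hypothetical expression profiles: 6 genes x 5 conditions.
expr = [
    [1.0, 2.1, 2.9, 4.2, 5.0],
    [1.2, 2.0, 3.1, 4.0, 5.1],
    [0.8, 2.3, 3.0, 3.8, 5.3],
    [5.1, 4.0, 3.2, 1.9, 1.0],
    [5.0, 4.1, 3.0, 2.0, 1.1],
    [4.8, 4.3, 2.9, 2.2, 0.9],
]
# Hypothetical GO semantic similarities in [0, 1]; real values would be
# computed from the genes' GO annotations.
go_sim = [
    [1.0, 0.9, 0.6, 0.1, 0.2, 0.1],
    [0.9, 1.0, 0.9, 0.2, 0.1, 0.1],
    [0.6, 0.9, 1.0, 0.1, 0.1, 0.2],
    [0.1, 0.2, 0.1, 1.0, 0.9, 0.6],
    [0.2, 0.1, 0.1, 0.9, 1.0, 0.9],
    [0.1, 0.1, 0.2, 0.6, 0.9, 1.0],
]

fused = fused_similarity(expr, go_sim, alpha=0.5)
labels = affinity_propagation(fused)
print(labels)  # exemplar index assigned to each gene
```

Setting `alpha=1.0` in this sketch clusters on expression alone, and lower values give the GO term more influence, which is one simple way to probe the impact of biological similarity on the clusters.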

Keywords: Biological knowledge; Information fusion; Neighborhood rough set; Plant stress response.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression*
  • Models, Genetic*
  • Oligonucleotide Array Sequence Analysis*
  • Plant Physiological Phenomena*
  • Stress, Physiological*