An integrated approach (CLuster Analysis Integration Method) to combine expression data and protein-protein interaction networks in agrigenomics: application on Arabidopsis thaliana

OMICS. 2014 Feb;18(2):155-65. doi: 10.1089/omi.2013.0050. Epub 2014 Jan 3.

Abstract

Experimental co-expression data and protein-protein interaction networks are frequently used to analyze the interactions among genes or proteins. Recent studies have investigated methods to integrate these two sources of information. We propose a new method to integrate co-expression data obtained through DNA microarray analysis (MA) and protein-protein interaction (PPI) network data, and apply it to Arabidopsis thaliana. The proposed method identifies small subsets of highly interacting proteins. Based on an analysis of co-localization and mRNA developmental expression, we show that these groups provide important biological insights; additionally, these subsets are significantly enriched with respect to KEGG Pathways and can successfully predict whether proteins belong to known pathways. Thus, the method is able to provide relevant biological information and support the functional identification of complex genetic traits of economic value in plant agrigenomics research. The method has been implemented in a prototype software tool named CLAIM (CLuster Analysis Integration Method), which can be downloaded from http://bio.cs.put.poznan.pl/research_fields . CLAIM clusters the MA and PPI data separately and then merges the resulting clusters in a special graph; the cliques of this graph are subsets of strongly connected proteins. The proposed method compared favorably with existing methods. CLAIM appears to be a useful semi-automated tool for protein functional analysis and warrants further evaluation in agrigenomics research.
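
The cluster-merge-clique idea described in the abstract can be illustrated with a minimal Python sketch. It is not the CLAIM implementation: the abstract does not specify how the "special graph" is built, so the edge rule used here (two proteins are linked when they share at least one MA cluster and at least one PPI cluster) is an assumption, as are the function names, the toy clusterings, and the protein labels P1 to P6. Maximal cliques are enumerated with the standard Bron-Kerbosch algorithm with pivoting.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return every unordered protein pair that shares at least one cluster."""
    pairs = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def build_agreement_graph(ma_clusters, ppi_clusters):
    """ASSUMED edge rule (not taken from the paper): link two proteins
    when they are co-clustered in both the MA and the PPI clustering."""
    shared = co_clustered_pairs(ma_clusters) & co_clustered_pairs(ppi_clusters)
    graph = {}
    for a, b in shared:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    if not graph:
        return []
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical toy input: overlapping clusters over made-up proteins.
ma_clusters = [{"P1", "P2", "P3", "P4"}, {"P4", "P5", "P6"}]
ppi_clusters = [{"P1", "P2", "P3"}, {"P3", "P4", "P5"}, {"P5", "P6"}]

graph = build_agreement_graph(ma_clusters, ppi_clusters)
cliques = maximal_cliques(graph)
# Keep only groups of three or more proteins (an arbitrary threshold).
large_cliques = [sorted(c) for c in cliques if len(c) >= 3]
print(large_cliques)
```

Because the clusters may overlap, co-membership is not transitive, which is why the clique search is not trivial: in the toy data, P3 and P4 agree in both clusterings, but P4 is not linked to P1 or P2. A real analysis would replace the toy sets with clusters produced by whatever MA and PPI clustering methods are in use.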

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Arabidopsis / genetics*
  • Arabidopsis / metabolism
  • Arabidopsis Proteins / genetics*
  • Arabidopsis Proteins / metabolism
  • Gene Expression Regulation, Plant*
  • Gene Regulatory Networks*
  • Genome, Plant*
  • Molecular Sequence Annotation
  • Multigene Family
  • Pattern Recognition, Automated
  • Protein Interaction Mapping / methods
  • Software*

Substances

  • Arabidopsis Proteins