Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories

Sriram Chockalingam; Maneesha Aluru; Srinivas Aluru

doi:10.3390/microarrays5030023

Microarrays (Basel). 2016 Sep 19;5(3):23. doi: 10.3390/microarrays5030023.

Authors

Sriram Chockalingam¹, Maneesha Aluru², Srinivas Aluru³

Affiliations

¹ Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai 400076, India. sriram.pc@iitb.ac.in.
² School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA. maneesha.aluru@biology.gatech.edu.
³ School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA. aluru@cc.gatech.edu.

Abstract

Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

Keywords: Arabidopsis thaliana; gene networks; microarray.