Cleaning the Medicago Microarray Database to Improve Gene Function Analysis

Plants (Basel). 2021 Jun 18;10(6):1240. doi: 10.3390/plants10061240.

Abstract

Transcriptomics studies have been facilitated by the development of microarray and RNA-Seq technologies, and thousands of expression datasets are now available for many species. However, data quality can be highly variable, which makes the combined analysis of different datasets difficult and unreliable. Most of the microarray data for Medicago truncatula, the barrel medic, are stored and publicly accessible in the web database Medicago truncatula Gene Expression Atlas (MtGEA). The aim of this work is to improve the quality of the MtGEA database through a general method based on logical and statistical relationships among parameters and conditions. Of the 716 columns initially available in the dataset, 607 were retained after evaluating data quality with two criteria: the sum of the expression levels over all transcriptome probes, and the Pearson correlation among hybridizations. The reduced dataset is markedly more consistent, with fewer false positives and false negatives in Pearson correlation and GO enrichment analyses among genes. The approach is of general validity, and we intend to extend the analysis to other plant microarray databases.
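
The two quality criteria named above (total signal per column and Pearson correlation among hybridizations) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no thresholds or decision rules, so the function name `filter_hybridizations`, the tolerance `sum_tol`, the cutoff `min_mean_corr`, and the use of the median as a reference are all assumptions chosen for illustration. The sketch also assumes that each column of the matrix is one hybridization, that each row is one probe, and that there are at least two columns. It uses Python with NumPy rather than R.

```python
import numpy as np


def filter_hybridizations(expr, sum_tol=0.2, min_mean_corr=0.5):
    """Return a boolean mask of columns (hybridizations) to keep.

    expr: 2-D array, rows = probes, columns = hybridizations.
    sum_tol and min_mean_corr are illustrative thresholds (assumptions),
    not values taken from the paper.
    """
    expr = np.asarray(expr, dtype=float)

    # Criterion 1: the total expression signal of each column should be
    # close to the median total across all columns.
    col_sums = expr.sum(axis=0)
    median_sum = np.median(col_sums)
    sum_ok = np.abs(col_sums - median_sum) <= sum_tol * median_sum

    # Criterion 2: each column should correlate (Pearson) with the others.
    corr = np.corrcoef(expr, rowvar=False)
    n = corr.shape[0]
    # Mean correlation with the other columns, excluding the diagonal (r = 1).
    mean_corr = (corr.sum(axis=1) - 1.0) / (n - 1)
    corr_ok = mean_corr >= min_mean_corr

    return sum_ok & corr_ok


# Synthetic example: six consistent columns, one with inflated total
# signal, and one whose probe values are shuffled.
rng = np.random.default_rng(0)
base = np.linspace(1.0, 100.0, 200)
good = [base + rng.normal(0.0, 1.0, base.size) for _ in range(6)]
inflated = 3.0 * base              # fails the sum criterion
shuffled = rng.permutation(base)   # fails the correlation criterion
expr = np.column_stack(good + [inflated, shuffled])

mask = filter_hybridizations(expr)
cleaned = expr[:, mask]
```

In this toy data the inflated column is rejected by the sum criterion alone, because Pearson correlation is insensitive to scaling. The shuffled column is rejected by the correlation criterion alone, because its total signal is unchanged. This suggests why the two checks are complementary.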

Keywords: Medicago; MtGEA; R programming; correlation analysis; functional genomics; microarray; transcriptomics.