Verification of gene expression profiles for colorectal cancer using 12 internet public microarray datasets

World J Gastroenterol. 2014 Dec 14;20(46):17476-82. doi: 10.3748/wjg.v20.i46.17476.

Abstract

Aim: To verify gene expression profiles for colorectal cancer using 12 publicly available microarray datasets.

Methods: Logistic regression analysis was performed, and odds ratios for each gene were determined between colorectal cancer (CRC) cases and controls. Twelve public microarray datasets (GSE4107, GSE4183, GSE8671, GSE9348, GSE10961, GSE13067, GSE13294, GSE13471, GSE14333, GSE15960, GSE17538 and GSE18105), which together included 519 cases of adenocarcinoma and 88 normal mucosa controls, were pooled and used to verify 17 selected genes from 3 published studies and to estimate their external generalizability.
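
As an illustration of the per-gene odds ratio described above, the following is a minimal sketch and not the authors' code. It fits a one-predictor logistic regression by Newton-Raphson and reports exp(slope) as the odds ratio per unit of expression. The function names and the expression values are hypothetical, and the study's actual preprocessing and software are not stated in the abstract.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, max_iter=100, tol=1e-10):
    """Fit logit(P(y=1)) = b0 + b1*x by Newton-Raphson; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient, intercept
            g1 += (yi - p) * xi     # gradient, slope
            h00 += w                # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:             # (near-)singular, e.g. separated data
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def odds_ratio(x, y):
    """Odds ratio of being a case per one-unit increase in x."""
    _, b1 = fit_logistic(x, y)
    return math.exp(b1)

# Made-up log2 expression values for one hypothetical gene (not study data).
cases = [7.1, 6.8, 7.9, 6.2, 8.3, 7.4]
controls = [6.0, 6.9, 5.8, 7.2, 6.4, 5.5]
expression = cases + controls
is_crc = [1] * len(cases) + [0] * len(controls)
print(f"odds ratio per unit of expression: {odds_ratio(expression, is_crc):.2f}")
```

For a binary predictor (for example, expression above or below a cutoff), the fitted odds ratio equals the cross-product ratio of the 2x2 table, which gives a quick sanity check of the fit.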

Results: We validated the 17 CRC-associated genes from the studies by Chang et al (Model 1: 5 genes), Marshall et al (Model 2: 7 genes) and Han et al (Model 3: 5 genes) and performed multivariate logistic regression analysis and external validation using the 12 pooled public microarray datasets. The Hosmer-Lemeshow (H-L) goodness-of-fit test was statistically significant (P = 0.044) for Model 2 of Marshall et al, indicating that observed event rates did not match expected event rates in subgroups of the model population. Models 1, 3 and 4 were well calibrated, that is, expected and observed event rates in subgroups were similar, with non-significant H-L P values of 0.460, 0.194 and 1.000, respectively. A 7-gene model (Model 4) of CPEB4, EIF2S3, MGC20553, MS4A1, ANXA3, TNFAIP6 and IL2RB was selected pairwise and showed the best results in logistic regression analysis (H-L P = 1.000, R² = 0.951, area under the curve = 0.999, accuracy = 0.968, specificity = 0.966 and sensitivity = 0.994).
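
To make the calibration and classification figures above concrete, here is a minimal sketch of a Hosmer-Lemeshow test and of sensitivity, specificity and accuracy at a probability cutoff. It is an illustration under stated assumptions, not the authors' implementation. Subjects are split into equal-size groups by predicted probability, the number of groups is restricted to even values so the chi-square tail probability has a closed form, and the 0.5 cutoff and all data are hypothetical.

```python
import math

def hosmer_lemeshow(y, p, groups=10):
    """Return (chi_square, df, p_value) comparing observed and expected events.

    Assumes len(y) >= groups and every predicted probability is strictly
    between 0 and 1. Only even `groups` are supported in this sketch.
    """
    if groups < 4 or groups % 2:
        raise ValueError("this sketch needs an even number of groups >= 4")
    pairs = sorted(zip(p, y))               # order subjects by predicted risk
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(pi for pi, _ in chunk)
        observed = sum(yi for _, yi in chunk)
        mean_p = expected / size
        chi2 += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    df = groups - 2
    half = chi2 / 2.0
    # Chi-square survival function, closed form valid for even df.
    p_value = math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )
    return chi2, df, p_value

def classification_metrics(y, p, threshold=0.5):
    """Sensitivity, specificity and accuracy at a probability cutoff."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y),
    }

# Made-up predictions: four risk levels, eight subjects each (not study data).
levels = [(0.125, 1), (0.25, 2), (0.5, 4), (0.75, 6)]   # (probability, events)
probs, outcomes = [], []
for prob, events in levels:
    probs += [prob] * 8
    outcomes += [1] * events + [0] * (8 - events)

print(hosmer_lemeshow(outcomes, probs, groups=4))
print(classification_metrics(outcomes, probs))
```

In this toy data the observed event count in every group equals the expected count, so the statistic is zero and the P value is 1, which is the pattern a perfectly calibrated model produces. A small P value, as reported for Model 2, signals a mismatch between observed and expected rates.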

Conclusion: A novel gene expression profile was associated with CRC and can potentially be applied to blood-based detection assays.

Keywords: Colorectal cancer; Gene Expression Omnibus; Gene Expression Omnibus series; Gene expression profiles; Microarray.

Publication types

  • Validation Study

MeSH terms

  • Biomarkers, Tumor / genetics*
  • Case-Control Studies
  • Colorectal Neoplasms / genetics*
  • Databases, Genetic*
  • Gene Expression Profiling / methods*
  • Genetic Predisposition to Disease
  • Humans
  • Internet*
  • Logistic Models
  • Multivariate Analysis
  • Odds Ratio
  • Oligonucleotide Array Sequence Analysis*
  • Predictive Value of Tests
  • Reproducibility of Results
  • Risk Factors

Substances

  • Biomarkers, Tumor