Added predictive value of omics data: specific issues related to validation illustrated by two case studies

BMC Med Res Methodol. 2014 Oct 28;14:117. doi: 10.1186/1471-2288-14-117.

Abstract

Background: In recent years, the importance of independently validating the prediction ability of a new gene signature has been widely recognized. More recently, with the development of gene signatures that complement rather than replace the clinical predictors in the prediction rule, the focus has shifted to validating the added predictive value of a gene signature, i.e. to verifying that including the new gene signature in a prediction model improves its prediction ability.

Methods: The high-dimensional nature of the data from which a new signature is derived raises challenging issues and requires classical methods to be adapted to this framework. Here we show how to validate the added predictive value of a signature derived from high-dimensional data and critically discuss how the choice of methods affects the results.

Results: The analysis of the added predictive value of two gene signatures developed in two recent studies on the survival of leukemia patients allows us to illustrate and empirically compare different validation techniques in the high-dimensional framework.

Conclusions: The issues related to the high-dimensional nature of the omics predictor space affect the validation process. An analysis procedure based on repeated cross-validation is suggested.
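
As an illustration only, the idea of assessing added predictive value by repeated cross-validation can be sketched as follows. This is a minimal sketch under strong simplifying assumptions and not the authors' procedure: the outcome is continuous and modelled by ordinary least squares (the studies concern survival, where a Cox model would be the natural choice), the "signature" is a hypothetical signed sum of the omics features most associated with the outcome, and the function name `repeated_cv_added_value` and all of its parameters are invented for this example. The point it tries to show is that the signature is re-derived inside each training fold, so the held-out fold never influences feature selection.

```python
import numpy as np


def repeated_cv_added_value(X_clin, X_omics, y, n_repeats=10, n_folds=5,
                            n_top=10, seed=0):
    """Per-repeat gain in cross-validated mean squared error obtained by
    adding an omics score to a clinical-only linear model.

    Hypothetical helper: positive values mean the combined model predicted
    the held-out observations better than the clinical-only model.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    gains = np.empty(n_repeats)
    for r in range(n_repeats):
        # A fresh random partition into folds for every repetition.
        folds = np.array_split(rng.permutation(n), n_folds)
        sse_clin = sse_comb = 0.0
        for k, test in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != k])

            # Standardize omics features with training-fold statistics only.
            mu = X_omics[train].mean(axis=0)
            sd = X_omics[train].std(axis=0) + 1e-12
            z_tr = (X_omics[train] - mu) / sd
            z_te = (X_omics[test] - mu) / sd

            # Assumed signature: signed sum of the n_top features whose
            # training-fold association with y is largest in absolute value
            # (the quantity below is proportional to the correlation).
            assoc = z_tr.T @ (y[train] - y[train].mean()) / len(train)
            top = np.argsort(-np.abs(assoc))[:n_top]
            w = np.sign(assoc[top])
            s_tr, s_te = z_tr[:, top] @ w, z_te[:, top] @ w

            # Design matrices: clinical only (A) and clinical + score (B).
            A_tr = np.column_stack([np.ones(len(train)), X_clin[train]])
            A_te = np.column_stack([np.ones(len(test)), X_clin[test]])
            B_tr = np.column_stack([A_tr, s_tr])
            B_te = np.column_stack([A_te, s_te])

            beta_a = np.linalg.lstsq(A_tr, y[train], rcond=None)[0]
            beta_b = np.linalg.lstsq(B_tr, y[train], rcond=None)[0]

            # Accumulate squared prediction errors on the held-out fold.
            sse_clin += np.sum((y[test] - A_te @ beta_a) ** 2)
            sse_comb += np.sum((y[test] - B_te @ beta_b) ** 2)
        gains[r] = (sse_clin - sse_comb) / n
    return gains


# Simulated data (purely illustrative): 2 clinical predictors, 500 omics
# features of which the first 5 carry signal.
rng = np.random.default_rng(1)
n, p = 200, 500
X_clin = rng.normal(size=(n, 2))
X_omics = rng.normal(size=(n, p))
y = X_clin[:, 0] + X_omics[:, :5].sum(axis=1) + rng.normal(size=n)

gains = repeated_cv_added_value(X_clin, X_omics, y)
print("mean gain in CV error:", gains.mean(), "sd over repeats:", gains.std())
```

Repeating the partition several times and looking at the spread of the gains, rather than relying on a single split, is what makes the estimate less dependent on one particular division of the data.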

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Age Factors
  • Aged
  • Aged, 80 and over
  • Female
  • Genomics
  • Humans
  • Immunoglobulin Variable Region / genetics
  • Leukemia, Lymphocytic, Chronic, B-Cell / diagnosis
  • Leukemia, Lymphocytic, Chronic, B-Cell / genetics*
  • Leukemia, Lymphocytic, Chronic, B-Cell / mortality*
  • Leukemia, Myeloid, Acute / diagnosis
  • Leukemia, Myeloid, Acute / genetics*
  • Leukemia, Myeloid, Acute / mortality*
  • Male
  • Middle Aged
  • Models, Statistical
  • Nuclear Proteins / genetics
  • Nucleophosmin
  • Prognosis
  • Proportional Hazards Models
  • Survival Rate
  • Tandem Repeat Sequences / genetics
  • Validation Studies as Topic
  • Young Adult
  • fms-Like Tyrosine Kinase 3 / genetics

Substances

  • Immunoglobulin Variable Region
  • Nuclear Proteins
  • Nucleophosmin
  • FLT3 protein, human
  • fms-Like Tyrosine Kinase 3