A novel non-parametric method for uncertainty evaluation of correlation-based molecular signatures: its application on PAM50 algorithm

Bioinformatics. 2017 Mar 1;33(5):693-700. doi: 10.1093/bioinformatics/btw704.

Abstract

Motivation: The PAM50 classifier assigns each patient to the breast cancer subtype with the highest correlation, irrespective of the value obtained. Nonetheless, all subtype correlations are required to build the risk of recurrence (ROR) score, which is currently used in therapeutic decisions. Existing subtype uncertainty estimates are inaccurate, seldom considered, or require a population-based approach in this context.

Results: Here we present a novel single-subject non-parametric uncertainty estimation based on permutations of PAM50's gene labels. Simulation results (n = 5228) showed that only 61% of subjects can be reliably 'Assigned' to a PAM50 subtype, whereas 33% should be 'Not Assigned' (NA), leaving the rest as 'Ambiguous' because of tight correlations between subtypes. Excluding the NA subjects from the analysis improved the discrimination of subtype survival curves and yielded a higher proportion of low and high ROR values. Conversely, all NA subjects showed similar survival behaviour regardless of their original PAM50 assignment. We propose incorporating our PAM50 uncertainty estimation to support therapeutic decisions.
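The idea of a single-subject gene label permutation test can be illustrated with a minimal Python sketch. It is not the 'pbcmc' implementation: the function names, the use of Spearman correlation, the one-sided p-value, the 0.05 threshold and the rule that maps significant subtypes to 'Assigned', 'Not Assigned' or 'Ambiguous' are all assumptions made for illustration, and the centroids here are random toy data rather than the real PAM50 centroids.

```python
import numpy as np

def rank(x):
    """Ordinal ranks (ties broken by position; adequate for continuous data)."""
    return np.argsort(np.argsort(x))

def spearman_to_centroids(profile, centroids):
    """Spearman correlation of one gene profile against each centroid column."""
    r = rank(profile).astype(float)
    r -= r.mean()
    out = []
    for k in range(centroids.shape[1]):
        c = rank(centroids[:, k]).astype(float)
        c -= c.mean()
        out.append(r @ c / np.sqrt((r @ r) * (c @ c)))
    return np.array(out)

def permutation_call(profile, centroids, names, n_perm=1000, alpha=0.05, seed=0):
    """Hypothetical helper: permutation p-value per subtype for ONE subject."""
    rng = np.random.default_rng(seed)
    observed = spearman_to_centroids(profile, centroids)
    null = np.empty((n_perm, centroids.shape[1]))
    for b in range(n_perm):
        # Shuffling the values across gene positions is equivalent to
        # permuting the gene labels of this subject's profile.
        null[b] = spearman_to_centroids(rng.permutation(profile), centroids)
    # One-sided p-value with the usual +1 correction (assumed, not from the paper).
    pvals = (1 + (null >= observed).sum(axis=0)) / (n_perm + 1)
    n_sig = int((pvals < alpha).sum())
    # Assumed decision rule: none significant, exactly one, or several.
    if n_sig == 0:
        call = "Not Assigned"
    elif n_sig == 1:
        call = "Assigned"
    else:
        call = "Ambiguous"
    return call, names[int(np.argmax(observed))], pvals

# Toy data: 50 genes, 5 random "centroids"; the subject resembles the third one.
names = ["Basal", "Her2", "LumA", "LumB", "Normal"]
toy_rng = np.random.default_rng(1)
centroids = toy_rng.normal(size=(50, 5))
profile = centroids[:, 2] + toy_rng.normal(scale=0.3, size=50)
call, top, pvals = permutation_call(profile, centroids, names)
print(call, top, np.round(pvals, 3))
```

Because the null distribution is built from the subject's own profile, the sketch needs no reference population, which is the property the abstract describes as single-subject.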

Availability and implementation: Source code can be found in the 'pbcmc' R package at Bioconductor.

Contacts: cristobalfresno@gmail.com or efernandez@bdmg.com.ar.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Breast Neoplasms / diagnosis*
  • Computational Biology / methods*
  • Female
  • Humans
  • Neoplasm Recurrence, Local*
  • Prognosis
  • Risk
  • Uncertainty*