Defining Cancer Subtypes With Distinctive Etiologic Profiles: An Application to the Epidemiology of Melanoma

J Am Stat Assoc. 2017;112(517):54-63. doi: 10.1080/01621459.2016.1191499. Epub 2017 May 3.

Abstract

We showcase a novel analytic strategy to identify sub-types of cancer that possess distinctive causal factors, i.e., sub-types that are "etiologically" distinct. The method involves the integrated analysis of two types of study design: an incident series of cases with double primary cancers and detailed information on tumor characteristics that can be used to define the sub-types; and a case-series of incident cases with information on known risk factors that can be used to investigate the specific risk factors that distinguish the sub-types. The methods are applied to a rich melanoma dataset with detailed information on pathologic tumor factors and comprehensive information on known genetic and environmental risk factors for melanoma. The optimal sub-typing solution is identified using a novel clustering analysis that seeks to maximize a measure that characterizes the distinctiveness of the distributions of risk factors across the sub-types and that is a function of the correlations of tumor factors in the case-specific tumor pairs. The analysis is complicated by extensive missing data. If successful, studies of this nature offer the opportunity for efficient study design to identify unknown risk factors whose effects are concentrated in defined sub-types.

Keywords: case-control study; etiologic heterogeneity; k-means clustering; logistic regression; multiple imputation.
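
The abstract describes choosing the sub-typing solution that maximizes a measure of how distinctive the risk-factor distributions are across sub-types. The following is a minimal sketch of that selection idea only, not the measure used in the paper: it assumes a single numeric risk factor, a set of already-computed candidate labelings, and a stand-in score (the share of risk-factor variance explained by the sub-type labels). The names `distinctiveness`, `best_solution`, `by_site`, and `by_thickness`, and the toy data, are hypothetical.

```python
from statistics import mean


def distinctiveness(risk, labels):
    """Stand-in score (an assumption, not the paper's measure): the share of
    variance in one risk factor that is explained by the sub-type labels."""
    grand = mean(risk)
    total = sum((x - grand) ** 2 for x in risk)
    if total == 0:
        return 0.0  # no variation in the risk factor, so nothing to explain
    groups = {}
    for x, lab in zip(risk, labels):
        groups.setdefault(lab, []).append(x)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return between / total


def best_solution(risk, candidates):
    """Return (name, score) for the candidate labeling with the highest score."""
    scores = {name: distinctiveness(risk, labels)
              for name, labels in candidates.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Toy data (hypothetical): one risk-factor value per case, and two candidate
# ways of splitting the same six cases into two sub-types.
risk = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
candidates = {
    "by_site": [0, 0, 0, 1, 1, 1],
    "by_thickness": [0, 1, 0, 1, 0, 1],
}

best_name, best_score = best_solution(risk, candidates)
print(best_name, round(best_score, 3))
```

A score of this kind tends to rise as sub-types are added, so in this sketch it is only meaningful for comparing solutions with the same number of sub-types.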

Publication types

  • Research Support, N.I.H., Extramural