Harnessing Population Pedigree Data and Machine Learning Methods to Identify Patterns of Familial Bladder Cancer Risk

Cancer Epidemiol Biomarkers Prev. 2020 May;29(5):918-926. doi: 10.1158/1055-9965.EPI-19-0681. Epub 2020 Feb 25.

Abstract

Background: Relatives of patients with bladder cancer have been shown to be at increased risk for kidney, lung, thyroid, and cervical cancer after correcting for smoking-related behaviors that may concentrate in some families. We demonstrate a novel approach to simultaneously assess risks for multiple cancers to identify distinct multicancer configurations (multiple different cancer types that cluster in relatives) surrounding patients with familial bladder cancer.

Methods: This study takes advantage of a unique population-level data resource, the Utah Population Database (UPDB), which links extensive genealogy records with statewide cancer data. Familial risk is measured using standardized incidence ratios (SIRs) that account for the sex, age, birth cohort, and person-years of the pedigree members.
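An SIR compares the observed number of cases among a family's members with the number expected if population rates applied to them. The following is a minimal sketch of that indirect standardization, assuming strata keyed by (sex, age group, birth cohort), reference rates per person-year for each stratum, and a one-sided Poisson test for "significantly higher" risk. The stratum labels, rates, and the choice of test are illustrative assumptions; the abstract does not state which significance test the authors used.

```python
import math

# Hypothetical stratum key: (sex, age_group, birth_cohort).
Stratum = tuple[str, str, str]


def expected_cases(person_years: dict[Stratum, float],
                   reference_rates: dict[Stratum, float]) -> float:
    """Expected cases: sum over strata of (population rate x family person-years)."""
    return sum(py * reference_rates[s] for s, py in person_years.items())


def poisson_upper_tail(observed: int, expected: float) -> float:
    """One-sided p-value P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Accumulate P(X = k) for k = 0 .. observed - 1, then take the complement.
    term = math.exp(-expected)
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def familial_sir(observed: int,
                 person_years: dict[Stratum, float],
                 reference_rates: dict[Stratum, float]) -> tuple[float, float]:
    """Return (SIR, one-sided Poisson p-value) for one family and one cancer type."""
    e = expected_cases(person_years, reference_rates)
    return observed / e, poisson_upper_tail(observed, e)


py = {("F", "60-69", "1930s"): 1000.0, ("M", "60-69", "1930s"): 1000.0}
rates = {("F", "60-69", "1930s"): 0.001, ("M", "60-69", "1930s"): 0.003}
print(familial_sir(8, py, rates))
```

With 1,000 person-years in each stratum, the expected count is 1 + 3 = 4, so 8 observed cases give SIR = 2.0 with a one-sided p of about 0.051. Repeating this per cancer type yields the per-family SIR profile used for clustering.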

Results: We identified 1,023 families with a significantly higher bladder cancer rate than population controls (familial bladder cancer). Familial SIRs were then calculated across 25 cancer types, and a weighted Gower distance with K-medoids clustering was used to identify familial multicancer configurations (FMCs). We found five FMCs, each exhibiting a different pattern of cancer aggregation. Of the 25 cancer types studied, kidney and prostate cancers were most commonly enriched in the familial bladder cancer clusters. Laryngeal, lung, stomach, esophageal, breast, uterine, and thyroid cancers, as well as acute lymphocytic leukemia, Hodgkin disease, soft-tissue carcinoma, and melanoma, also showed increased incidence in the familial bladder cancer families.
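The abstract names the clustering method but not its details. The sketch below shows one plausible reading, assuming each family is represented by a row of familial SIRs (one column per cancer type) and that per-cancer weights are supplied by the analyst; the weights and the function names `weighted_gower` and `k_medoids` are hypothetical, not the authors' code. For purely numeric features, Gower distance is the weighted mean of range-normalized absolute differences, and K-medoids partitions families around representative member families (medoids) using that distance matrix.

```python
import numpy as np


def weighted_gower(X: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Pairwise weighted Gower distance for numeric features.

    Rows of X are families; columns are per-cancer-type familial SIRs.
    """
    n, p = X.shape
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges = np.where(ranges == 0, 1.0, ranges)  # constant columns add zero distance
    D = np.zeros((n, n))
    for k in range(p):
        # Range-normalized |x_ik - x_jk| / R_k for all pairs (i, j), weighted by w_k
        D += weights[k] * np.abs(X[:, k, None] - X[None, :, k]) / ranges[k]
    return D / weights.sum()


def k_medoids(D: np.ndarray, k: int, max_iter: int = 100, seed: int = 0):
    """Alternating (Voronoi-iteration) K-medoids on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign each family to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue
            # New medoid: the member with the smallest total distance to its cluster
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels
```

The per-feature loop keeps memory at O(n²), which is manageable for roughly a thousand families. The abstract does not say how the number of clusters (five) was selected; criteria such as average silhouette width are a common choice. A tested implementation such as `KMedoids` in scikit-learn-extra (with `metric="precomputed"`) may be preferable to this sketch in practice.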

Conclusions: This study identified five familial bladder cancer FMCs, each showing a distinct risk pattern for cancers of other organs, suggesting phenotypic heterogeneity in familial bladder cancer.

Impact: FMCs could permit better definitions of cancer phenotypes (subtypes or multicancer) for gene discovery and environmental risk factor studies.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Data Collection / methods*
  • Data Mining / methods
  • Databases, Factual / statistics & numerical data
  • Female
  • Genetic Heterogeneity
  • Genetic Predisposition to Disease
  • Humans
  • Incidence
  • Machine Learning*
  • Male
  • Middle Aged
  • Neoplastic Syndromes, Hereditary / epidemiology*
  • Neoplastic Syndromes, Hereditary / genetics
  • Pedigree
  • Risk Assessment / methods
  • Risk Factors
  • Urinary Bladder Neoplasms / epidemiology*
  • Urinary Bladder Neoplasms / genetics
  • Utah / epidemiology
  • Young Adult