Structure-preserving integrated analysis for risk stratification with application to cancer staging

Biostatistics. 2022 Jul 18;23(3):990-1006. doi: 10.1093/biostatistics/kxab005.

Abstract

To provide an appropriate and practical level of health care, it is critical to group patients into relatively few strata with distinct prognoses. Such grouping, or stratification, is typically based on well-established risk factors and clinical outcomes. A well-known example is the American Joint Committee on Cancer staging system, which uses tumor size, node involvement, and metastasis status. We consider a statistical method for such grouping based on individual patient data from multiple studies. The method encourages a common grouping structure across studies as a basis for borrowing information, while accounting for data heterogeneity, including unbalanced data structures across studies. We build on the "lasso-tree" method, which is more versatile than the well-known classification and regression tree (CART) method in generating possible grouping patterns. In addition, the parametrization of the lasso-tree method makes it natural to incorporate the underlying order information in the risk factors. In this article, we also strengthen the lasso-tree method by establishing theoretical properties that Lin and others (2013. Lasso tree for cancer staging with survival data. Biostatistics 14, 327-339) did not pursue. We evaluate our method in extensive simulation studies and in an analysis of multiple breast cancer data sets.
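The abstract states that the lasso-tree parametrization makes it natural to encode the ordering of risk factors. The sketch below illustrates one way such an encoding can work; it is not the authors' implementation and omits the survival-model fitting entirely. It assumes two ordered factors (for example, hypothetical tumor and node categories indexed 0, 1, 2) and writes each cell's risk score as a cumulative sum of nonnegative increments, so that risk cannot decrease as either factor worsens. It then applies a nonnegative soft-threshold (the proximal step of an L1 penalty under a nonnegativity constraint) to hypothetical increment values. Increments shrunk to exactly zero make neighboring cells share a risk score, and cells with equal scores are pooled into one stratum. The function names `nonneg_soft_threshold`, `cell_risk`, and `stages`, the penalty level `lam`, and all input numbers are illustrative assumptions.

```python
import numpy as np


def nonneg_soft_threshold(delta_raw, lam):
    """Proximal step for lam * d subject to d >= 0, i.e. max(d - lam, 0).

    Assumption for this sketch: the baseline increment delta[0, 0] is
    left unpenalized and unconstrained.
    """
    raw = np.asarray(delta_raw, dtype=float)
    delta = np.maximum(raw - lam, 0.0)
    delta[0, 0] = raw[0, 0]
    return delta


def cell_risk(delta):
    """theta[i, j] = sum of delta[k, l] over k <= i and l <= j.

    With every non-baseline increment >= 0, theta is nondecreasing
    along both ordered factors, so the ordering is built in.
    """
    delta = np.asarray(delta, dtype=float)
    off_baseline = np.ones(delta.shape, dtype=bool)
    off_baseline[0, 0] = False
    if np.any(delta[off_baseline] < 0):
        raise ValueError("non-baseline increments must be nonnegative")
    return delta.cumsum(axis=0).cumsum(axis=1)


def stages(theta, decimals=8):
    """Pool cells with equal risk scores; label 0 is the lowest-risk stratum."""
    theta = np.asarray(theta, dtype=float)
    # Flatten before np.unique so the inverse has the same shape in all NumPy versions.
    _, inverse = np.unique(np.round(theta, decimals).ravel(), return_inverse=True)
    return inverse.reshape(theta.shape)


if __name__ == "__main__":
    # Hypothetical unpenalized increments on a 3 x 3 grid of ordered categories.
    raw = np.array([[0.0, 1.5, 0.25],
                    [2.0, 0.25, 0.5],
                    [0.25, 0.0, 0.5]])
    delta = nonneg_soft_threshold(raw, lam=0.5)
    theta = cell_risk(delta)
    print(stages(theta).tolist())  # [[0, 1, 1], [2, 3, 3], [2, 3, 3]]
```

In this toy example, the penalty zeroes six of the eight non-baseline increments, and the nine cells collapse into four strata whose labels never decrease as either factor moves to a worse category.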

Keywords: Cancer staging; Data heterogeneity; Individual patient data; Integrated analysis; Survival analysis.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Breast Neoplasms*
  • Female
  • Humans
  • Neoplasm Staging
  • Prognosis
  • Regression Analysis
  • Risk Assessment