[Development and validation of risk prediction model for new-onset cardiovascular diseases among breast cancer patients: Based on regional medical data of Inner Mongolia]

Beijing Da Xue Xue Bao Yi Xue Ban. 2023 Jun 18;55(3):471-479. doi: 10.19723/j.issn.1671-167X.2023.03.013.
[Article in Chinese]

Abstract

Objective: To develop and validate a three-year risk prediction model for new-onset cardiovascular diseases (CVD) among female patients with breast cancer.

Methods: Using data from the Inner Mongolia Regional Healthcare Information Platform, female breast cancer patients over 18 years old who had received anti-tumor treatment were included. Candidate predictors were first entered according to the results of a multivariable Fine & Gray model and were then selected by Lasso regression. A Cox proportional hazards model, logistic regression model, Fine & Gray model, random forest model, and XGBoost model were trained on the training set, and model performance was evaluated on the testing set. Discrimination was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC), and calibration was evaluated by the calibration curve.
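The two evaluation measures named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names `roc_auc` and `calibration_curve`, the two-group binning, and the eight toy patients are assumptions made purely for illustration. AUC is computed as the probability that a randomly chosen patient with CVD receives a higher predicted risk than a randomly chosen patient without CVD, and the calibration curve compares mean predicted risk with the observed event rate within risk groups.

```python
from typing import List, Tuple


def roc_auc(y_true: List[int], y_score: List[float]) -> float:
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_curve(
    y_true: List[int], y_score: List[float], n_bins: int = 2
) -> List[Tuple[float, float]]:
    """Return (mean predicted risk, observed event rate) for each risk group."""
    pairs = sorted(zip(y_score, y_true))
    if n_bins < 1 or n_bins > len(pairs):
        raise ValueError("n_bins must be between 1 and the number of patients")
    size = len(pairs) // n_bins
    curve = []
    for b in range(n_bins):
        start = b * size
        # The last group absorbs any remainder.
        end = (b + 1) * size if b < n_bins - 1 else len(pairs)
        chunk = pairs[start:end]
        mean_pred = sum(s for s, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        curve.append((mean_pred, observed))
    return curve


# Hypothetical toy test set: 1 = CVD within 3 years, 0 = no CVD.
toy_outcome = [0, 0, 1, 0, 1, 1, 0, 1]
toy_risk = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

toy_auc = roc_auc(toy_outcome, toy_risk)
toy_curve = calibration_curve(toy_outcome, toy_risk, n_bins=2)
```

On the toy data, 12 of the 16 case/non-case pairs are ranked correctly, so the AUC is 0.75, and both risk groups lie on the diagonal of the calibration plot (predicted 0.25 vs. observed 0.25, and 0.75 vs. 0.75).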

Results: A total of 19 325 breast cancer patients were identified, with a mean age of 52.76 ± 10.44 years. The median follow-up was 1.18 years [interquartile range (IQR): 2.71 years]. Overall, 7 856 patients (40.65%) developed CVD within 3 years after the diagnosis of breast cancer. The final selected variables were age at diagnosis of breast cancer, gross domestic product (GDP) of the place of residence, tumor stage, history of hypertension, ischemic heart disease, and cerebrovascular disease, type of surgery, type of chemotherapy, and type of radiotherapy. For discrimination, when survival time was not considered, the AUC of the XGBoost model was significantly higher than that of the random forest model [0.660 (95%CI: 0.644-0.675) vs. 0.608 (95%CI: 0.591-0.624), P < 0.001] and that of the logistic regression model [0.609 (95%CI: 0.593-0.625), P < 0.001]. The logistic regression model and the XGBoost model showed better calibration. When survival time was considered, the AUCs of the Cox proportional hazards model and the Fine & Gray model did not differ significantly [0.600 (95%CI: 0.584-0.616) vs. 0.615 (95%CI: 0.599-0.631), P=0.188], but the Fine & Gray model showed better calibration.
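The abstract reports each AUC with a 95% confidence interval but does not say how the intervals were obtained. The sketch below uses a percentile bootstrap purely as an illustrative assumption; it is not a statement of the authors' method. The helper names, the simulated cohort of 100 patients, and the 200 resamples are all hypothetical. The last lines check the reported incidence against the counts given above.

```python
import random
from typing import List, Tuple


def roc_auc(y_true: List[int], y_score: List[float]) -> float:
    """Share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    y_true: List[int],
    y_score: List[float],
    n_boot: int = 200,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (assumed method, for illustration)."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("need at least one case and one non-case")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            stats.append(roc_auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated cohort (hypothetical): each patient's outcome is drawn from the predicted risk.
sim = random.Random(42)
sim_risk = [sim.random() for _ in range(100)]
sim_outcome = [1 if sim.random() < r else 0 for r in sim_risk]

sim_auc = roc_auc(sim_outcome, sim_risk)
ci_low, ci_high = bootstrap_auc_ci(sim_outcome, sim_risk)

# Consistency check on the reported 3-year incidence: 7 856 of 19 325 patients.
incidence_pct = round(100 * 7856 / 19325, 2)
```

The incidence check reproduces the reported 40.65%. For comparing two models on the same test set, the same resampling idea could be applied to the difference in AUC, again only as one possible approach.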

Conclusion: It is feasible to develop a risk prediction model for new-onset CVD among breast cancer patients based on regional medical data in China. When survival time was not considered, the XGBoost model and the logistic regression model performed better; when survival time was considered, the Fine & Gray model performed better.
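One general reason a Fine & Gray model can calibrate better than a Cox model is that it targets the cumulative incidence of the event in the presence of competing events. The abstract does not name the competing event; the sketch below assumes one (for example, death before CVD) only for illustration. The function name and the four-patient data set are hypothetical. It compares the Aalen-Johansen cumulative incidence with the naive estimate obtained by treating competing events as censoring (one minus Kaplan-Meier).

```python
from typing import List, Tuple


def cumulative_incidence(
    times: List[float], events: List[int], cause: int = 1
) -> Tuple[float, float]:
    """Return (Aalen-Johansen CIF, naive 1 - KM) at the end of follow-up.

    events: 0 = censored, `cause` = event of interest, any other value = competing event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0      # probability of being free of any event just before the current time
    cif = 0.0       # cumulative incidence of the event of interest
    naive_km = 1.0  # Kaplan-Meier that wrongly treats competing events as censoring
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data if tt == t]
        d_cause = sum(1 for e in tied if e == cause)
        d_any = sum(1 for e in tied if e != 0)
        cif += surv * d_cause / at_risk
        surv *= 1 - d_any / at_risk
        naive_km *= 1 - d_cause / at_risk
        at_risk -= len(tied)
        i += len(tied)
    return cif, 1 - naive_km


# Hypothetical follow-up in years: 1 = CVD, 2 = assumed competing event, 0 = censored.
toy_times = [1.0, 2.0, 3.0, 4.0]
toy_events = [1, 2, 1, 0]

toy_cif, toy_naive = cumulative_incidence(toy_times, toy_events)
```

On this toy data the cumulative incidence is 0.5 while the naive estimate is 0.625, which shows how ignoring competing events can overstate absolute risk.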

Keywords: Breast neoplasms; Cardiovascular disease; Computerized medical records systems; Risk assessment; Risk prediction model.

Publication types

  • English Abstract

MeSH terms

  • Adolescent
  • Adult
  • Breast Neoplasms* / epidemiology
  • Cardiovascular Diseases* / epidemiology
  • Cardiovascular Diseases* / etiology
  • China / epidemiology
  • Female
  • Humans
  • Logistic Models
  • Middle Aged
  • Proportional Hazards Models

Grants and funding

National Natural Science Foundation of China (82173616)