A gradient boosting tree model for multi-department venous thromboembolism risk assessment with imbalanced data

J Biomed Inform. 2022 Oct:134:104210. doi: 10.1016/j.jbi.2022.104210. Epub 2022 Sep 16.

Abstract

Venous thromboembolism (VTE) is the world's third most common cause of vascular mortality and a serious complication that arises across multiple hospital departments. VTE risk assessment guides timely clinical intervention and is of great importance to in-hospital patients. Traditional VTE risk assessment methods rely on risk assessment scales whose rules must be carefully designed by human experts; they are difficult to apply in large-population scenarios because manually designed rules are not guaranteed to be accurate for all populations. In contrast, with the development of electronic health record (EHR) datasets, data-driven, machine-learning-based risk assessment methods have shown superior predictive performance in many recent studies. This paper uses a gradient boosting tree model to study the VTE risk assessment problem with multi-department data. VTE data collected at the level of the entire hospital have two distinct characteristics: wide distribution and heterogeneity across departments. To address these, we treat prediction over multiple departments as a multi-task learning process and introduce TSGB, a task-aware tree-based method, to tackle the multi-task prediction problem. Although multi-task learning improves overall across-department performance, we show that task-wise performance declines when VTE data volume is imbalanced. Based on this analysis, we propose two variants of TSGB that alleviate the problem and further boost prediction performance. Compared with state-of-the-art rule-based and multi-task tree-based methods, the experimental results show that the proposed methods not only effectively improve the overall across-department AUC but also ensure improved performance for every individual department.
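The distinction the abstract draws between overall across-department AUC and per-department AUC can be illustrated with a minimal sketch. This is not TSGB or any part of the authors' pipeline: the function names `auc` and `auc_report`, the department labels, and the toy labels and scores are all hypothetical and made up for illustration. The sketch only shows how a pooled AUC can look strong while a smaller department performs poorly.

```python
from collections import defaultdict

def auc(labels, scores):
    """AUC via the pairwise (Mann-Whitney) definition; ties count as 0.5.

    Returns None when only one class is present, since AUC is undefined.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def auc_report(departments, labels, scores):
    """Return the pooled AUC plus one AUC per department (task)."""
    groups = defaultdict(lambda: ([], []))
    for d, y, s in zip(departments, labels, scores):
        groups[d][0].append(y)
        groups[d][1].append(s)
    report = {"overall": auc(labels, scores)}
    for d, (ys, ss) in groups.items():
        report[d] = auc(ys, ss)
    return report

# Hypothetical toy data: a larger department and a smaller one.
departments = ["ortho"] * 6 + ["icu"] * 4
labels = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.3, 0.8, 0.1, 0.4, 0.4, 0.6, 0.7, 0.5]

report = auc_report(departments, labels, scores)
for name, value in report.items():
    print(name, round(value, 3))
```

On this toy data the pooled AUC is about 0.896, while the smaller department sits at 0.5, which is why reporting per-department results alongside the overall figure matters when data volume is imbalanced.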

Keywords: Multi-task learning; Risk assessment model; Venous thromboembolism (VTE).

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Electronic Health Records
  • Hospitals
  • Humans
  • Risk Assessment / methods
  • Risk Factors
  • Venous Thromboembolism* / diagnosis
  • Venous Thromboembolism* / etiology