Developing a computable phenotype for glioblastoma

Sandra Yan; Kaitlyn Melnick; Xing He; Tianchen Lyu; Rachel S F Moor; Megan E H Still; Duane A Mitchell; Elizabeth A Shenkman; Han Wang; Yi Guo; Jiang Bian; Ashley P Ghiaseddin

doi:10.1093/neuonc/noad249


Neuro Oncol. 2023 Dec 23:noad249. doi: 10.1093/neuonc/noad249. Online ahead of print.

Authors

Sandra Yan¹, Kaitlyn Melnick¹, Xing He²,³, Tianchen Lyu²,³, Rachel S F Moor¹, Megan E H Still¹, Duane A Mitchell¹, Elizabeth A Shenkman²,³, Han Wang², Yi Guo²,³, Jiang Bian²,³, Ashley P Ghiaseddin¹

Affiliations

¹ Department of Neurosurgery, College of Medicine, University of Florida, FL, USA.
² Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, FL, USA.
³ Cancer Informatics Shared Resource, University of Florida Health Cancer Center, FL, USA.

PMID: 38141226
DOI: 10.1093/neuonc/noad249

Abstract

Background: Glioblastoma (GBM) is the most common malignant brain tumor, so identifying patients with this diagnosis is important for population studies. However, this is challenging because diagnostic codes are non-specific. This study aimed to create a computable phenotype (CP) for GBM from structured and unstructured data to identify patients with this condition in a large electronic health record (EHR).

Methods: We used the UF Health Integrated Data Repository, a centralized clinical data warehouse that stores clinical and research data from various sources within the UF Health system, including the EHR system. Through manual chart review of patient data, we iteratively refined the GBM-relevant diagnosis codes, procedure codes, medication codes, and keywords. We then evaluated the performance of candidate CPs constructed from the relevant codes and keywords.

Results: We conducted six rounds of manual chart review to refine the CP elements. The final CP algorithm for identifying GBM patients was selected based on the highest F1-score. Among the candidate rules using both structured and unstructured data, the rule "if the patient had at least 1 relevant diagnosis code and at least 1 relevant keyword" achieved the highest F1-score and was therefore selected as the best-performing CP rule.
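As an illustration only, the selected rule and its evaluation against chart-review labels might be sketched as follows. This is not the authors' implementation: the abstract does not list the refined codes or keywords, so `RELEVANT_DX_CODES`, `RELEVANT_KEYWORDS`, the `Patient` record, and the four toy patients are all hypothetical placeholders. The F1-score is computed in its standard form, 2TP / (2TP + FP + FN).

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical placeholders: the abstract does not publish the refined lists.
RELEVANT_DX_CODES = {"C71.9"}               # assumed example of a non-specific code
RELEVANT_KEYWORDS = ("glioblastoma", "gbm")  # assumed example keywords

# Whole-word, case-insensitive match so that "gbm" does not hit inside other tokens.
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, RELEVANT_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


@dataclass
class Patient:
    """Toy record combining structured codes and unstructured note text."""
    patient_id: str
    dx_codes: set[str] = field(default_factory=set)
    notes: list[str] = field(default_factory=list)


def count_relevant_codes(patient: Patient) -> int:
    """Number of distinct relevant diagnosis codes on the patient's record."""
    return len(patient.dx_codes & RELEVANT_DX_CODES)


def count_relevant_keywords(patient: Patient) -> int:
    """Number of distinct relevant keywords found anywhere in the notes."""
    hits = {
        match.group(0).lower()
        for note in patient.notes
        for match in KEYWORD_PATTERN.finditer(note)
    }
    return len(hits)


def cp_rule(patient: Patient, min_codes: int = 1, min_keywords: int = 1) -> bool:
    """Flag the patient if both the code and keyword thresholds are met."""
    return (
        count_relevant_codes(patient) >= min_codes
        and count_relevant_keywords(patient) >= min_keywords
    )


def f1_score(predicted: list[bool], truth: list[bool]) -> float:
    """Standard F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    fn = sum(t and not p for p, t in zip(predicted, truth))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Invented toy cohort with invented chart-review labels (not study data).
cohort = [
    Patient("p1", {"C71.9"}, ["Pathology consistent with glioblastoma."]),
    Patient("p2", {"C71.9"}, ["Low-grade astrocytoma follow-up."]),
    Patient("p3", set(), ["GBM suspected on imaging."]),
    Patient("p4", {"C71.9"}, ["Family history of GBM."]),
]
chart_review_labels = [True, False, True, False]

predictions = [cp_rule(p) for p in cohort]
score = f1_score(predictions, chart_review_labels)
print(predictions, score)
```

Keeping the thresholds as parameters (`min_codes`, `min_keywords`) makes it straightforward to score alternative candidate rules against the same chart-review labels and keep the one with the highest F1-score, which mirrors the selection criterion described above.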

Conclusions: We developed a CP algorithm for identifying patients with GBM using both structured and unstructured EHR data from a large tertiary care center. The final algorithm achieved an F1-score of 0.817, indicating high performance, which reduces potential bias from misclassification errors.

Keywords: Computable phenotype; Electronic Health Records (EHRs); Glioblastoma; Structured data; Unstructured data.