Construction and application of nasopharyngeal carcinoma-specific big data platform based on electronic health records

Am J Otolaryngol. 2024 May-Jun;45(3):104204. doi: 10.1016/j.amjoto.2023.104204. Epub 2023 Dec 19.

Abstract

Objective: To establish a nasopharyngeal carcinoma (NPC)-specific big data platform based on electronic health records (EHRs) to provide data support for real-world studies of nasopharyngeal carcinoma.

Methods: A multidisciplinary expert team was established for this project. Based on industry standards and practical feasibility, the team designed nasopharyngeal carcinoma data element standards comprising 14 modules and 640 fields. Data from patients diagnosed with nasopharyngeal carcinoma who visited Southern Hospital after 1999 were extracted from 15 EHR systems, then cleaned, structured, and standardized using information technologies such as machine learning and natural language processing. In addition, measures such as quality control and data encryption were taken to ensure data quality and protect patient privacy. At the application level, 10 functional modules were designed to meet the needs of nasopharyngeal carcinoma research.
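The abstract does not describe how the cleaning, structuring, and privacy steps were implemented. As a minimal sketch only, assuming a tiny hypothetical subset of the data element standard, one raw EHR extract could be mapped onto standard fields as below. The field names, the sex code mapping, the TNM regular expression (a simple rule-based stand-in for the machine learning and NLP the authors used), and HMAC-SHA256 pseudonymization (a keyed hash, which is a different technique from the data encryption the abstract mentions) are all illustrative assumptions.

```python
import hashlib
import hmac
import re
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL secret key; a real deployment would load it from a secure key store.
PSEUDONYM_KEY = b"replace-with-a-secret-key"

# HYPOTHETICAL mapping from source-system sex codes to one standard value set.
SEX_CODES = {"M": "male", "MALE": "male", "1": "male",
             "F": "female", "FEMALE": "female", "2": "female"}

# HYPOTHETICAL rule for pulling a TNM stage such as "T2N1M0" out of free text
# (a simple stand-in for the machine learning / NLP pipeline in the paper).
TNM_PATTERN = re.compile(r"(T[0-4])\s*(N[0-3])\s*(M[01])\b")


@dataclass
class NPCRecord:
    """A tiny, hypothetical subset of a data element standard."""
    patient_pseudonym: str
    sex: Optional[str]
    age_years: Optional[int]
    t_stage: Optional[str]
    n_stage: Optional[str]
    m_stage: Optional[str]


def pseudonymize(patient_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Keyed hash (HMAC-SHA256): stable per patient, not reversible without the key."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def standardize(raw: dict) -> NPCRecord:
    """Map one raw EHR extract (hypothetical keys) onto the standard fields."""
    # Normalize the source sex code; unknown or missing codes become None.
    sex = SEX_CODES.get(str(raw.get("sex") or "").strip().upper())
    # Empty strings and None are treated as missing ages.
    age_raw = raw.get("age")
    age = int(age_raw) if age_raw not in (None, "") else None
    # Extract T, N, and M categories from the free-text note, if present.
    match = TNM_PATTERN.search(raw.get("note") or "")
    t, n, m = match.groups() if match else (None, None, None)
    return NPCRecord(pseudonymize(str(raw["patient_id"])), sex, age, t, n, m)


raw = {"patient_id": "000123", "sex": "m", "age": "48",
       "note": "MRI staging: T2N1M0, stage III."}
print(standardize(raw))
```

A keyed hash keeps the same patient linkable across the 15 source systems without storing the raw identifier; a production pipeline would replace the regular expression with validated NLP models and cover far more than these six fields.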

Results: As of 1 October 2022, the big data platform had included 11,617 patients, of whom 8228 (70.83%) were male and 3389 (29.17%) were female, with a median age of 48 years (interquartile range, 40 years). Validation showed that the platform data had high completeness and accuracy, especially for key variables such as sociodemographics, laboratory tests, and vital signs. To date, six projects on risk factors, early diagnosis, treatment efficacy, and prevention of treatment-related toxic reactions have been conducted on the platform.
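Field-level completeness is commonly measured as the share of records that have a non-missing value for a field. The abstract does not state the exact validation method, so the sketch below assumes that definition, treats `None` and empty strings as missing, and uses hypothetical toy records; it also recomputes the reported sex proportions from the counts given above.

```python
from typing import Dict, Iterable, Mapping, Sequence


def completeness(records: Sequence[Mapping[str, object]],
                 fields: Iterable[str]) -> Dict[str, float]:
    """Share of records with a non-missing value per field (None or "" counts as missing)."""
    n = len(records)
    result = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = filled / n if n else 0.0
    return result


def percentage(part: int, total: int, digits: int = 2) -> float:
    """Percentage of `total` represented by `part`, rounded to `digits` places."""
    return round(100 * part / total, digits)


# Cohort counts reported in the abstract.
TOTAL, MALE, FEMALE = 11_617, 8_228, 3_389
print(MALE + FEMALE == TOTAL)                              # True
print(percentage(MALE, TOTAL), percentage(FEMALE, TOTAL))  # 70.83 29.17

# HYPOTHETICAL toy records, for illustration only.
toy = [{"sex": "male", "age": 48}, {"sex": "female", "age": None}]
print(completeness(toy, ["sex", "age"]))  # {'sex': 1.0, 'age': 0.5}
```

Accuracy, the other quality dimension reported, cannot be computed this way; it usually requires comparing platform values against a manually reviewed reference sample.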

Conclusions: We have established a high-quality NPC-specific big data platform by integrating heterogeneous, multi-source data from the EHR. The platform provides an effective tool and strong data support for real-world studies of nasopharyngeal carcinoma, helping to improve research efficiency, reduce costs, and improve the quality of research results. In the future, we expect to promote multicenter sharing of nasopharyngeal carcinoma data to facilitate the generation of high-quality real-world evidence. This article may serve as a reference for other general hospitals seeking to establish a big data platform for nasopharyngeal carcinoma.

Keywords: Big data platform; Electronic health record; Nasopharyngeal carcinoma; Real-world study.

MeSH terms

  • Adult
  • Big Data*
  • Electronic Health Records*
  • Female
  • Humans
  • Machine Learning
  • Male
  • Middle Aged
  • Nasopharyngeal Carcinoma* / diagnosis
  • Nasopharyngeal Carcinoma* / therapy
  • Nasopharyngeal Neoplasms*
  • Natural Language Processing