Development and implementation of a dynamically updated big data intelligence platform from electronic health records for nasopharyngeal carcinoma research

Br J Radiol. 2019 Oct;92(1102):20190255. doi: 10.1259/bjr.20190255. Epub 2019 Aug 20.

Abstract

Objective: To develop a big data intelligence platform for secondary use of electronic health record (EHR) data to facilitate research on nasopharyngeal carcinoma (NPC).

Methods: This project was launched in 2015 as a collaboration between an academic cancer centre and a technology company. Patients diagnosed with NPC at Sun Yat-sen University Cancer Centre since January 2008 were included in the platform. Standard data elements were established to define 981 variables for the platform. For each patient, data from 13 EHR systems were extracted, integrated, structured and normalized. Eight functional modules were constructed to help investigators identify eligible patients, establish research projects, conduct statistical analysis, track follow-up, search the literature, etc.

Results: From January 2008 to December 2018, 54,703 patients diagnosed with NPC were included. Of these patients, 39,058 (71.4%) were male and 15,645 (28.6%) were female; median age was 47 (interquartile range, 39-55) years. Of the 981 variables, 341 were obtained through data structuring and normalization, of which 68 were generated by combining multiple data sources via well-defined logical rules. The average precision rate, recall rate and F-measure for the 341 variables were 0.97 ± 0.024, 0.92 ± 0.030, and 0.94 ± 0.027, respectively. The platform is updated every seven days to include new patients and add new data for existing patients. To date, eight big data-driven retrospective studies have been published from the platform.
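
The abstract reports per-variable precision, recall and F-measure averaged over 341 variables but does not state how they were computed. The following is a minimal sketch, assuming the standard definitions against a manually reviewed reference, that the F-measure is the balanced (F1) form, and that the "±" values are standard deviations across variables. The function names, variable names and counts are hypothetical and are not taken from the platform.

```python
from statistics import mean, stdev


def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from extraction counts for one variable."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def summarize(counts_by_variable):
    """Average each metric across variables; returns {metric: (mean, sd)}."""
    scores = [precision_recall_f(*c) for c in counts_by_variable.values()]
    names = ("precision", "recall", "f_measure")
    # zip(*scores) regroups the per-variable triples into one column per metric.
    return {
        name: (mean(column), stdev(column))
        for name, column in zip(names, zip(*scores))
    }


# Hypothetical counts per variable: (true positives, false positives, false negatives)
example_counts = {
    "variable_a": (95, 3, 7),
    "variable_b": (90, 2, 9),
    "variable_c": (88, 4, 6),
}

for metric, (avg, sd) in summarize(example_counts).items():
    print(f"{metric}: {avg:.2f} ± {sd:.3f}")
```

Note that under these assumptions the average F-measure is the mean of the per-variable F-measures, which need not equal the F-measure computed from the average precision and average recall.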

Conclusion: Our big data intelligence platform demonstrates the feasibility of integrating EHR data from routine healthcare, and offers an important perspective on real-world study of NPC. Continued efforts may focus on data sharing among multiple hospitals and public release of data files.

Advances in knowledge: Our big data intelligence platform is the first disease-specific data platform for NPC research. It incorporates comprehensive EHR data from routine healthcare, which can facilitate real-world study of NPC in risk stratification, decision-making and comorbidity management.

MeSH terms

  • Adolescent
  • Adult
  • Age Distribution
  • Aged
  • Big Data*
  • Biomedical Research
  • Child
  • Child, Preschool
  • Electronic Health Records*
  • Endemic Diseases
  • Feasibility Studies
  • Female
  • Humans
  • Male
  • Middle Aged
  • Nasopharyngeal Carcinoma*
  • Nasopharyngeal Neoplasms*
  • Sex Distribution
  • Young Adult