Building a Lung and Ovarian Cancer Data Warehouse

Healthc Inform Res. 2020 Oct;26(4):303-310. doi: 10.4258/hir.2020.26.4.303. Epub 2020 Oct 31.

Abstract

Objectives: Although the healthcare sector collects vast amounts of data, effective decision-making in medical practice remains challenging. Data warehousing technology can be applied to collect and manage clinical data from various sources and to provide meaningful insights for physicians and administrators. Cancer data are highly complex and voluminous; hence, a clinical data warehouse system can provide insights into prevention, diagnosis, and treatment processes by using online analytical processing tools to analyze multi-dimensional data at different levels of granularity.

Methods: In this study, a clinical data warehouse was developed for lung and ovarian cancer data, which were kindly provided by the United States National Cancer Institute. The data were imported in specific formats and cleaned to remove errors and redundancies. SQL Server Integration Services (SSIS) was used for the extract-transform-load (ETL) process.

Results: By adopting the fact constellation schema model, the clinical data warehouse responds efficiently to all types of queries. Various online analytical processing queries can be expressed using the proposed approach.
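
As an illustration of the fact constellation idea, the following is a minimal sketch in which two fact tables (one per cancer type) share the same dimension tables, followed by a roll-up query that counts records per year. It is not the warehouse described in the study: SQLite stands in for SQL Server, and every table name, column name, and row (dim_patient, dim_time, fact_lung, fact_ovarian, and the toy values) is a hypothetical assumption made for the example.

```python
import sqlite3


def build_constellation(conn):
    """Create two fact tables that share the same dimension tables.

    All table and column names are hypothetical; they only illustrate
    the shape of a fact constellation (galaxy) schema.
    """
    conn.executescript(
        """
        CREATE TABLE dim_patient (
            patient_id INTEGER PRIMARY KEY,
            sex        TEXT,
            age_group  TEXT
        );
        CREATE TABLE dim_time (
            time_id INTEGER PRIMARY KEY,
            year    INTEGER,
            quarter INTEGER
        );
        CREATE TABLE fact_lung (
            patient_id INTEGER REFERENCES dim_patient(patient_id),
            time_id    INTEGER REFERENCES dim_time(time_id),
            stage      TEXT
        );
        CREATE TABLE fact_ovarian (
            patient_id INTEGER REFERENCES dim_patient(patient_id),
            time_id    INTEGER REFERENCES dim_time(time_id),
            stage      TEXT
        );
        """
    )


def load_toy_rows(conn):
    """Insert a few made-up rows so the query below has something to count."""
    conn.executemany(
        "INSERT INTO dim_patient VALUES (?, ?, ?)",
        [(1, "F", "60-69"), (2, "M", "50-59"), (3, "F", "50-59")],
    )
    conn.executemany(
        "INSERT INTO dim_time VALUES (?, ?, ?)",
        [(1, 2001, 1), (2, 2001, 3), (3, 2002, 2)],
    )
    conn.executemany(
        "INSERT INTO fact_lung VALUES (?, ?, ?)",
        [(1, 1, "I"), (2, 2, "III"), (3, 3, "II")],
    )
    conn.executemany(
        "INSERT INTO fact_ovarian VALUES (?, ?, ?)",
        [(1, 3, "II"), (3, 1, "I")],
    )


def diagnoses_by_year(conn):
    """Roll up both fact tables to the year level through the shared time dimension."""
    return conn.execute(
        """
        SELECT t.year, f.cancer, COUNT(*) AS n
        FROM (
            SELECT time_id, 'lung' AS cancer FROM fact_lung
            UNION ALL
            SELECT time_id, 'ovarian' AS cancer FROM fact_ovarian
        ) AS f
        JOIN dim_time AS t ON t.time_id = f.time_id
        GROUP BY t.year, f.cancer
        ORDER BY t.year, f.cancer
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_constellation(connection)
    load_toy_rows(connection)
    for row in diagnoses_by_year(connection):
        print(row)
```

Because both fact tables reference the same dim_time and dim_patient tables, a single query can aggregate across cancer types; grouping by quarter instead of year would be the corresponding drill-down.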

Conclusions: The model responded successfully to complex queries, and online analytical processing cubes facilitate data analysis by allowing the data to be viewed at multiple levels of detail.

Keywords: Data Analytics; Data Warehousing; Lung Cancer; Ovarian Cancer.