An ontology-based approach to designing a NoSQL database for semi-structured and unstructured health data

Cluster Comput. 2023 Apr 8:1-18. doi: 10.1007/s10586-023-03995-y. Online ahead of print.

Abstract

With the advent of ICT-based healthcare applications, health data in various formats is generated every day in huge volumes. Such data, comprising unstructured, semi-structured and structured data, has every characteristic of Big Data. NoSQL databases are generally preferred for storing this type of health data, with the objective of improving query performance. However, for efficient retrieval and processing of Big Health Data and for resource optimization, suitable data models and a sound design of the NoSQL databases are important requirements. Unlike for relational databases, no standard methods or tools exist for NoSQL database design. In this work, we adopt an ontology-based schema design approach. We propose that an ontology, which captures the domain knowledge, be used for developing a health data model. An ontology for primary healthcare is described in this paper. We also propose an algorithm for designing the schema of a NoSQL database that takes into account the characteristics of the target NoSQL store and uses a related ontology, a sample query set, statistical information about the queries, and the performance requirements of the query set. Our ontology for the primary healthcare domain and the above-mentioned algorithm, along with a set of queries, are used to generate a schema targeting the MongoDB datastore. The performance of the proposed design is compared with that of a relational model developed for the same primary healthcare data, and the effectiveness of our proposed approach is demonstrated. The entire experiment was carried out on the MongoDB cloud platform.

Keywords: Big data; Health Data; Health Ontology; NoSQL Data Model; Schema Design.
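
The abstract names the inputs to the schema design algorithm (an ontology, a sample query set, and query statistics) but not its steps. The following is only a minimal illustrative sketch of how such inputs could drive a document-store design decision, and is not the algorithm proposed in the paper. It assumes a simple heuristic: a related class is embedded in its parent document when most of the query load touching it also touches the parent, and is otherwise stored as a separate, referenced collection. The class names (Patient, Visit, Doctor), the query frequencies, and the `embed_threshold` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relationship between two ontology classes (hypothetical model)."""
    parent: str  # class that would own the document
    child: str   # related class that may be embedded or referenced


@dataclass(frozen=True)
class Query:
    """A sample query with its relative frequency (hypothetical statistics)."""
    classes: frozenset  # ontology classes the query reads
    frequency: float    # share of the overall workload


def design_schema(relations, queries, embed_threshold=0.5):
    """Decide, per relation, whether to embed the child or reference it.

    Assumed heuristic: embed when the fraction of the child's query load
    that also reads the parent is at least `embed_threshold`.
    """
    decisions = {}
    for rel in relations:
        child_queries = [q for q in queries if rel.child in q.classes]
        child_load = sum(q.frequency for q in child_queries)
        joint_load = sum(
            q.frequency for q in child_queries if rel.parent in q.classes
        )
        share = joint_load / child_load if child_load else 0.0
        key = (rel.parent, rel.child)
        decisions[key] = "embed" if share >= embed_threshold else "reference"
    return decisions


# Illustrative inputs only; none of these values come from the paper.
relations = [Relation("Patient", "Visit"), Relation("Visit", "Doctor")]
queries = [
    Query(frozenset({"Patient", "Visit"}), 0.6),  # visits read with patient
    Query(frozenset({"Visit"}), 0.1),             # visits read alone
    Query(frozenset({"Doctor"}), 0.3),            # doctors read alone
]

decisions = design_schema(relations, queries)
for (parent, child), choice in decisions.items():
    print(f"{parent} -> {child}: {choice}")
```

With these illustrative inputs, visits are mostly read together with their patient and would be embedded, whereas doctors are read on their own and would be kept in a separate collection. A real design procedure would also weigh the performance requirements and the constraints of the target store, which this sketch ignores.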