Development of a data utility framework to support effective health data curation

BMJ Health Care Inform. 2021 May;28(1):e100303. doi: 10.1136/bmjhci-2020-100303.

Abstract

Objectives: The value of healthcare data is increasingly recognised, as is the need to improve the utility of health datasets. There is no established mechanism for evaluating healthcare dataset utility, which makes it difficult to evaluate the effectiveness of activities intended to improve the data. This study describes the method for generating a proposed framework for evaluating and communicating healthcare dataset utility for given research areas, and for involving the user community in its development.

Methods: An initial version of a matrix to review datasets across a range of dimensions was developed based on previous published findings regarding healthcare data. This was used to initiate a design process through interviews and surveys with data users representing a broad range of user types and use cases, to help develop a focused framework for characterising datasets.

Results: Following 21 interviews, 31 survey responses and testing on 43 datasets, five major categories and 13 subcategories were identified as useful for characterising a dataset, including Data Model, Completeness and Linkage. Each subcategory was graded to facilitate rapid and reproducible evaluation of dataset utility for specific use cases. Testing of applicability to more than 40 existing datasets demonstrated potential usefulness for subsequent evaluation in real-world practice.
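
As an illustration only, the idea of grading each subcategory and comparing the grades against the needs of a specific use case could be sketched as below. This is a minimal sketch under stated assumptions, not the framework's actual scoring scheme: the 1 to 4 ordinal scale, the class `DatasetProfile`, the function `unmet_requirements`, the dataset name and all grade values are hypothetical. Only the names Data Model, Completeness and Linkage come from the abstract, and treating each as a single gradable entry is itself an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: a 1-4 ordinal scale. The abstract does not state the grade labels.
MIN_GRADE, MAX_GRADE = 1, 4


@dataclass
class DatasetProfile:
    """Hypothetical container: one grade per framework subcategory."""
    name: str
    grades: Dict[str, int]

    def __post_init__(self) -> None:
        # Reject grades outside the assumed scale.
        for subcategory, grade in self.grades.items():
            if not MIN_GRADE <= grade <= MAX_GRADE:
                raise ValueError(f"{subcategory}: grade {grade} is out of range")


def unmet_requirements(profile: DatasetProfile, required: Dict[str, int]) -> List[str]:
    """Return the subcategories where the dataset falls below the use case's minimum grade.

    A subcategory that the profile does not grade at all counts as unmet.
    """
    return sorted(
        subcategory
        for subcategory, minimum in required.items()
        if profile.grades.get(subcategory, 0) < minimum
    )


# Illustrative values only; these are not results from the study.
profile = DatasetProfile(
    name="example_registry",
    grades={"Data Model": 3, "Completeness": 2, "Linkage": 4},
)
use_case = {"Completeness": 3, "Linkage": 2}

print(unmet_requirements(profile, use_case))  # ['Completeness']
```

Keeping the use case's requirements separate from the dataset's grades reflects the point that utility is judged for a specific use case: the same profile can satisfy one use case and fall short for another.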

Discussion: The research has developed an evidence-based initial approach for a framework to understand the utility of a healthcare dataset. It is likely to require further refinement following wider application, and additional categories may be required.

Conclusion: The process has resulted in a framework, developed through user-centred design, for objectively evaluating the likely utility of specific healthcare datasets. It should therefore be of value both to potential users of health data and to data custodians seeking to identify the areas where investment in data curation provides the most value.

Keywords: BMJ health informatics; health care sector; information management; information science; information systems.

MeSH terms

  • Artificial Intelligence
  • Data Curation
  • Delivery of Health Care / organization & administration*
  • Drug Industry / organization & administration
  • Humans
  • Medical Informatics / organization & administration*
  • State Medicine / organization & administration
  • United Kingdom