Challenges to Using Big Data in Cancer

Cancer Res. 2023 Apr 14;83(8):1175-1182. doi: 10.1158/0008-5472.CAN-22-1274.

Abstract

Big data in healthcare can enable unprecedented understanding of diseases and their treatment, particularly in oncology. These data may include electronic health records, medical imaging, genomic sequencing, payor records, and data from pharmaceutical research, wearables, and medical devices. The ability to combine datasets and use data across many analyses is critical to the successful use of big data and is a concern for those who generate and use the data. Interoperability and data quality continue to be major challenges when working with different healthcare datasets. Mapping terminology across datasets, missing and incorrect data, and varying data structures make combining data an onerous and largely manual undertaking. Data privacy is another concern, one addressed by the Health Insurance Portability and Accountability Act, the Common Rule, and the General Data Protection Regulation. The use of big data is now included in the planning and activities of the FDA and the European Medicines Agency. The willingness of organizations to share data in a precompetitive fashion, agreements on data quality standards, and the institution of universal and practical tenets on data privacy will be crucial to fully realizing the potential of big data in medicine.

Publication types

  • Review

MeSH terms

  • Big Data*
  • Humans
  • Information Storage and Retrieval
  • Neoplasms* / diagnosis
  • Neoplasms* / therapy
  • Precision Medicine