Does big data require a methodological change in medical research?

BMC Med Res Methodol. 2019 Jun 17;19(1):125. doi: 10.1186/s12874-019-0774-0.

Abstract

Background: Use of big data is becoming increasingly popular in medical research. Since big data-based projects differ notably from classical research studies, both in terms of scope and quality, a debate is apt as to whether big data require approaches to scientific reasoning that differ from those established in statistics and philosophy of science.

Main text: The progressing digitalization of our societies generates vast amounts of data that also become available for medical research. Here, the big promise of big data is to facilitate major improvements in the treatment, diagnosis and prevention of diseases. An ongoing examination of the idiosyncrasies of big data is therefore essential to ensure that the field stays congruent with the principles of evidence-based medicine. We discuss the inherent challenges and opportunities of big data in medicine from a methodological point of view, particularly highlighting the relative importance of causality and correlation in commercial and medical research settings. We make a strong case for upholding the distinction between exploratory data analysis facilitating hypothesis generation and confirmatory approaches involving hypothesis validation. An independent verification of research results will be ever more important in the context of big data, where data quality is often hampered by a lack of standardization and structuring.

Conclusions: We argue that it would be both unnecessary and dangerous to discard long-established principles of data generation, analysis and interpretation in the age of big data. While many medical research areas may reasonably benefit from big data analyses, such analyses should nevertheless be complemented by carefully designed (prospective) studies.

Keywords: Big data; Causality; Correlation; Data quality; Digitalization; Hypothesis generation; Scientific methodology; Validation.

MeSH terms

  • Big Data*
  • Biomedical Research / methods*
  • Biomedical Research / statistics & numerical data*
  • Data Interpretation, Statistical*
  • Databases, Factual / statistics & numerical data*
  • Humans
  • Prospective Studies
  • Research Design / statistics & numerical data