Using Shopping Data to Improve the Diagnosis of Ovarian Cancer: Computational Analysis of a Web-Based Survey

JMIR Cancer. 2023 Mar 31:9:e37141. doi: 10.2196/37141.

Abstract

Background: Shopping data can be analyzed using machine learning techniques to study population health. It is unknown if the use of such methods can successfully investigate prediagnosis purchases linked to self-medication of symptoms of ovarian cancer.

Objective: The aims of this study were to gain new domain knowledge from women's experiences, understand how women's shopping behavior relates to their pathway to the diagnosis of ovarian cancer, and inform research on computational analysis of shopping data for population health.

Methods: A web-based survey on individuals' shopping patterns prior to an ovarian cancer diagnosis was analyzed to identify key knowledge about health care purchases. Logistic regression and random forest models were employed to statistically examine how products linked to potential symptoms related to presentation to health care and timing of diagnosis.

Results: Of the 101 women surveyed with ovarian cancer, 58.4% (59/101) bought nonprescription health care products for up to more than a year prior to diagnosis, including pain relief and abdominal products. General practitioner advice was the primary reason for the purchases (23/59, 39%), with 51% (30/59) occurring due to a participant's doctor believing their health problems were due to a condition other than ovarian cancer. Associations were shown between purchases made because a participant's doctor believing their health problems were due to a condition other than ovarian cancer and the following variables: health problems for longer than a year prior to diagnosis (odds ratio [OR] 7.33, 95% CI 1.58-33.97), buying health care products for more than 6 months to a year (OR 3.82, 95% CI 1.04-13.98) or for more than a year (OR 7.64, 95% CI 1.38-42.33), and the number of health care product types purchased (OR 1.54, 95% CI 1.13-2.11). Purchasing patterns are shown to be potentially predictive of a participant's doctor thinking their health problems were due to some condition other than ovarian cancer, with nested cross-validation of random forest classification models achieving an overall in-sample accuracy score of 89.1% and an out-of-sample score of 70.1%.

Conclusions: Women in the survey were 7 times more likely to have had a duration of more than a year of health problems prior to a diagnosis of ovarian cancer if they were self-medicating based on advice from a doctor rather than having made the decision to self-medicate independently. Predictive modelling indicates that women in such situations, who are self-medicating because their doctor believes their health problems may be due to a condition other than ovarian cancer, exhibit distinct shopping behaviors that may be identifiable within purchasing data. Through exploratory research combining women sharing their behaviors prior to diagnosis and computational analysis of these data, this study demonstrates that women's shopping data could potentially be useful for early ovarian cancer detection.

Keywords: cancer; carcinoma; diagnostic errors; machine learning; nonprescription drugs; oncology; ovarian cancer; ovarian epithelial; ovarian neoplasms; ovary; over-the-counter; pharmaceutical; self-medication; symptom; symptom assessment.