Google as a cancer control tool in Queensland

BMC Cancer. 2017 Dec 4;17(1):816. doi: 10.1186/s12885-017-3828-x.

Abstract

Background: Recent advances in methodologies utilizing "big data" have allowed researchers to investigate the use of common internet search engines as a real-time tool to track disease. Little is known about their utility for tracking cancer incidence. This study aims to investigate the potential correlation between monthly internet search volume indexes (SVIs) and observed monthly age standardised incidence rates (ASRs) for breast cancer, colorectal cancer, melanoma and prostate cancer.

Methods: The monthly ASRs for the four cancers in Queensland were calculated using data from the Queensland Cancer Registry between January 2006 and December 2012. The monthly SVIs of the respective cancer search terms in Queensland were accessed from Google Trends for the same period. A time series seasonal decomposition method was used to detect the seasonal patterns of SVIs and ASRs. Pearson's correlation coefficient and time series cross-correlation analysis were used to assess the associations between SVIs and ASRs. Linear regression models were used to examine the power of SVIs to predict monthly changes in ASRs.
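As an illustration of the correlation step described above, the following is a minimal sketch, not the authors' code. It computes Pearson's correlation between an SVI series and an ASR series at several monthly lags. The function names (`pearson_r`, `lagged_correlations`) are hypothetical, the data are synthetic (84 points to mirror January 2006 to December 2012), and any pre-processing the study may have applied before cross-correlation is not shown because the abstract does not describe it.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def lagged_correlations(svi, asr, max_lag):
    """Correlate asr[t] with svi[t - lag] for lag = 0..max_lag (in months)."""
    svi = np.asarray(svi, dtype=float)
    asr = np.asarray(asr, dtype=float)
    result = {}
    for lag in range(max_lag + 1):
        if lag == 0:
            result[lag] = pearson_r(svi, asr)
        else:
            # Drop the last `lag` SVI values and the first `lag` ASR values
            # so that each ASR month is paired with the SVI `lag` months earlier.
            result[lag] = pearson_r(svi[:-lag], asr[lag:])
    return result


# Synthetic example only: the ASR series follows the SVI series two months later.
rng = np.random.default_rng(0)
n_months = 84
svi = rng.normal(size=n_months)
asr = np.empty(n_months)
asr[:2] = rng.normal(size=2)
asr[2:] = svi[:-2] + 0.1 * rng.normal(size=n_months - 2)

correlations = lagged_correlations(svi, asr, max_lag=6)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

In this synthetic setup the strongest correlation appears at the lag built into the data; with real registry and Google Trends series, the lag structure would have to be assessed empirically.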

Results: For all cancers except colorectal cancer, increases in the monthly ASRs were significantly correlated with increases in the monthly SVIs of the respective cancers. The predictive power of the SVIs to explain variance in the corresponding ASRs varied by cancer type, with the percentage of variance explained ranging from 5.6% for breast cancer to 17.9% for the skin cancer SVI paired with the melanoma ASR. Some improvement in the variation explained was obtained by including more search terms or lagged SVIs for the respective cancers in the linear regression models. The seasonal analysis indicated that the SVIs peaked periodically at around their respective cancer awareness months.
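The percentage of variance explained reported above is presumably the R² of each linear regression model. The sketch below, under that assumption, shows how R² can be computed by ordinary least squares and why adding a lagged SVI as a second predictor cannot lower it when both models are fitted to the same months. The helper `r_squared`, the coefficients, and the data are all hypothetical and synthetic; they do not reproduce the study's figures.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X plus an intercept."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# Synthetic example only: ASR depends weakly on current and previous-month SVI.
rng = np.random.default_rng(1)
svi = rng.normal(size=84)

svi_now = svi[1:]    # SVI in the same month as the ASR
svi_prev = svi[:-1]  # SVI one month earlier (lag 1)
asr = 0.3 * svi_now + 0.3 * svi_prev + rng.normal(size=83)

r2_base = r_squared(svi_now, asr)
r2_lagged = r_squared(np.column_stack([svi_now, svi_prev]), asr)
print(f"SVI only: {r2_base:.3f}  SVI + lagged SVI: {r2_lagged:.3f}")
```

Because the second model nests the first, its in-sample R² is always at least as large, which is one reason a higher proportion of explained variability in an expanded model needs cautious interpretation.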

Conclusions: SVIs from a popular internet search engine explained only a small portion of the changes in the respective ASRs. While an expanded regression model explained a higher proportion of variability, its interpretation was difficult. Further development and refinement of this approach will be needed before search-based cancer surveillance can provide useful information regarding resource deployment to guide cancer control and track the impact of cancer awareness and education programmes.

Keywords: Age standardised rates; Cancer incidence; Cross-correlation; Google Trends; Search volume indexes.

MeSH terms

  • Breast Neoplasms / epidemiology
  • Colorectal Neoplasms / epidemiology
  • Epidemiological Monitoring*
  • Female
  • Humans
  • Incidence
  • Internet
  • Male
  • Melanoma / epidemiology
  • Prostatic Neoplasms / epidemiology
  • Queensland / epidemiology
  • Registries
  • Search Engine
  • Seasons
  • Skin Neoplasms / epidemiology