Confounding Effect of Undergraduate Semester-Driven "Academic" Internet Searches on the Ability to Detect True Disease Seasonality in Google Trends Data: Fourier Filter Method Development and Demonstration

JMIR Infodemiology. 2022 Jul 19;2(2):e34464. doi: 10.2196/34464. eCollection 2022 Jul-Dec.

Abstract

Background: Internet search volume for medical information, as tracked by Google Trends, has been used to demonstrate unexpected seasonality in the symptom burden of a variety of medical conditions. However, when more technical medical language is used (eg, diagnoses), we believe that this technique is confounded by the cyclic, school year-driven internet search patterns of health care students.

Objective: This study aimed to (1) demonstrate that artificial "academic cycling" of Google Trends' search volume is present in many health care terms, (2) demonstrate how signal processing techniques can be used to filter academic cycling out of Google Trends data, and (3) apply this filtering technique to some clinically relevant examples.

Methods: We obtained the Google Trends search volume data for a variety of academic terms demonstrating strong academic cycling and used a Fourier analysis technique to (1) identify the frequency domain fingerprint of this modulating pattern in one particularly strong example, and (2) filter that pattern out of the original data. After this illustrative example, we applied the same filtering technique to internet searches for information on 3 medical conditions believed to have true seasonal modulation (myocardial infarction, hypertension, and depression), as well as to searches for all bacterial genus terms within a common medical microbiology textbook.
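The two-step idea described in the Methods (find the frequency domain fingerprint in a reference series, then remove it from a target series) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `rel_threshold` parameter, the monthly sampling, and the synthetic series (a twice-yearly "semester" pattern plus a once-yearly seasonal component) are all assumptions made for illustration.

```python
import numpy as np

def fingerprint_bins(reference, rel_threshold=0.2):
    """Return the FFT bin indices where the reference series has strong peaks.

    rel_threshold is a hypothetical tuning knob: a bin is treated as part of
    the fingerprint if its magnitude is at least this fraction of the largest
    peak in the mean-removed reference spectrum.
    """
    ref = np.asarray(reference, dtype=float)
    magnitude = np.abs(np.fft.rfft(ref - ref.mean()))
    return np.flatnonzero(magnitude >= rel_threshold * magnitude.max())

def remove_bins(series, bins):
    """Zero the given FFT bins of a series and transform back to the time domain."""
    x = np.asarray(series, dtype=float)
    spectrum = np.fft.rfft(x)
    spectrum[bins] = 0.0
    return np.fft.irfft(spectrum, n=x.size)

# Synthetic demonstration data (assumed, not Google Trends data):
# 10 years of monthly samples.
months = np.arange(120)
# Assumed "academic" pattern: cycles at 2 and 3 per year.
academic = np.cos(2 * np.pi * 2 * months / 12) + 0.5 * np.cos(2 * np.pi * 3 * months / 12)
# Assumed "true" seasonal pattern: a single cycle per year.
seasonal = 2.0 * np.cos(2 * np.pi * months / 12)
observed = 50.0 + 3.0 * academic + seasonal

bins = fingerprint_bins(academic)       # step 1: fingerprint from the reference
filtered = remove_bins(observed, bins)  # step 2: filter it out of the target
```

In this toy case the academic and seasonal components occupy different frequency bins, so the seasonal cycle survives filtering. If true seasonality shared bins with the fingerprint, a filter of this kind would remove it as well.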

Results: Academic cycling explains much of the seasonal variation in internet search volume for many technically oriented search terms, including the bacterial genus term ["Staphylococcus"], for which academic cycling explained 73.8% of the variability in search volume (using the squared Spearman rank correlation coefficient, P<.001). Of the 56 bacterial genus terms examined, 6 displayed sufficiently strong seasonality to warrant further examination post filtering. These were (1) ["Aeromonas" + "Plesiomonas"] (nosocomial infections that were searched for more frequently during the summer), (2) ["Ehrlichia"] (a tick-borne pathogen that was searched for more frequently during late spring), (3) ["Moraxella"] and ["Haemophilus"] (respiratory infections that were searched for more frequently during late winter), (4) ["Legionella"] (searched for more frequently during midsummer), and (5) ["Vibrio"] (which spiked for 2 months during midsummer). The terms ["myocardial infarction"] and ["hypertension"] lacked any obvious seasonal cycling after filtering, whereas ["depression"] maintained an annual cycling pattern.
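For readers unfamiliar with the statistic quoted above, the squared Spearman rank correlation coefficient can be computed as the squared Pearson correlation of the ranks of two series. The sketch below is illustrative only: the helper names and the 5-point example are assumptions, and the abstract does not state exactly which two series were compared to obtain the 73.8% figure.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(x.size, dtype=float)
    ranks[order] = np.arange(1, x.size + 1)
    for v in np.unique(x):
        tied = x == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_r_squared(a, b):
    """Squared Spearman rank correlation: Pearson correlation of ranks, squared."""
    rho = np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]
    return rho ** 2

# Hypothetical 5-point example with no ties: the squared rank differences
# sum to 4, so rho = 1 - 6*4 / (5 * (25 - 1)) = 0.8 and rho squared = 0.64.
example = spearman_r_squared([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
```

Because the statistic depends only on ranks, any monotonic rescaling of either series (for example, Google Trends' normalization to a 0-100 scale) leaves it unchanged.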

Conclusions: Although it is reasonable to search for seasonal modulation of medical conditions using Google Trends' internet search volume and lay-appropriate search terms, the variation in more technical search terms may be driven by health care students whose search frequency varies with the academic year. When this is the case, using Fourier analysis to filter out academic cycling is a potential means to establish whether additional seasonality is present.

Keywords: FFT; Fast Fourier transform; Google; Google Trends; Google search; depression; health information; health information seeking; internet search; pathogenic bacteria; seasonality.