Data-driven test strategy for COVID-19 using machine learning: A study in Lahore, Pakistan

Socioecon Plann Sci. 2022 Mar:80:101091. doi: 10.1016/j.seps.2021.101091. Epub 2021 Jun 8.

Abstract

Aims: We aimed at giving a preliminary analysis of the weakness of a current test strategy, and proposing a data-driven strategy that was self-adaptive to the dynamic change of pandemic. The effect of driven-data selection over time and space was also within the deep concern.

Methods: A mathematical definition of the test strategy were given. With the real COVID-19 test data from March to July collected in Lahore, a significance analysis of the possible features was conducted. A machine learning method based on logistic regression and priority ranking were proposed for the data-driven test strategy. With performance assessed by the area under the receiver operating characteristic curve (AUC), time series analysis and spatial cross-test were conducted.

Results: The transition of risk factors accounted for the failure of the current test strategy. The proposed data-driven strategy could enhance the positive detection rate from 2.54% to 28.18%, and the recall rate from 8.05% to 89.35% under strictly limited test capacity. Much more optimal utilization of test resources could be realized where 89.35% of total positive cases could be detected with merely 48.17% of the original test amount. The strategy showed self-adaptability with the development of pandemic, while the strategy driven by local data was proved to be optimal.

Conclusions: We recommended a generalization of such a data-driven test strategy for a better response to the global developing pandemic. Besides, the construction of the COVID-19 data system should be more refined on space for local applications.

Keywords: COVID-19; Logistic regression; Machine learning; Policy making; Spatial analysis; Test strategy; Time series analysis.