VacSIM: Learning effective strategies for COVID-19 vaccine distribution using reinforcement learning

Raghav Awasthi; Keerat Kaur Guliani; Saif Ahmad Khan; Aniket Vashishtha; Mehrab Singh Gill; Arshita Bhatt; Aditya Nagori; Aniket Gupta; Ponnurangam Kumaraguru; Tavpritesh Sethi

doi:10.1016/j.ibmed.2022.100060

Intell Based Med. 2022;6:100060. doi: 10.1016/j.ibmed.2022.100060. Epub 2022 May 20.

Affiliations

¹ Indraprastha Institute of Information Technology Delhi, India.
² Indian Institute of Technology Roorkee, India.
³ Maharaja Surajmal Institute of Technology, New Delhi, India.
⁴ Bhagwan Parshuram Institute of Technology, New Delhi, India.
⁵ CSIR-Institute of Genomics and Integrative Biology, New Delhi, India.

Abstract

A COVID-19 vaccine is our best bet for mitigating the ongoing onslaught of the pandemic. However, vaccines are also expected to be a limited resource. An optimal allocation strategy, especially in countries with access inequities and temporal separation of hotspots, might be an effective way of halting the spread of the disease. We approach this problem by proposing a novel pipeline, VacSIM, that dovetails Deep Reinforcement Learning models into a Contextual Bandits approach for optimizing the distribution of the COVID-19 vaccine. Whereas the Reinforcement Learning models suggest better actions and rewards, Contextual Bandits allow the online modifications that may need to be implemented on a day-to-day basis in real-world scenarios. We evaluate this framework against a naive allocation approach that distributes vaccines in proportion to the incidence of COVID-19 cases in five states across India (Assam, Delhi, Jharkhand, Maharashtra and Nagaland), and demonstrate, through the VacSIM approach, up to 9039 potential infections prevented and a significant increase in the efficacy of limiting the spread over a period of 45 days. Our models and the platform are extensible to all states of India and potentially across the globe. We also propose novel evaluation strategies, including standard compartmental model-based projections and a causality-preserving evaluation of our model. Since all models carry assumptions that may need to be tested in various contexts, we open-source VacSIM and contribute a new reinforcement learning environment compatible with OpenAI Gym to make it extensible to real-world applications across the globe.
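The naive baseline described above splits the available doses across states in proportion to each state's COVID-19 case incidence. The abstract does not say how fractional doses or zero-case situations are handled, so the minimal Python sketch below makes two labeled assumptions: doses are whole numbers assigned by largest-remainder rounding (so the shares always add up to the supply), and regions are split evenly when no cases are reported. The function name `proportional_allocation` and the region names are hypothetical; this illustrates the baseline idea only and is not the authors' implementation.

```python
def proportional_allocation(total_doses: int, cases: dict[str, int]) -> dict[str, int]:
    """Hypothetical sketch of the naive baseline: doses proportional to case counts."""
    if total_doses < 0:
        raise ValueError("total_doses must be non-negative")
    if any(count < 0 for count in cases.values()):
        raise ValueError("case counts must be non-negative")
    if not cases:
        return {}

    # Assumption (not stated in the abstract): with zero reported cases, split evenly.
    weights = cases if sum(cases.values()) > 0 else {region: 1 for region in cases}
    total_weight = sum(weights.values())

    # Integer arithmetic avoids floating-point rounding errors in the shares.
    alloc = {r: total_doses * w // total_weight for r, w in weights.items()}
    remainders = {r: total_doses * w % total_weight for r, w in weights.items()}

    # Assumption: largest-remainder rounding hands out the leftover doses
    # (always fewer than the number of regions) to the largest fractional parts.
    leftover = total_doses - sum(alloc.values())
    for region in sorted(weights, key=lambda r: remainders[r], reverse=True)[:leftover]:
        alloc[region] += 1
    return alloc


if __name__ == "__main__":
    # Illustrative numbers only, not real case counts.
    print(proportional_allocation(10, {"region_a": 1, "region_b": 1, "region_c": 1}))
    # {'region_a': 4, 'region_b': 3, 'region_c': 3}
```

Because Python's `sorted` is stable, ties in the fractional remainder are broken by the order in which regions were supplied. A learned policy such as VacSIM's would replace this fixed rule with allocations chosen from the observed state of each region.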

Keywords: COVID-19; Contextual bandits problem; Policy modeling; Reinforcement learning; Vaccine distribution.

Grants and funding

T32 HG000044/HG/NHGRI NIH HHS/United States