A comparative study for determining Covid-19 risk levels by unsupervised machine learning methods

Expert Syst Appl. 2022 Mar 15:190:116243. doi: 10.1016/j.eswa.2021.116243. Epub 2021 Nov 19.

Abstract

Governments have imposed restrictions according to regional risk levels to reduce the spread of Covid-19 and protect public health. The risk levels of locations are determined by threshold values based on the number of cases per 100,000 people, without taking environmental variables into account. The purpose of our study is to apply unsupervised machine learning techniques to identify cities with similar risk levels using both the number of cases and environmental parameters. Hierarchical, partitional, soft, and gray relational clustering algorithms were applied to different datasets created from the weekly number of cases, population densities, average ages, and air pollution levels. The clustering algorithms were compared using internal validation indexes, and the most successful method was identified. The study found that Gray Relational Clustering is the most successful method for clustering based on the number of cases. The results show that including the environmental variables when setting restrictions requires more than 4 clusters for sounder decisions, and that Gray Relational Clustering gives stable results, unlike the other algorithms.
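
To make the idea of comparing clusterings with an internal validation index concrete, the following is a minimal sketch and not the study's actual pipeline. It assumes k-means as a stand-in partitional method and the silhouette coefficient as the index; the abstract names neither. The `cities` values are invented toy data (two hypothetical standardized features per city, such as weekly cases and population density), and the helper names `kmeans` and `silhouette` are illustrative only.

```python
import math

def kmeans(points, k, iters=100):
    """Plain k-means with farthest-first initialisation (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest centre.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; needs no ground-truth labels."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data: three well-separated groups of four "cities".
cities = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
    (10.0, 0.0), (11.0, 0.0), (10.0, 1.0), (11.0, 1.0),
    (0.0, 10.0), (1.0, 10.0), (0.0, 11.0), (1.0, 11.0),
]

# Score each candidate number of clusters and keep the best one.
scores = {k: silhouette(cities, kmeans(cities, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The same scoring loop could in principle rank different algorithms at a fixed number of clusters by swapping `kmeans` for another method that returns one label per city. A higher mean silhouette indicates tighter and better-separated clusters.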

Keywords: Clustering; Covid-19; Gray relational clustering; Restrictions; Risk levels; Unsupervised machine learning.