Understanding Why Many People Experiencing Homelessness Reported Migrating to a Small Canadian City: Machine Learning Approach With Augmented Data

JMIR Form Res. 2023 May 2;7:e43511. doi: 10.2196/43511.

Abstract

Background: In recent years, homelessness has become a substantial issue around the globe. The largest social services organization in Thunder Bay, Ontario, Canada, has observed that a majority of the people experiencing homelessness in the city were from outside of the city or province. Thus, to improve programming and resource allocation for people experiencing homelessness in the city, including shelter use, it was important to investigate the trends associated with homelessness and migration.

Objective: This study aimed to address 3 research questions related to homelessness and migration in Thunder Bay: What factors predict whether a person who migrated to the city and is experiencing homelessness stays or leaves shelters? If an individual stays, how long are they likely to stay? What factors predict stay duration?

Methods: We collected the required data from 2 sources: a survey conducted with people experiencing homelessness at 3 homeless shelters in Thunder Bay and the database of a homeless information management system. The records of 110 migrants were used for the analysis. Two feature selection techniques were used to address the first and third research questions, and 8 machine learning models were used to address the second research question. In addition, data augmentation was performed to increase the size of the data set and to resolve the class imbalance problem. The area under the receiver operating characteristic curve value and cross-validation accuracy were used to measure the models' performance and to guard against possible overfitting.
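
The evaluation ideas named in the Methods can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline: the abstract does not name the augmentation technique, so random oversampling is used here purely as a stand-in, and the function names (`roc_auc`, `oversample_minority`, `stratified_folds`) are hypothetical.

```python
import random

def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def oversample_minority(rows, labels, seed=0):
    """Stand-in augmentation (an assumption, not the paper's method):
    duplicate randomly chosen rows of smaller classes until every class
    is as large as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels

def stratified_folds(labels, k, seed=0):
    """Split record indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's indices round-robin across the folds.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds
```

In a sketch like this, one common precaution is to oversample only the training folds inside each cross-validation round, so that duplicated records never appear in both the training and the held-out fold.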

Results: For research question 1, factors predicting whether an individual stays in or leaves shelters included home or previous district, highest educational qualification, recent receipt of mental health support, migrating to visit family or friends, and finding employment upon arrival. For research question 2, among the classification models developed for predicting the stay duration of migrants, the random forest and gradient boosting tree models presented better results, with area under the receiver operating characteristic curve values of 0.91 and 0.93, respectively. Finally, for research question 3, home district, band membership, status card, previous district, and recent support for drug and/or alcohol use were recognized as the factors predicting stay duration.

Conclusions: Applying machine learning enables researchers to make predictions related to migrants' homelessness and investigate how various factors become determinants of the predictions. We hope that the findings of this study will aid future policy making and resource allocation to better serve people experiencing homelessness. However, further improvements in the data set size and interpretation of the identified factors in decision-making are required.

Keywords: data augmentation; feature selection; homelessness; machine learning; migrants.