A multi-modal approach towards mining social media data during natural disasters - a case study of Hurricane Irma

Somya D Mohanty; Brown Biggers; Saed Sayedahmed; Nastaran Pourebrahim; Evan B Goldstein; Rick Bunch; Guangqing Chi; Fereidoon Sadri; Tom P McCoy; Arthur Cosby

doi:10.1016/j.ijdrr.2020.102032

Int J Disaster Risk Reduct. 2021 Feb 15:54:102032. doi: 10.1016/j.ijdrr.2020.102032. Epub 2021 Jan 11.

Affiliations

¹ Department of Computer Science, University of North Carolina - Greensboro.
² Department of Geography, Environment, and Sustainability, University of North Carolina - Greensboro.
³ Department of Agricultural Economics, Sociology, and Education, Population Research Institute, and Social Science Research Institute, The Pennsylvania State University.
⁴ Department of Family & Community Nursing, University of North Carolina - Greensboro.
⁵ Social Science Research Center, Mississippi State University.

Abstract

Streaming social media provides a real-time glimpse of extreme weather impacts. However, the volume of streaming data makes mining information a challenge for emergency managers, policy makers, and disciplinary scientists. Here we explore the effectiveness of data-learned approaches to mine and filter information from streaming social media data from Hurricane Irma's landfall in Florida, USA. We use 54,383 Twitter messages (out of 784K geolocated messages) from 16,598 users from Sept. 10-12, 2017 to develop four independent models to filter data for relevance: 1) a geospatial model based on forcing conditions at the place and time of each tweet, 2) an image classification model for tweets that include images, 3) a user model to predict the reliability of the tweeter, and 4) a text model to determine if the text is related to Hurricane Irma. All four models are independently tested and can be combined to quickly filter and visualize tweets based on user-defined thresholds for each submodel. We envision that this type of filtering and visualization routine can be useful as a base model for data capture from noisy sources such as Twitter. The data can subsequently be used by policy makers, environmental managers, emergency managers, and domain scientists interested in finding tweets with specific attributes to use during different stages of the disaster (e.g., preparedness, response, and recovery), or for detailed research.
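
The combination step described in the abstract, filtering tweets by a user-defined threshold for each submodel, might look roughly like the sketch below. It is an illustration only and not the authors' implementation: the `TweetScores` class, its field names, the assumption that each submodel emits a score between 0 and 1, and the choice to let tweets without an image skip the image threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TweetScores:
    """Hypothetical container for the four submodel scores of one tweet."""
    tweet_id: str
    geo: float    # geospatial model score (assumed to lie in [0, 1])
    user: float   # user reliability score (assumed to lie in [0, 1])
    text: float   # text relevance score (assumed to lie in [0, 1])
    image: Optional[float] = None  # None when the tweet has no image


def passes(scores: TweetScores, thresholds: Dict[str, float]) -> bool:
    """Return True if every available score meets its threshold."""
    for name, minimum in thresholds.items():
        value = getattr(scores, name)
        if value is None:
            # Assumption: a tweet without an image is not excluded
            # by the image threshold.
            continue
        if value < minimum:
            return False
    return True


def filter_tweets(tweets: List[TweetScores],
                  thresholds: Dict[str, float]) -> List[TweetScores]:
    """Keep only the tweets that satisfy all user-defined thresholds."""
    return [t for t in tweets if passes(t, thresholds)]
```

Raising or lowering an entry in `thresholds` tightens or relaxes only that submodel's filter, which is one way the per-submodel control mentioned in the abstract could be exposed to a user.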

Keywords: data mining; machine learning; natural disaster; social media.

Grants and funding

P2C HD041025/HD/NICHD NIH HHS/United States