A topic modeling framework for spatio-temporal information management

Inf Process Manag. 2020 Nov;57(6):102340. doi: 10.1016/j.ipm.2020.102340. Epub 2020 Jul 6.

Abstract

Real-time processing and learning of conflicting data, especially messages coming from different ideas, locations, and time, in a dynamic environment such as Twitter is a challenging task that recently gained lots of attention. This paper introduces a framework for managing, processing, analyzing, detecting, and tracking topics in streaming data. We propose a model selector procedure with a hybrid indicator to tackle the challenge of online topic detection. In this framework, we built an automatic data processing pipeline with two levels of cleaning. Regular and deep cleaning are applied using multiple sources of meta knowledge to enhance data quality. Deep learning and transfer learning techniques are used to classify health-related tweets, with high accuracy and improved F1-Score. In this system, we used visualization to have a better understanding of trending topics. To demonstrate the validity of this framework, we implemented and applied it to health-related twitter data from users originating in the USA over nine months. The results of this implementation show that this framework was able to detect and track the topics at a level comparable to manual annotation. To better explain the emerging and changing topics in various locations over time the result is graphically displayed on top of the United States map.

Keywords: Artificial intelligent; Spatio-temporal real time analysis; Topic modeling; Traceability; Transfer learning; Visualization.