ARTCDP: An automated data platform for monitoring emerging patterns concerning road traffic crashes in China

Accid Anal Prev. 2022 Sep:174:106727. doi: 10.1016/j.aap.2022.106727. Epub 2022 Jun 3.

Abstract

Online media reports provide valuable information for road traffic injury prevention, but technical challenges concerning data acquisition and processing limit analysis and interpretation of such data. Integrating injury epidemiology theory and big data technology, we developed a data platform consisting of four layers (data acquisition, data processing, application and data storage) to automatically collect reports from online Chinese media concerning road traffic crashes every 24 h. We built a text classification model using 20,000 manually annotated news stories based on the Bidirectional Encoder Representations from Transformers (BERT) and then used natural language processing algorithms to extract data concerning 27 structured variables from the news sources. The accuracy of the BERT-based text classification model was 0.9271, with information extraction accuracy exceeding 80% for 22 variables. As of November 30, 2021, the data platform collected 244,650 eligible media reports covering all 333 prefecture-level divisions in China. These reports were from 37,073 websites or social media accounts, which were geographically located in all 31 provinces and over 98% of prefecture-level divisions. Data availability varied greatly from 0.9% to 100% across the 27 structured variables. Additionally, the platform identified 645,787 potentially relevant keywords when applying natural language processing techniques to the textual media reports. Platform data were highly correlated with road police data in province-based road traffic crash statistics (crashes, rs = 0.799; non-fatal injuries, rs = 0.802; deaths, rs = 0.775). In particular, the platform offers valuable data (like crashes involving electric vehicles) that are not included in official road traffic crash statistics. The new automated data platform shows great potential for timely detection of emerging characteristics of road traffic crashes. Further research is needed to improve the platform and apply it to real-time monitoring and analysis of road traffic injuries.

Keywords: BERT model; China; Natural language processing; Online media data; Road traffic crashes.

MeSH terms

  • Accidents, Traffic* / prevention & control
  • China / epidemiology
  • Humans
  • Police
  • Wounds and Injuries* / epidemiology
  • Wounds and Injuries* / prevention & control