Extraction of temporal networks from term co-occurrences in online textual sources

Marko Popović; Hrvoje Štefančić; Borut Sluban; Petra Kralj Novak; Miha Grčar; Igor Mozetič; Michelangelo Puliga; Vinko Zlatić

doi:10.1371/journal.pone.0099515

PLoS One. 2014 Dec 3;9(12):e99515. doi: 10.1371/journal.pone.0099515. eCollection 2014.

Authors

Marko Popović¹, Hrvoje Štefančić², Borut Sluban³, Petra Kralj Novak³, Miha Grčar³, Igor Mozetič³, Michelangelo Puliga⁴, Vinko Zlatić¹

Affiliations

¹ Theoretical Physics Division, Rudjer Bošković Institute, P.O.Box 180, HR-10002, Zagreb, Croatia.
² Theoretical Physics Division, Rudjer Bošković Institute, P.O.Box 180, HR-10002, Zagreb, Croatia; Catholic University of Croatia, Zagreb, Croatia.
³ Jožef Stefan Institute, Ljubljana, Slovenia.
⁴ IMT Alti Studi Lucca, Lucca, Italy.

Abstract

A stream of unstructured news can be a valuable source of hidden relations between different entities, such as financial institutions, countries, or persons. We present an approach to continuously collect online news, recognize relevant entities in it, and extract time-varying networks. The nodes of the network are the entities, and the links are their co-occurrences. We present a method to estimate the significance of co-occurrences, and a benchmark model against which their robustness is evaluated. The approach is applied to a large set of financial news, collected over a period of two years. The entities we consider are 50 countries that issue sovereign bonds, which are in turn insured by Credit Default Swaps (CDS). We compare the country co-occurrence networks to the CDS networks constructed from the correlations between the CDS. The results show a relatively small but significant overlap between the networks extracted from the news and those from the CDS correlations.
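As an illustration of the network construction described in the abstract (entities as nodes, co-occurrences as links, one network per time window), a minimal Python sketch might look as follows. It is not the authors' implementation: the function names, the month-string window keys, the sample headlines, and the naive substring matching used in place of real entity recognition are all assumptions made for this example, and the significance estimate and benchmark model are not reproduced.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(documents, entities):
    """Count, for each unordered entity pair, the documents mentioning both.

    Assumption: an entity "occurs" if its name appears as a substring of the
    lowercased text. Real entity recognition would be more careful.
    """
    links = Counter()
    for text in documents:
        lowered = text.lower()
        present = sorted(e for e in entities if e.lower() in lowered)
        for pair in combinations(present, 2):
            links[pair] += 1
    return links


def windowed_networks(dated_documents, entities):
    """Build one co-occurrence network per time window.

    dated_documents is an iterable of (window_key, text) pairs; the window
    key (here a "YYYY-MM" string) is a hypothetical choice for this sketch.
    """
    by_window = {}
    for window, text in dated_documents:
        by_window.setdefault(window, []).append(text)
    return {w: cooccurrence_network(docs, entities) for w, docs in by_window.items()}


# Made-up headlines, used only to exercise the functions above.
entities = ["Greece", "Germany", "Spain"]
news = [
    ("2012-03", "Greece and Germany discuss bailout terms."),
    ("2012-03", "Spain watches as Greece negotiates with Germany."),
    ("2012-04", "Spain and Germany sign a trade agreement."),
]
networks = windowed_networks(news, entities)
for window, links in sorted(networks.items()):
    print(window, dict(links))
```

Each window yields a weighted edge list keyed by alphabetically ordered entity pairs, so the networks of successive windows can be compared link by link.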

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Computer Communication Networks*
Humans
Models, Theoretical
Online Systems

Grants and funding

This work was supported in part by the European Commission under the FP7 projects FOC (Forecasting financial crises, grant no. 255987) and MULTIPLEX (Foundational Research on MULTIlevel comPLEX networks and systems, grant no. 317532), and by the Slovenian Research Agency programme Knowledge Technologies (grant no. P2-103). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.