Authorship attribution based on Life-Like Network Automata

PLoS One. 2018 Mar 22;13(3):e0193703. doi: 10.1371/journal.pone.0193703. eCollection 2018.

Abstract

Authorship attribution is a problem of considerable practical and technical interest. Several methods have been designed to infer the authorship of disputed documents in multiple contexts. Traditional statistical methods based solely on word counts and related measurements have provided a simple yet effective solution in particular cases, but they are prone to manipulation. Recently, texts have been successfully modeled as networks, in which words are represented by nodes linked according to textual similarity measurements. Such models are useful for identifying informative topological patterns for the authorship recognition task. However, there is no consensus on which measurements should be used. Here, we propose a novel method to characterize text networks that considers both their topological and dynamical aspects. Using concepts and methods from cellular automata theory, we devised a strategy to extract informative spatio-temporal patterns from this model. Our experiments revealed that this approach outperforms structural analysis relying only on topological measurements, such as the clustering coefficient, betweenness and shortest paths. The optimized results obtained here pave the way for a better characterization of textual networks.
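The abstract does not spell out the construction or the update rule, so the sketch below is only a minimal, hypothetical illustration of the general idea of running a Life-like cellular automaton on a text network. Several choices are assumptions rather than the paper's method: the network is a simple word-adjacency network (an edge joins words that appear next to each other), the fraction of live neighbours is discretised to a level 0..8 so that Game-of-Life-style birth/survival (B/S) rule sets apply to nodes of any degree, and the two descriptors returned by `pattern_features` (mean live density and mean per-node Shannon entropy) are illustrative stand-ins for whatever spatio-temporal measurements the authors actually used. All function names are hypothetical.

```python
import math
import random
import re
from collections import defaultdict


def build_adjacency_network(text):
    """Assumed construction: nodes are lower-cased words and an undirected
    edge joins two distinct words that appear next to each other at least once."""
    words = re.findall(r"[a-z]+", text.lower())
    graph = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops such as "the the"
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def life_like_step(graph, state, birth, survival):
    """One synchronous update of a Life-like B/S rule on an arbitrary network.
    Degrees vary, so the fraction of live neighbours is rounded to an integer
    level 0..8 (an assumed discretisation) before checking the rule sets."""
    new_state = {}
    for node, neighbours in graph.items():
        alive = sum(state[n] for n in neighbours)
        level = round(8 * alive / len(neighbours))
        rule = survival if state[node] else birth
        new_state[node] = 1 if level in rule else 0
    return new_state


def run_automaton(graph, birth, survival, steps=20, seed=0):
    """Evolve from a random initial configuration and return every state,
    i.e. the spatio-temporal pattern, including the initial one."""
    rng = random.Random(seed)
    state = {node: rng.randint(0, 1) for node in sorted(graph)}
    history = [state]
    for _ in range(steps):
        state = life_like_step(graph, state, birth, survival)
        history.append(state)
    return history


def pattern_features(history):
    """Illustrative descriptors: mean fraction of live nodes over time and
    mean per-node Shannon entropy (bits) of each node's 0/1 sequence."""
    nodes = list(history[0])
    if not nodes:
        return 0.0, 0.0
    steps = len(history)
    density = sum(sum(s.values()) for s in history) / (steps * len(nodes))
    entropy = 0.0
    for node in nodes:
        p = sum(s[node] for s in history) / steps
        if 0 < p < 1:
            entropy -= p * math.log2(p) + (1 - p) * math.log2(1 - p)
    return density, entropy / len(nodes)


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog and the quick cat sleeps"
    graph = build_adjacency_network(text)
    # B3/S23 is Conway's Game of Life rule, reused here purely as an example.
    history = run_automaton(graph, birth={3}, survival={2, 3}, steps=10)
    print(pattern_features(history))
```

In an attribution setting, one would presumably compute such descriptors for each text under several B/S rules and feed the resulting vectors to a classifier; which rules and descriptors are most informative is exactly what a study like this one has to determine empirically.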

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Authorship*
  • Models, Theoretical
  • Pattern Recognition, Automated
  • Spatio-Temporal Analysis

Grants and funding

J.M. was supported by Coordination for the Improvement of Higher Education Personnel (CAPES) and the National Council for Scientific and Technological Development (CNPq) grant #405503/2017-2. E.A.C.J. was supported by a Google Research Awards in Latin America grant. G.H.B.M. was supported by Coordination for the Improvement of Higher Education Personnel (CAPES) and São Paulo Research Foundation (FAPESP) grant #2015/05899-7. D.R.A. was supported by a Google Research Awards in Latin America grant and São Paulo Research Foundation (FAPESP) grants #2014/20830-0, #2016/19069-9 and #2017/13464-6. O.M.B. was supported by National Council for Scientific and Technological Development (CNPq) grants #307797/2014-7 and #405503/2017-2 and São Paulo Research Foundation (FAPESP) grants #2014/08026-1 and #2015/05899-7. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.