The Language of Innovation

Andrea Tacchella; Andrea Napoletano; Luciano Pietronero

doi:10.1371/journal.pone.0230107

PLoS One. 2020 Apr 30;15(4):e0230107. doi: 10.1371/journal.pone.0230107. eCollection 2020.

Authors

Andrea Tacchella¹ ², Andrea Napoletano² ³, Luciano Pietronero³ ⁴

Affiliations

¹ European Commission, Joint Research Centre (JRC), Seville, Spain.
² Institute for Complex Systems, CNR, Rome, Italy.
³ Sapienza, University of Rome, Rome, Italy.
⁴ Museo Storico della Fisica e Centro Studi e Ricerche Enrico Fermi, Compendio del Viminale, Rome, Italy.

Abstract

Predicting innovation is a peculiar problem in data science. By definition, an innovation is a never-seen-before event, which leaves no room for traditional supervised learning approaches. Here we propose a strategy to address the problem in the context of innovative patents, by defining innovations as never-seen-before associations of technologies and exploiting self-supervised learning techniques. We treat the technological codes present in patents as a vocabulary and the whole technological corpus as written in a specific, evolving language. We leverage this structure with techniques borrowed from Natural Language Processing, embedding technologies in a high-dimensional Euclidean space where relative positions represent learned semantics. Proximity in this space is an effective predictor of specific innovation events and outperforms a wide range of standard link-prediction metrics. The success of patented innovations follows complex dynamics characterized by different patterns, which we analyze in detail with specific examples. The methods proposed in this paper provide a new way of understanding and forecasting innovation and open interesting scenarios for a number of applications and further analytic approaches.
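The idea of scoring never-seen-before pairs of technology codes by their proximity in an embedding space can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the embedding model, so the sketch swaps in sparse positive pointwise mutual information (PPMI) context vectors as a simple stand-in, and the codes "A" to "F", the toy corpus, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log, sqrt

# Toy corpus (hypothetical): each patent is a set of technology codes.
patents = [
    {"A", "C"},
    {"A", "D"},
    {"B", "C"},
    {"B", "D"},
    {"E", "F"},
    {"C", "E"},
]


def cooccurrence(patents):
    """Count how often each unordered pair of codes shares a patent."""
    pair_counts = Counter()
    for codes in patents:
        for a, b in combinations(sorted(codes), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def ppmi_vectors(pair_counts):
    """Build one sparse PPMI context vector per code (a stand-in embedding)."""
    marginal = Counter()
    for (a, b), n in pair_counts.items():
        marginal[a] += n
        marginal[b] += n
    grand_total = 2 * sum(pair_counts.values())  # symmetric matrix sum
    vectors = {code: {} for code in marginal}
    for (a, b), n in pair_counts.items():
        pmi = log(n * grand_total / (marginal[a] * marginal[b]))
        if pmi > 0:  # keep only positive associations
            vectors[a][b] = pmi
            vectors[b][a] = pmi
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_unseen_pairs(patents):
    """Rank code pairs that never co-occurred, most similar first."""
    pair_counts = cooccurrence(patents)
    vectors = ppmi_vectors(pair_counts)
    candidates = [
        pair for pair in combinations(sorted(vectors), 2)
        if pair not in pair_counts
    ]
    scored = [(pair, cosine(vectors[pair[0]], vectors[pair[1]]))
              for pair in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = rank_unseen_pairs(patents)
for pair, score in ranked[:3]:
    print(pair, round(score, 3))
```

In this toy corpus, codes "A" and "B" never appear together but share exactly the same contexts ("C" and "D"), so the pair ("A", "B") ranks first among the candidate innovations. A learned dense embedding would play the role of the PPMI vectors in a realistic setting.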

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Forecasting*
Humans
Language*
Natural Language Processing*

Grants and funding

Funder: MIUR (https://www.miur.gov.it/). Grant reference: CRISISLAB. Recipient: Prof. Luciano Pietronero. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.