Patent relatedness and velocity in the Chinese pharmaceutical industry: A dataset of Jaccard similarity indices

Data Brief. 2021 Feb 4:35:106814. doi: 10.1016/j.dib.2021.106814. eCollection 2021 Apr.

Abstract

This dataset describes innovation dynamics in the pharmaceutical industry in China. Innovation dynamics are interpreted as knowledge transfer across technologies (relatedness) and through time (velocity). The dataset provides 143,916 Jaccard similarity indices. A Jaccard similarity index measures the similarity between two sets: the size of their intersection divided by the size of their union. Here, the indices proxy relatedness across technologies (classes) and velocity through time. They are computed from a Natural Language Processing treatment of 69,923 patents in the pharmaceutical industry in China from 1990 to 2017.

Keywords: China; Derwent world patents index; Innovation; Jaccard similarity index; Natural language processing; Patents; Pharmaceutical industry.
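
As an illustration of the measure named in the abstract, the following minimal Python sketch computes a Jaccard similarity index between two sets of terms. The tokenizer (lowercasing and splitting on whitespace) and the two example strings are assumptions made for illustration only; the abstract does not describe the Natural Language Processing treatment actually applied to the patents.

```python
def tokenize(text: str) -> set[str]:
    """Hypothetical tokenizer: lowercase the text and split on whitespace."""
    return set(text.lower().split())


def jaccard_similarity(a: set[str], b: set[str]) -> float:
    """Return |A intersect B| / |A union B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Illustrative strings, not taken from the dataset.
doc_a = "compound for treating hypertension"
doc_b = "compound for treating diabetes"

# Three shared terms out of five distinct terms in total.
similarity = jaccard_similarity(tokenize(doc_a), tokenize(doc_b))
print(similarity)
```

The index ranges from 0 (no shared terms) to 1 (identical term sets), so higher values indicate more closely related units.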