Data driven identification of international cutting edge science and technologies using SpaCy

PLoS One. 2022 Oct 12;17(10):e0275872. doi: 10.1371/journal.pone.0275872. eCollection 2022.

Abstract

Difficulties in collecting, processing, and identifying massive volumes of data have slowed research on cutting-edge science and technology hotspots. Promoting these technologies cannot succeed without an effective data-driven method for identifying them. This paper proposes a data-driven model, based on SpaCy, for identifying global cutting-edge science and technologies. We used a Python web crawler to collect data published by 17 well-known American technology media websites from July 2019 to July 2020. As the research method, we combined graph-based neural network learning with active learning. We then trained the model using ten-fold cross-validation over repeated experiments. The experimental results show that the model performed very well on entity recognition tasks, with an F value of 98.11%. The model provides an information source for cutting-edge technology identification. Through effective identification and tracking, it can promote innovation in cutting-edge technologies and support the exploration of more efficient modes of scientific and technological research work.
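The abstract does not say how the ten folds were drawn or how the F value was computed, so the following is only a minimal sketch under stated assumptions: documents are shuffled once and dealt into ten folds, and the score is a micro-averaged F1 over exact-match entity spans. The function names, the `Entity` tuple layout, and the `TECH` and `ORG` labels are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Set, Tuple

# Hypothetical entity representation: (start_char, end_char, label).
Entity = Tuple[int, int, str]


def k_fold_indices(n_items: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits; each item is a test item exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def entity_prf(gold: Sequence[Set[Entity]], pred: Sequence[Set[Entity]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact-match entity spans."""
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (invented for illustration): two documents, three gold entities.
gold = [{(0, 5, "TECH")}, {(3, 9, "TECH"), (12, 20, "ORG")}]
pred = [{(0, 5, "TECH")}, {(3, 9, "TECH")}]
precision, recall, f1 = entity_prf(gold, pred)

# Ten folds over 25 hypothetical documents.
splits = k_fold_indices(25, k=10)
```

In a full experiment, one would presumably train a SpaCy entity recognizer on each `train` split, predict on the matching `test` split, and average the ten scores; that training loop is left out here because the abstract gives no details of it.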

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Machine Learning
  • Neural Networks, Computer*
  • Technology*

Grants and funding

This research was supported by the National Social Science Fund of China (grant no. 20BTQ056). HP, Gong (Huaping Gong) received the award; her website URL is: http://spm.ncu.edu.cn/szdw/yjsds/af9e1c71e05f4f8a9f2ab591919844db.htm. This research was also supported by the National Natural Science Foundation of China (grant no. 72163021). YQ, He (Yiqing He) received the award; his website URL is: http://spm.ncu.edu.cn/szdw/yjsds/af9e1c71e05f4f8a9f2ab591919844db.htm. The funders had no role in data collection and analysis or in preparation of the manuscript.