Link prediction and feature relevance in knowledge networks: A machine learning approach

PLoS One. 2023 Nov 30;18(11):e0290018. doi: 10.1371/journal.pone.0290018. eCollection 2023.

Abstract

We propose a supervised machine learning approach to predict partnership formation between universities. We focus on successful joint R&D projects funded by the Horizon 2020 programme in three research domains: Social Sciences and Humanities, Physical and Engineering Sciences, and Life Sciences. We perform two related analyses: link formation prediction and feature importance detection. In predicting link formation, we consider two settings: one including all features, both exogenous (pertaining to the node) and endogenous (pertaining to the network), and one including only exogenous features (thus removing the network attributes of the nodes). Using out-of-sample cross-validated accuracy, we obtain 91% prediction accuracy when both types of attributes are used, and around 67% when only the exogenous ones are used. This indicates that partnership predictive power is on average 24 percentage points higher for universities already incumbent in the programme than for newcomers (for which network attributes are unknown). As for feature importance, by computing super-learner average partial effects and elasticities, we find that the endogenous attributes are the most relevant in affecting the probability of generating a link, and we observe a largely negative elasticity of the link probability to feature changes, fairly uniform across attributes and domains.
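The abstract mentions super-learner average partial effects and elasticities without spelling out how they are obtained. The following is a minimal sketch under stated assumptions, not the authors' implementation: the two base learners are hypothetical fixed logistic models, the super-learner weights are hard-coded (in practice they are usually fitted by cross-validation), partial effects are taken by central finite differences, and the elasticity uses one common "at-means" form, APE_j · mean(x_j) / mean(p). All function names are illustrative.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]  # maps a feature vector to P(link | x)


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_logit(weights: Vector, bias: float) -> Model:
    """Hypothetical base learner: a fixed logistic model of link probability."""
    def predict(x: Vector) -> float:
        return logistic(bias + sum(w * xi for w, xi in zip(weights, x)))
    return predict


def super_learner(models: List[Model], alphas: Vector) -> Model:
    """Ensemble as a convex combination of base-learner probabilities.

    Assumption: weights are given, not fitted; a real super learner would
    estimate them from cross-validated predictions.
    """
    if abs(sum(alphas) - 1.0) > 1e-9 or any(a < 0 for a in alphas):
        raise ValueError("weights must be non-negative and sum to 1")

    def predict(x: Vector) -> float:
        return sum(a * m(x) for a, m in zip(alphas, models))
    return predict


def average_partial_effect(model: Model, X: List[Vector], j: int, h: float = 1e-4) -> float:
    """Mean over observations of dp/dx_j, via central finite differences."""
    total = 0.0
    for x in X:
        up = list(x)
        up[j] += h
        down = list(x)
        down[j] -= h
        total += (model(up) - model(down)) / (2 * h)
    return total / len(X)


def elasticity_at_means(model: Model, X: List[Vector], j: int, h: float = 1e-4) -> float:
    """Approximate % change in link probability per 1% change in feature j."""
    ape = average_partial_effect(model, X, j, h)
    mean_xj = sum(x[j] for x in X) / len(X)
    mean_p = sum(model(x) for x in X) / len(X)
    return ape * mean_xj / mean_p


if __name__ == "__main__":
    # Toy data: two hypothetical features per university pair.
    X = [[0.0, 1.0], [1.0, 2.0], [2.0, 0.5]]
    m1 = make_logit([0.8, -0.5], -1.0)
    m2 = make_logit([1.2, -0.2], -0.5)
    sl = super_learner([m1, m2], [0.6, 0.4])
    for j in range(2):
        print(j, average_partial_effect(sl, X, j), elasticity_at_means(sl, X, j))
```

Because the partial effects are computed on the ensemble's output rather than on any single model's coefficients, the same routine applies to any black-box classifier that returns probabilities; the finite-difference step `h` trades truncation error against floating-point noise.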

MeSH terms

  • Humanities
  • Knowledge
  • Machine Learning*
  • Supervised Machine Learning*

Grants and funding

This work was supported by the European Union Horizon 2020 Programme, Grant No. 824091 (RISIS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.