Explainable prediction of node labels in multilayer networks: a case study of turnover prediction in organizations

Sci Rep. 2024 Apr 19;14(1):9036. doi: 10.1038/s41598-024-59690-4.

Abstract

In real-world classification problems, it is important both to build accurate prediction models and to provide information that can improve decision-making. Decision-support tools are often based on network models, and this article uses information encoded by social networks to address the problem of employee turnover. However, understanding the factors behind black-box prediction models can be challenging. We asked how well employee turnover can be predicted from a multilayer network that describes collaborations and perceptions that assess the performance of organizations and indicate the success of cooperation. Our goal was to develop an accurate prediction procedure, preserve the interpretability of the classification, and capture the wide variety of specific reasons that explain positive cases. After feature engineering, we identified the variables with the best predictive power using decision trees and ranked them by their added value, taking their frequent co-occurrence into account. We applied a Random Forest classifier with the SMOTE balancing technique for prediction. We calculated SHAP values to identify the variables that contribute the most to individual predictions. As a last step, we clustered the sample based on SHAP values to fine-tune the explanations for quitting due to different background factors.
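
The balancing step mentioned above can be illustrated with a minimal, standard-library sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation. The function names `smote_oversample` and `balance`, the toy feature vectors, and the choice of `k` are all assumptions made for illustration, and the sketch assumes the positive (leaver) class is the minority.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority neighbours.

    Hypothetical helper for illustration only; a simplified take on SMOTE.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than peers
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, positive=1, k=5, seed=0):
    """Oversample the positive class until both classes have equal size.

    Assumes the positive class is the minority; returns new lists and
    leaves X and y unmodified.
    """
    minority = [row for row, label in zip(X, y) if label == positive]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_rows = smote_oversample(minority, n_new, k=k, seed=seed)
    return X + new_rows, y + [positive] * len(new_rows)


# Toy data (made up): six employees who stayed (0), two who left (1).
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.3], [0.2, 0.4], [0.1, 0.0],
     [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance(X, y)
print(len(X_bal), sum(y_bal))
```

In a pipeline like the one described, oversampling would normally be applied to the training split only, so that synthetic rows never leak into the evaluation data. A maintained implementation such as the `SMOTE` class in the imbalanced-learn package would usually be preferred over a hand-written one.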