SYNDSURV: A simple framework for survival analysis with data distributed across multiple institutions

Comput Biol Med. 2024 Apr:172:108288. doi: 10.1016/j.compbiomed.2024.108288. Epub 2024 Mar 15.

Abstract

Sharing data among institutions is one of the major challenges in developing distributed machine learning approaches, especially when the data are sensitive, as in medical applications. Federated learning is a possible solution, but it requires fast communications and flawless security. Here, we propose SYNDSURV (SYNthetic Distributed SURVival), an alternative approach that simplifies the current state-of-the-art paradigm: each centre generates simulated instances locally from its real data and sends them to a centralised hub, where an Artificial Intelligence (AI) model can be trained in the standard way. The main advantage of this procedure is that it is model-agnostic, so prediction models can be applied directly in distributed settings without the specific adaptations that current federated approaches require. To show the validity of our approach for medical applications, we tested it on a survival analysis task, where it offered a viable alternative for training AI models on distributed data. Whereas federated learning has so far been optimised mainly for gradient-based approaches, our framework works with any predictive method and proves to be a comparable way of performing distributed learning without imposing heavy infrastructural requirements on each participating institute.
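
The workflow described in the abstract (local synthetic generation, pooling at a hub, standard training) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy data, the function names, and in particular the generator, which is a simple resample-and-jitter stand-in and not the generator used by SYNDSURV. It provides no privacy guarantee. A Kaplan-Meier estimator stands in for the centrally trained model only because it fits in a few lines of standard-library code.

```python
import random
from typing import List, Tuple

# (follow-up time, event indicator: 1 = event observed, 0 = censored)
Record = Tuple[float, int]


def make_local_data(rng: random.Random, n: int, rate: float) -> List[Record]:
    """Simulate one centre's private survival data (hypothetical toy data)."""
    records = []
    for _ in range(n):
        event_time = rng.expovariate(rate)
        censor_time = rng.expovariate(rate / 2)
        records.append((min(event_time, censor_time), int(event_time <= censor_time)))
    return records


def generate_synthetic(rng: random.Random, real: List[Record], n_out: int) -> List[Record]:
    """Stand-in generator: resample real records and jitter their times.

    Assumption for illustration only. This is NOT the paper's generator
    and it offers no privacy protection.
    """
    synthetic = []
    for _ in range(n_out):
        time, event = rng.choice(real)
        synthetic.append((max(1e-6, time * rng.uniform(0.9, 1.1)), event))
    return synthetic


def pool_synthetic(rng: random.Random, centres: List[List[Record]]) -> List[Record]:
    """Hub step: gather only the synthetic records produced by each centre."""
    pooled: List[Record] = []
    for real in centres:
        pooled.extend(generate_synthetic(rng, real, len(real)))
    return pooled


def kaplan_meier(records: List[Record]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time."""
    ordered = sorted(records)
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        deaths = removed = 0
        # Process all records sharing the same time together.
        while i < len(ordered) and ordered[i][0] == t:
            deaths += ordered[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


rng = random.Random(0)
# Three hypothetical centres with slightly different hazard rates.
centres = [make_local_data(rng, 200, rate) for rate in (0.10, 0.12, 0.08)]
pooled = pool_synthetic(rng, centres)  # real records never leave the centres
curve = kaplan_meier(pooled)           # any survival model could be fitted here
```

Because the hub only ever sees an ordinary pooled dataset, `kaplan_meier` could be replaced by any other survival model without changing the centre-side code, which is the model-agnostic property the abstract emphasises.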

Keywords: Differential privacy; Federated learning; Survival analysis; Synthetic data.

MeSH terms

  • Artificial Intelligence*
  • Machine Learning*
  • Survival Analysis