MELLODDY: Cross-pharma Federated Learning at Unprecedented Scale Unlocks Benefits in QSAR without Compromising Proprietary Information

Wouter Heyndrickx; Lewis Mervin; Tobias Morawietz; Noé Sturm; Lukas Friedrich; Adam Zalewski; Anastasia Pentina; Lina Humbeck; Martijn Oldenhof; Ritsuya Niwayama; Peter Schmidtke; Nikolas Fechner; Jaak Simm; Adam Arany; Nicolas Drizard; Rama Jabal; Arina Afanasyeva; Regis Loeb; Shlok Verma; Simon Harnqvist; Matthew Holmes; Balazs Pejo; Maria Telenczuk; Nicholas Holway; Arne Dieckmann; Nicola Rieke; Friederike Zumsande; Djork-Arné Clevert; Michael Krug; Christopher Luscombe; Darren Green; Peter Ertl; Peter Antal; David Marcus; Nicolas Do Huu; Hideyoshi Fuji; Stephen Pickett; Gergely Acs; Eric Boniface; Bernd Beck; Yax Sun; Arnaud Gohier; Friedrich Rippmann; Ola Engkvist; Andreas H Göller; Yves Moreau; Mathieu N Galtier; Ansgar Schuffenhauer; Hugo Ceulemans

doi:10.1021/acs.jcim.3c00799

J Chem Inf Model. 2024 Apr 8;64(7):2331-2344. doi: 10.1021/acs.jcim.3c00799. Epub 2023 Aug 29.

Authors

Wouter Heyndrickx¹, Lewis Mervin², Tobias Morawietz³, Noé Sturm⁴, Lukas Friedrich⁵, Adam Zalewski⁶, Anastasia Pentina⁷, Lina Humbeck⁸, Martijn Oldenhof⁹, Ritsuya Niwayama¹⁰, Peter Schmidtke¹¹, Nikolas Fechner⁴, Jaak Simm⁹, Adam Arany⁹, Nicolas Drizard¹², Rama Jabal¹², Arina Afanasyeva¹³, Regis Loeb⁹, Shlok Verma¹⁴, Simon Harnqvist¹⁴, Matthew Holmes¹⁴, Balazs Pejo¹⁵, Maria Telenczuk¹⁶, Nicholas Holway⁴, Arne Dieckmann¹⁷, Nicola Rieke¹⁸, Friederike Zumsande⁶, Djork-Arné Clevert⁷, Michael Krug⁵, Christopher Luscombe¹⁴, Darren Green¹⁴, Peter Ertl⁴, Peter Antal¹⁹, David Marcus¹⁴, Nicolas Do Huu¹², Hideyoshi Fuji¹³, Stephen Pickett¹⁴, Gergely Acs¹⁵, Eric Boniface²⁰, Bernd Beck⁸, Yax Sun²¹, Arnaud Gohier¹⁰, Friedrich Rippmann⁵, Ola Engkvist²², Andreas H Göller³, Yves Moreau⁹, Mathieu N Galtier²³, Ansgar Schuffenhauer⁴, Hugo Ceulemans¹

Affiliations

¹ Janssen Pharmaceutica NV, Turnhoutseweg 30, Beerse 2340, Belgium.
² AstraZeneca R&D, Biomedical Campus, 1 Francis Crick Ave, Cambridge CB2 0SL, U.K.
³ Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany.
⁴ Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland.
⁵ Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany.
⁶ Amgen Research (Munich) GmbH, Staffelseestraße 2, Munich 81477, Germany.
⁷ Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany.
⁸ BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany.
⁹ KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium.
¹⁰ Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France.
¹¹ Discngine, Avenue Ledru Rollin 79, Paris 75012, France.
¹² Iktos, 65 rue de Prony, Paris 75017, France.
¹³ Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan.
¹⁴ GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
¹⁵ Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary.
¹⁶ Owkin, 12 Rue Martel, Paris 75010, France.
¹⁷ Bayer AG, API Production, Product Supply, Pharmaceuticals, Ernst-Schering-Straße 14, Bergkamen 59192, Germany.
¹⁸ NVIDIA GmbH, Floessergasse 2, Munich 81369, Germany.
¹⁹ Budapest University of Technology and Economics, Department of Measurement and Information Systems, Műegyetem rkp. 3, Budapest 1111, Hungary.
²⁰ Substra Foundation - Labelia Labs, 4 rue Voltaire, Nantes 44000, France.
²¹ Amgen Research, 1 Amgen Center Drive, Thousand Oaks, California 92130, United States.
²² AstraZeneca, Molecular AI, Discovery Sciences, R&D, Pepparedsleden 1, Mölndal 431 50, Sweden.
²³ Owkin, 4 Rue Voltaire, Nantes 44000, France.

Abstract

Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single-concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with saturating returns. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.

MeSH terms

Benchmarking*
Biological Assay
Machine Learning
Quantitative Structure-Activity Relationship*