Predicting drug approvals: The Novartis data science and artificial intelligence challenge

Kien Wei Siah; Nicholas W Kelley; Steffen Ballerstedt; Björn Holzhauer; Tianmeng Lyu; David Mettler; Sophie Sun; Simon Wandel; Yang Zhong; Bin Zhou; Shifeng Pan; Yingyao Zhou; Andrew W Lo

doi:10.1016/j.patter.2021.100312

Patterns (N Y). 2021 Jul 21;2(8):100312. doi: 10.1016/j.patter.2021.100312. eCollection 2021 Aug 13.

Authors

Kien Wei Siah¹,², Nicholas W Kelley³, Steffen Ballerstedt³, Björn Holzhauer³, Tianmeng Lyu⁴, David Mettler³, Sophie Sun⁴, Simon Wandel³, Yang Zhong⁵, Bin Zhou⁵, Shifeng Pan⁵, Yingyao Zhou⁵, Andrew W Lo¹,²,⁶

Affiliations

¹ Laboratory for Financial Engineering, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02142, USA.
² Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02142, USA.
³ Novartis Pharma AG, 4056 Basel, Switzerland.
⁴ Novartis Pharmaceuticals Corporation, East Hanover, NJ 07936, USA.
⁵ Genomics Institute of the Novartis Research Foundation, San Diego, CA 92121, USA.
⁶ Santa Fe Institute, Santa Fe, NM 87501, USA.

Abstract

We describe a novel collaboration between academia and industry: an in-house data science and artificial intelligence challenge held by Novartis to develop machine-learning models for predicting drug-development outcomes, taking as its starting point earlier research at MIT that used data from Informa. More than 50 cross-functional teams from 25 Novartis offices around the world participated in the challenge, and their domain expertise was leveraged to create more sophisticated predictive models. Ultimately, two winning teams developed models that outperformed the baseline MIT model (areas under the curve of 0.88 and 0.84, respectively, versus 0.78) by using state-of-the-art machine-learning algorithms together with newly incorporated features and data. In addition to validating the variables shown to be associated with drug approval in the earlier MIT study, the challenge also provided new insights into the drivers of drug-development success and failure.

Keywords: XGBoost; artificial intelligence; clinical trials; data science; prediction; probability of success.
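The abstract compares models by their area under the curve (AUC), presumably the receiver operating characteristic (ROC) curve that is the usual choice for binary classifiers. The paper's own evaluation code is not given here, so the following is only a minimal sketch of the standard rank-based (Mann-Whitney) definition of ROC AUC, assuming binary approval labels and real-valued model scores; the function name `roc_auc` and the toy data are hypothetical.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney formulation: the fraction of
    (positive, negative) pairs in which the positive example receives
    the higher score, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # positive correctly ranked above negative
        elif p == n:
            wins += 0.5   # tie contributes half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = drug approved, 0 = not approved.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
# 8 of the 9 (approved, not approved) pairs are correctly ordered, so AUC = 8/9.
print(roc_auc(labels, scores))
```

Under this definition, an AUC of 0.88 means that a randomly chosen approved drug receives a higher score than a randomly chosen unapproved one about 88% of the time, whereas 0.5 corresponds to random ranking. The pairwise loop costs O(n·m) and is written for clarity; library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.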