Drug target prediction through deep learning functional representation of gene signatures

Hao Chen; Frederick J King; Bin Zhou; Yu Wang; Carter J Canedy; Joel Hayashi; Yang Zhong; Max W Chang; Lars Pache; Julian L Wong; Yong Jia; John Joslin; Tao Jiang; Christopher Benner; Sumit K Chanda; Yingyao Zhou

doi:10.1038/s41467-024-46089-y

Nat Commun. 2024 Feb 29;15(1):1853. doi: 10.1038/s41467-024-46089-y.

Authors

Hao Chen¹ ² ³, Frederick J King⁴, Bin Zhou⁴, Yu Wang⁴, Carter J Canedy⁴, Joel Hayashi⁴, Yang Zhong⁴, Max W Chang⁵, Lars Pache⁶, Julian L Wong⁴, Yong Jia⁴, John Joslin⁴, Tao Jiang⁷, Christopher Benner⁵, Sumit K Chanda⁸, Yingyao Zhou⁹

Affiliations

¹ Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA. hchen4@andrew.cmu.edu.
² Department of Computer Science and Engineering, University of California, Riverside, 900 University Avenue, Riverside, CA, 92521, USA. hchen4@andrew.cmu.edu.
³ Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA. hchen4@andrew.cmu.edu.
⁴ Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA.
⁵ Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA.
⁶ NCI Designated Cancer Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, 92037, USA.
⁷ Department of Computer Science and Engineering, University of California, Riverside, 900 University Avenue, Riverside, CA, 92521, USA.
⁸ Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, 92037, USA.
⁹ Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA. yingyao.zhou@novartis.com.

Abstract

Many machine learning applications in bioinformatics currently rely on matching gene identities when analyzing input gene signatures and fail to take advantage of preexisting knowledge about gene functions. To further enable comparative analysis of OMICs datasets, including target deconvolution and mechanism of action studies, we develop an approach that represents gene signatures projected onto their biological functions, instead of their identities, similar to how the word2vec technique works in natural language processing. We develop the Functional Representation of Gene Signatures (FRoGS) approach by training a deep learning model and demonstrate that its application to the Broad Institute's L1000 datasets results in more effective compound-target predictions than models based on gene identities alone. By integrating additional pharmacological activity data sources, FRoGS significantly increases the number of high-quality compound-target predictions relative to existing approaches, many of which are supported by in silico and/or experimental evidence. These results underscore the general utility of FRoGS in machine learning-based bioinformatics applications. Prediction networks pre-equipped with the knowledge of gene functions may help uncover new relationships among gene signatures acquired by large-scale OMICs studies on compounds, cell types, disease models, and patient cohorts.
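
The contrast the abstract draws between matching gene identities and comparing gene functions can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the three-dimensional gene vectors are hand-made rather than learned, averaging gene vectors is a simple stand-in for a trained model, and the helper names (`GENE_VECTORS`, `jaccard`, `signature_vector`, `functional_similarity`) are hypothetical. It is not the FRoGS model or its API.

```python
import math

# Hand-made, hypothetical "functional" embeddings (NOT learned, NOT from FRoGS).
# The three axes loosely stand for: cell cycle, inflammation, other.
GENE_VECTORS = {
    "CDK1":  (0.9, 0.1, 0.0),
    "CCNB1": (0.8, 0.0, 0.1),
    "CDK2":  (0.9, 0.0, 0.1),
    "CCNA2": (0.8, 0.1, 0.0),
    "IL6":   (0.0, 0.9, 0.1),
    "TNF":   (0.1, 0.9, 0.0),
}


def jaccard(sig_a, sig_b):
    """Identity-based similarity: fraction of gene names shared by two signatures."""
    a, b = set(sig_a), set(sig_b)
    return len(a & b) / len(a | b)


def signature_vector(signature):
    """Represent a signature as the mean of its gene vectors (an illustrative choice)."""
    vectors = [GENE_VECTORS[gene] for gene in signature]
    dims = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dims))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def functional_similarity(sig_a, sig_b):
    """Function-based similarity: cosine between the two signature vectors."""
    return cosine(signature_vector(sig_a), signature_vector(sig_b))


sig_a = ["CDK1", "CCNB1"]   # cell-cycle genes
sig_b = ["CDK2", "CCNA2"]   # different cell-cycle genes, no names shared with sig_a
sig_c = ["IL6", "TNF"]      # inflammatory cytokines

print("identity overlap a/b:  ", jaccard(sig_a, sig_b))
print("functional sim.  a/b:  ", round(functional_similarity(sig_a, sig_b), 3))
print("functional sim.  a/c:  ", round(functional_similarity(sig_a, sig_c), 3))
```

In this toy setting the two cell-cycle signatures share no gene names, so an identity-based comparison sees them as unrelated, while the function-based comparison scores them as far more similar to each other than either is to the cytokine signature.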

MeSH terms

Computational Biology
Deep Learning*
Drug Development
Humans
Machine Learning
