Evaluating the informativeness of deep learning annotations for human complex diseases

Kushal K Dey; Bryce van de Geijn; Samuel Sungil Kim; Farhad Hormozdiari; David R Kelley; Alkes L Price

doi:10.1038/s41467-020-18515-4

Nat Commun. 2020 Sep 17;11(1):4703. doi: 10.1038/s41467-020-18515-4.

Authors

Kushal K Dey¹, Bryce van de Geijn², Samuel Sungil Kim² ³, Farhad Hormozdiari², David R Kelley⁴, Alkes L Price⁵ ⁶

Affiliations

¹ Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA. kdey@hsph.harvard.edu.
² Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA.
³ Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
⁴ Calico Labs, South San Francisco, CA, USA.
⁵ Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA. aprice@hsph.harvard.edu.
⁶ Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA. aprice@hsph.harvard.edu.

Abstract

Deep learning models have shown great promise in predicting regulatory effects from DNA sequence, but their informativeness for human complex diseases is not fully understood. Here, we evaluate genome-wide SNP annotations from two previous deep learning models, DeepSEA and Basenji, by applying stratified LD score regression to 41 diseases and traits (average N = 320K), conditioning on a broad set of coding, conserved and regulatory annotations. We aggregated annotations across all (respectively blood or brain) tissues/cell-types in meta-analyses across all (respectively 11 blood or 8 brain) traits. The annotations were highly enriched for disease heritability, but produced only limited conditionally significant results: non-tissue-specific and brain-specific Basenji-H3K4me3 for all traits and brain traits respectively. We conclude that deep learning models have yet to achieve their full potential to provide considerable unique information for complex disease, and that their conditional informativeness for disease cannot be inferred from their accuracy in predicting regulatory annotations.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Alleles
Deep Learning*
Disease / genetics*
Genetic Predisposition to Disease
Genome, Human
Genome-Wide Association Study
Histones / genetics
Humans
Linkage Disequilibrium
Models, Genetic
Molecular Sequence Annotation*
Phenotype
Polymorphism, Single Nucleotide

Substances

Histones
histone H3 trimethyl Lys4
