Large-scale discovery of novel neurodevelopmental disorder-related genes through a unified analysis of single-nucleotide and copy number variants

Kohei Hamanaka; Noriko Miyake; Takeshi Mizuguchi; Satoko Miyatake; Yuri Uchiyama; Naomi Tsuchida; Futoshi Sekiguchi; Satomi Mitsuhashi; Yoshinori Tsurusaki; Mitsuko Nakashima; Hirotomo Saitsu; Kohei Yamada; Masamune Sakamoto; Hiromi Fukuda; Sachiko Ohori; Ken Saida; Toshiyuki Itai; Yoshiteru Azuma; Eriko Koshimizu; Atsushi Fujita; Biray Erturk; Yoko Hiraki; Gaik-Siew Ch'ng; Mitsuhiro Kato; Nobuhiko Okamoto; Atsushi Takata; Naomichi Matsumoto

doi:10.1186/s13073-022-01042-w

Genome Med. 2022 Apr 26;14(1):40. doi: 10.1186/s13073-022-01042-w.

Authors

Kohei Hamanaka^#¹, Noriko Miyake^#², Takeshi Mizuguchi^#², Satoko Miyatake^{2,3}, Yuri Uchiyama^{2,4}, Naomi Tsuchida^{2,4}, Futoshi Sekiguchi², Satomi Mitsuhashi², Yoshinori Tsurusaki⁵, Mitsuko Nakashima⁶, Hirotomo Saitsu⁶, Kohei Yamada², Masamune Sakamoto², Hiromi Fukuda², Sachiko Ohori², Ken Saida², Toshiyuki Itai², Yoshiteru Azuma^{2,7}, Eriko Koshimizu², Atsushi Fujita², Biray Erturk^{8,9}, Yoko Hiraki¹⁰, Gaik-Siew Ch'ng¹¹, Mitsuhiro Kato¹², Nobuhiko Okamoto¹³, Atsushi Takata^{14,15}, Naomichi Matsumoto¹⁶

Affiliations

¹ Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan. hamanaka@yokohama-cu.ac.jp.
² Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan.
³ Clinical Genetics Department, Yokohama City University Hospital, Yokohama, Japan.
⁴ Department of Rare Disease Genomics, Yokohama City University Hospital, Yokohama, Japan.
⁵ Faculty of Nutritional Science, Sagami Women's University, Sagamihara, Japan.
⁶ Department of Biochemistry, Hamamatsu University School of Medicine, Hamamatsu, Japan.
⁷ Department of Pediatrics, Aichi Medical University, Nagakute, Japan.
⁸ Department of Medical Genetics, Ege University Faculty of Medicine, Izmir, Turkey.
⁹ Current affiliation: Department of Medical Genetics, Prof. Dr. Cemil Tascioglu City Hospital, Istanbul, Turkey.
¹⁰ Hiroshima Municipal Center for Child Health and Development, Hiroshima, Japan.
¹¹ Department of Genetics, Penang Hospital, Penang, Malaysia.
¹² Department of Pediatrics, Showa University School of Medicine, Tokyo, Japan.
¹³ Department of Medical Genetics, Osaka Women's and Children's Hospital, Izumi, Japan.
¹⁴ Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan. atakata@yokohama-cu.ac.jp.
¹⁵ Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, Wako, Japan. atakata@yokohama-cu.ac.jp.
¹⁶ Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Japan. naomat@yokohama-cu.ac.jp.

^# Contributed equally.

Abstract

Background: Previous large-scale studies of de novo variants identified a number of genes associated with neurodevelopmental disorders (NDDs); however, it was also predicted that many NDD-associated genes await discovery. Such genes can be discovered by integrating copy number variants (CNVs), which have not been fully considered in previous studies, and increasing the sample size.

Methods: We first constructed a model estimating the rates of de novo CNVs per gene from several factors such as gene length and number of exons. Second, we compiled a comprehensive list of de novo single-nucleotide variants (SNVs) in 41,165 individuals and de novo CNVs in 3675 individuals with NDDs by aggregating our own and publicly available datasets, including denovo-db and the Deciphering Developmental Disorders study data. Third, by summing the de novo CNV rates that we estimated and the previously established SNV rates, we assessed gene-based enrichment of de novo deleterious SNVs and CNVs in the 41,165 cases. Significantly enriched genes were further prioritized according to their similarity to known NDD genes using a deep learning model that considers functional characteristics (e.g., gene ontology and expression patterns).
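The abstract does not state which statistical test was used for the gene-based enrichment step. A common way to test de novo enrichment is to compare the observed count per gene against a Poisson expectation derived from the per-gene mutation rate, and then to control the false discovery rate with the Benjamini-Hochberg procedure. The following Python sketch illustrates that general idea only; it is not the authors' implementation. The function names, the gene labels, the rates, and the observed counts are hypothetical, and the rates are assumed to be expressed per individual per generation so that the expected count is simply the cohort size times the summed SNV and CNV rates.

```python
import math

N_CASES = 41_165  # cohort size quoted in the abstract


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam) by summing the upper tail.

    Intended for the modest expected counts typical of per-gene tests,
    where exp(-lam) does not underflow.
    """
    if k <= 0:
        return 1.0
    # Probability mass at k, computed in log space for stability.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > 0.0:
        total += term
        i += 1
        term *= lam / i
        if term < total * 1e-17:  # remaining terms are negligible
            break
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def enrichment_qvalues(genes, n_cases=N_CASES):
    """genes maps a name to (snv_rate, cnv_rate, observed_count).

    ASSUMPTION: rates are per individual per generation, so the expected
    number of de novo events is n_cases * (snv_rate + cnv_rate).
    """
    names = list(genes)
    pvalues = []
    for name in names:
        snv_rate, cnv_rate, observed = genes[name]
        expected = n_cases * (snv_rate + cnv_rate)
        pvalues.append(poisson_upper_tail(observed, expected))
    return dict(zip(names, benjamini_hochberg(pvalues)))


# Hypothetical inputs for illustration only; these are not study data.
toy_genes = {
    "GENE_A": (4.0e-6, 1.0e-6, 9),
    "GENE_B": (2.0e-6, 5.0e-7, 1),
    "GENE_C": (6.0e-6, 2.0e-6, 0),
}
for gene, q in enrichment_qvalues(toy_genes).items():
    print(f"{gene}: q = {q:.3g}, significant at 5% FDR: {q < 0.05}")
```

Summing the two rates before testing treats SNVs and CNVs as one pool of deleterious events per gene, which is what allows a gene with too few events of either type alone to reach significance when the two are combined.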

Results: We identified a total of 380 genes achieving statistical significance (5% false discovery rate), including 31 genes affected by de novo CNVs. Of the 380 genes, 52 have not previously been reported as NDD genes, and the data of de novo CNVs contributed to the significance of three genes (GLTSCR1, MARK2, and UBR3). Among the 52 genes, we reasonably excluded 18 genes [a number almost identical to the theoretically expected false positives (i.e., 380 × 0.05 = 19)] given their constraints against deleterious variants and extracted 34 "plausible" candidate genes. Their validity as NDD genes was consistently supported by their similarity in function and gene expression patterns to known NDD genes. Quantifying the overall similarity using deep learning, we identified 11 high-confidence (> 90% true-positive probabilities) candidate genes: HDAC2, SUPT16H, HECTD4, CHD5, XPO1, GSK3B, NLGN2, ADGRB1, CTR9, BRD3, and MARK2.
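The counts in the Results can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the Results paragraph.
n_significant = 380      # genes passing the 5% FDR threshold
fdr = 0.05               # false discovery rate threshold
n_novel = 52             # significant genes not previously reported
n_excluded = 18          # novel genes excluded based on constraint
n_high_confidence = 11   # genes with > 90% true-positive probability

# Expected number of false discoveries among the significant genes.
expected_false_positives = n_significant * fdr

# Remaining "plausible" candidate genes after exclusion.
n_plausible = n_novel - n_excluded

print(f"Expected false positives: {expected_false_positives:.0f}")
print(f"Plausible candidates: {n_plausible}")
print(f"High-confidence share of plausible: {n_high_confidence / n_plausible:.0%}")
```

The expected count applies to all 380 significant genes, so comparing it with the 18 excluded novel genes implicitly assumes that false discoveries fall mainly among genes without prior evidence.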

Conclusions: We identified dozens of new candidates for NDD genes. Both the methods and the resources developed here will contribute to the further identification of novel NDD-associated genes.

Keywords: Autism spectrum disorder; Copy number variant; Copy number variation; De novo variant; Deep learning; Epileptic encephalopathy; Intellectual disability; Mutation rate; Neurodevelopmental disorder; Rare disease.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cell Cycle Proteins / genetics
DNA Copy Number Variations*
DNA Helicases / genetics
Exons
Humans
Nerve Tissue Proteins / genetics
Neurodevelopmental Disorders* / genetics
Nucleotides
Transcription Factors / genetics

Substances

Cell Cycle Proteins
Nerve Tissue Proteins
Nucleotides
SUPT16H protein, human
Transcription Factors
DNA Helicases
CHD5 protein, human