[Exploring the association between de novo mutations and non-syndromic cleft lip with or without palate based on whole exome sequencing of case-parent trios]

Beijing Da Xue Xue Bao Yi Xue Ban. 2022 Jun 18;54(3):387-393. doi: 10.19723/j.issn.1671-167X.2022.03.001.
[Article in Chinese]

Abstract

Objective: To explore the association between de novo mutations (DNM) and non-syndromic cleft lip with or without palate (NSCL/P) using a case-parent trio design.

Methods: Whole-exome sequencing was performed on twenty-two NSCL/P case-parent trios, and the Genome Analysis ToolKit (GATK) was used to identify DNM by comparing the alleles of each case with those of the parents. SnpEff was used to annotate the predicted functional effect of each locus. Enrichment analysis was conducted with the R package "denovolyzeR" to test whether the observed number of DNM differed from the expected number, and whether any gene carried more DNM than expected (Bonferroni correction: P=0.05/n, where n is the number of genes across the whole genome). Genes with solid evidence of association with NSCL/P in previous studies were selected by literature review. Protein-protein interactions between the genes carrying protein-altering DNM and these NSCL/P-related genes were predicted from the information in the STRING database.
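The enrichment step described above compares an observed DNM count with an expected count. A minimal sketch of that comparison follows, assuming a one-sided Poisson test and a Bonferroni-corrected threshold. The function names and the example expected count are hypothetical illustrations, not values or code from the study, and the sketch is not the denovolyzeR implementation.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Accumulate P(X = 0) ... P(X = observed - 1), then take the complement.
    term = math.exp(-expected)
    cdf = term
    for i in range(1, observed):
        term *= expected / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical example: a gene expected to carry 0.001 DNM in this
# cohort (assumed value) in which 2 DNM were observed.
expected_in_gene = 0.001
observed_in_gene = 2
p_value = poisson_upper_tail(observed_in_gene, expected_in_gene)

# The Results report the threshold as 0.05/(2 x 19 618).
threshold = bonferroni_threshold(0.05, 2 * 19618)
is_enriched = p_value < threshold
print(p_value, threshold, is_enriched)
```

With such a small expected count, even two DNM in one gene can fall below the corrected threshold, which is why per-gene expectations matter as much as the raw counts.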

Results: A total of 339 908 SNPs passed quality control and were retained for subsequent analysis. GATK identified 345 high-confidence DNM. Of these, forty-four were missense mutations, one was a nonsense mutation, two were splice-site mutations, twenty were synonymous mutations, and the remaining 278 were located in intronic or intergenic regions. Enrichment analysis showed that the number of protein-altering DNM in the exome was larger than expected (P < 0.05), and that five genes (KRTCAP2, HMCN2, ANKRD36C, ADGRL2 and DIPK2A) carried more DNM than expected (P < 0.05/(2×19 618)). Protein-protein interaction analysis included forty-six genes with protein-altering DNM and thirteen NSCL/P-associated genes selected by literature review. Six pairs of interactions were found between the genes with DNM and the known NSCL/P-related genes. The predicted interaction between RGPD4 and SUMO1 had a confidence score of 0.868, higher than the scores for the other pairs.
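The counts and the significance threshold reported above can be checked with a few lines of arithmetic. In this sketch the grouping of missense, nonsense and splice-site mutations as "protein-altering" is an assumption made for illustration, and the variable names are hypothetical.

```python
# Counts of high-confidence DNM by annotation, as reported in the Results.
total_dnm = 345
coding_counts = {
    "missense": 44,
    "nonsense": 1,
    "splice_site": 2,
    "synonymous": 20,
}

# DNM not accounted for by the listed categories
# (intronic or intergenic sites).
intronic_or_intergenic = total_dnm - sum(coding_counts.values())

# Assumption: "protein-altering" covers these three categories.
protein_altering = (
    coding_counts["missense"]
    + coding_counts["nonsense"]
    + coding_counts["splice_site"]
)

# Per-gene threshold reported as 0.05/(2 x 19 618).
gene_threshold = 0.05 / (2 * 19618)

print(intronic_or_intergenic, protein_altering, gene_threshold)
```

Under that assumption there are 47 protein-altering DNM spread over the forty-six genes entered into the interaction analysis, and the per-gene threshold is roughly 1.27e-6.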

Conclusion: Our study provides novel insights into the development of NSCL/P and suggests that functional analyses of genes carrying DNM are warranted to understand the genetic architecture of complex diseases.

Objective: To use whole-exome sequencing in Chinese case-parent trios with non-syndromic cleft lip with or without palate (NSCL/P) to explore de novo mutation sites associated with the development of NSCL/P.

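Methods: Whole-exome sequencing was performed on 22 Chinese NSCL/P case-parent trios. The Genome Analysis ToolKit (GATK) was used to identify de novo mutation sites by comparing the alleles of parents and offspring at the same locus, and SnpEff was used to annotate the function of each site. Enrichment analysis of the de novo mutations tested whether the number of de novo mutations across the whole exome was higher than expected, and whether any gene contained significantly more de novo mutations than expected. Genes with relatively strong evidence of involvement in NSCL/P were summarized from previous studies by literature review. De novo mutations able to alter the protein were selected according to the annotation, and interaction analysis was performed between the proteins encoded by the genes carrying these mutations and the proteins encoded by NSCL/P-related genes. Enrichment analysis used the denovolyzeR package in R (Bonferroni correction for multiple testing: P=0.05/n, where n is the number of genes). The STRING database was used to predict interactions between the proteins encoded by genes carrying de novo mutations and those encoded by known NSCL/P causal genes.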
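The trio comparison at the heart of DNM detection can be illustrated with a deliberately naive sketch: an allele seen in the child but in neither parent is flagged as a candidate. This is only the core idea, assuming error-free genotype calls; it is not how GATK works internally, and a real pipeline would also weigh genotype quality and read depth. The function name is hypothetical.

```python
from typing import List, Tuple

Genotype = Tuple[str, str]  # two alleles at one locus, e.g. ("A", "G")


def candidate_de_novo_alleles(
    child: Genotype, father: Genotype, mother: Genotype
) -> List[str]:
    """Return the child's alleles that are absent from both parents."""
    parental_alleles = set(father) | set(mother)
    return sorted(set(child) - parental_alleles)


# The child carries a G that neither parent has: candidate DNM.
print(candidate_de_novo_alleles(("A", "G"), ("A", "A"), ("A", "A")))

# The child's G is inherited from the mother: no candidate.
print(candidate_de_novo_alleles(("A", "G"), ("A", "A"), ("A", "G")))
```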

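Results: Of the sites obtained by whole-exome sequencing, 339 908 passed quality control. GATK identified 345 high-confidence de novo mutations: 44 missense mutations, 1 nonsense mutation, 2 canonical splice-site mutations, 20 synonymous mutations, and 278 sites in intronic or intergenic regions. Enrichment analysis showed that the number of protein-altering de novo mutations across the whole exome was significantly higher than expected (P < 0.05), and that five genes (KRTCAP2, HMCN2, ANKRD36C, ADGRL2 and DIPK2A) contained more de novo mutation sites than expected (P < 0.05/(2×19 618)). Protein interaction analysis included 46 genes carrying de novo mutations able to alter the protein sequence and 13 genes that previous studies had linked to NSCL/P. Six interactions were found between the proteins encoded by the two groups of genes. The interaction between the proteins encoded by RGPD4 and SUMO1 had the highest confidence, with a STRING interaction score of 0.868.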

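Conclusion: This study provides new evidence on the development of NSCL/P. Functional analysis of genes carrying de novo mutations helps to reveal the genetic architecture of complex diseases.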

Keywords: De novo mutations; Enrichment analysis; Non-syndromic cleft lip with or without palate; Protein-protein interactions.

MeSH terms

  • Asian People
  • Case-Control Studies
  • Cleft Lip* / genetics
  • Cleft Palate* / genetics
  • Exome Sequencing
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study
  • Genotype
  • Humans
  • Mutation
  • Parents
  • Polymorphism, Single Nucleotide

Grants and funding

National Natural Science Foundation of China (81573225, 81102178) and Beijing Natural Science Foundation (7172115)