Constructing a database for the relations between CNV and human genetic diseases via systematic text mining

BMC Bioinformatics. 2018 Dec 31;19(Suppl 19):528. doi: 10.1186/s12859-018-2526-2.

Abstract

Background: The detection and interpretation of CNVs are of clinical importance in genetic testing. Several databases and web services are already being used by clinical geneticists to interpret the medical relevance of CNVs identified in patients. However, geneticists and physicians often want the original literature context for more detailed information, especially for rare CNVs that are not included in these databases.

Results: The resulting CNVdigest database includes 440,485 sentences describing CNV-disease relationships, covering 1582 CNVs and 2425 diseases in total. These sentences are indexed in CNVdigest, with CNV mentions and disease mentions annotated.
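
As an illustration of what a sentence index with annotated CNV and disease mentions might look like, the following is a minimal sketch and not the CNVdigest implementation. All names (`Mention`, `AnnotatedSentence`, `annotate`, `build_index`) are hypothetical, the dictionary lookup is a toy stand-in for the named entity recognition step, and the two example sentences were written for this sketch rather than taken from the database.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mention:
    kind: str   # "CNV" or "DISEASE"
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    text: str   # the mention exactly as it appears in the sentence


@dataclass
class AnnotatedSentence:
    text: str
    mentions: list = field(default_factory=list)


def annotate(text, cnv_terms, disease_terms):
    """Tag the first occurrence of each dictionary term (assumed stand-in for NER)."""
    sentence = AnnotatedSentence(text)
    lowered = text.lower()
    for kind, terms in (("CNV", cnv_terms), ("DISEASE", disease_terms)):
        for term in terms:
            pos = lowered.find(term.lower())
            if pos != -1:
                end = pos + len(term)
                sentence.mentions.append(Mention(kind, pos, end, text[pos:end]))
    return sentence


def build_index(sentences):
    """Map (kind, lower-cased mention) to the positions of sentences containing it."""
    index = defaultdict(list)
    for i, sentence in enumerate(sentences):
        for mention in sentence.mentions:
            index[(mention.kind, mention.text.lower())].append(i)
    return index


# Hypothetical dictionaries and sentences, for illustration only.
cnv_terms = ["22q11.2 deletion", "16p11.2 deletion"]
disease_terms = ["DiGeorge syndrome", "autism"]
sentences = [
    annotate("The 22q11.2 deletion is associated with DiGeorge syndrome.",
             cnv_terms, disease_terms),
    annotate("The 16p11.2 deletion has been reported in patients with autism.",
             cnv_terms, disease_terms),
]
index = build_index(sentences)

# Look up every indexed sentence that mentions a given disease.
for i in index[("DISEASE", "autism")]:
    print(sentences[i].text)
```

Keeping character offsets with each mention lets a front-end highlight the annotated spans in the original sentence, while the inverted index supports queries by either CNV or disease.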

Conclusions: In this paper, we use a systematic text mining method to construct a database of the relationships between CNVs and diseases. Building on this database, we also developed a concise front-end that facilitates the analysis of CNV-disease associations through a user-friendly web interface for convenient queries. The resulting system is publicly available at http://cnv.gtxlab.com/.

Keywords: Copy number variant (CNV); Disease; Named entities recognition; Parallel computing; Relation extraction.

MeSH terms

  • Computational Biology / methods*
  • DNA Copy Number Variations*
  • Data Mining / methods*
  • Databases, Factual*
  • Disease / genetics*
  • Genetic Testing
  • Human Genetics*
  • Humans