A data and text mining pipeline to annotate human mitochondrial variants with functional and clinical information

Mol Genet Genomic Med. 2020 Feb;8(2):e1085. doi: 10.1002/mgg3.1085. Epub 2019 Dec 10.

Abstract

Background: Human mitochondrial DNA (mtDNA) plays an important role in cellular energy production through oxidative phosphorylation. This process may therefore underlie and influence mtDNA mutability, functional alteration, and the onset of diseases associated with a wide range of clinical expressions and phenotypes. Although a large part of the observed variation is fixed in populations and is therefore expected to be benign, estimating the degree of pathogenicity of any possible human mtDNA variant is clinically pivotal.

Methods: Standard criteria based on functional studies are therefore required. To this end, a "data and text mining" pipeline is proposed here. Developed in the R programming language, it extracts information on mitochondrial DNA functional studies and related clinical assessments from the literature, thereby improving the annotation of human mitochondrial variants reported in the HmtVar database.
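The abstract does not say how the literature was queried. As a minimal sketch of a data mining step, assuming PubMed is searched through NCBI's public E-utilities ESearch service (the published pipeline was written in R and may have used a different route), the Python below builds a search URL and extracts the PMID list from the JSON response. The helper names `build_esearch_url` and `parse_pmids`, and any query term passed to them, are hypothetical.

```python
import json
from typing import List
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, retmax: int = 500, retstart: int = 0) -> str:
    """Hypothetical helper: build an ESearch URL asking PubMed for PMIDs as JSON."""
    params = {
        "db": "pubmed",        # search the PubMed database
        "term": term,          # Entrez query string (hypothetical in this sketch)
        "retmode": "json",     # request a JSON response instead of XML
        "retmax": retmax,      # maximum number of IDs returned in this call
        "retstart": retstart,  # offset used to page through large result sets
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pmids(response_text: str) -> List[str]:
    """Hypothetical helper: pull the PMID list out of an ESearch JSON body."""
    payload = json.loads(response_text)
    return payload["esearchresult"]["idlist"]
```

The URL can be fetched with `urllib.request.urlopen` and the decoded body passed to `parse_pmids`; larger result sets are paged by increasing `retstart`. NCBI recommends an API key for frequent requests.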

Results: The data mining pipeline produced a list of 1,073 PubMed IDs (PMIDs), from which the text mining pipeline retrieved information on experimental validation and clinical features for 932 human mitochondrial variants.
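A toy version of a text mining step, assuming variants are recognized by regular expressions over abstract text: the hypothetical helpers below normalize HGVS-style (`m.3243A>G`) and legacy (`A3243G`) substitution notations, discard positions outside the 16,569 bp revised Cambridge Reference Sequence, flag illustrative functional and clinical keywords, and index each variant by the PMIDs that mention it. The patterns and keyword lists are assumptions, not the vocabularies used in the published pipeline.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical pattern for HGVS-style mtDNA substitutions, e.g. "m.3243A>G".
HGVS_MT = re.compile(r"\bm\.(\d{1,5})([ACGT])>([ACGT])\b")
# Hypothetical pattern for legacy notation, e.g. "A3243G".
LEGACY_MT = re.compile(r"\b([ACGT])(\d{1,5})([ACGT])\b")

# Illustrative keyword lists (assumptions, not the published vocabularies).
FUNCTIONAL_TERMS = ("cybrid", "oxygen consumption", "respiratory chain", "enzyme activity")
CLINICAL_TERMS = ("melas", "lhon", "merrf", "leigh syndrome")

MT_GENOME_LENGTH = 16569  # length of the revised Cambridge Reference Sequence (bp)


def _normalize(pos: str, ref: str, alt: str) -> str:
    """Render a substitution in HGVS-like m.<pos><ref>><alt> form."""
    return f"m.{int(pos)}{ref}>{alt}"


def extract_variants(text: str) -> Set[str]:
    """Return plausible mtDNA substitutions mentioned in the text."""
    candidates = [(p, r, a) for p, r, a in HGVS_MT.findall(text)]
    candidates += [(p, r, a) for r, p, a in LEGACY_MT.findall(text)]
    return {
        _normalize(p, r, a)
        for p, r, a in candidates
        # keep only positions on the mitochondrial genome and true changes
        if 1 <= int(p) <= MT_GENOME_LENGTH and r != a
    }


def annotate_abstract(text: str) -> dict:
    """Flag variants plus illustrative functional and clinical keywords."""
    lower = text.lower()
    return {
        "variants": extract_variants(text),
        "functional": any(term in lower for term in FUNCTIONAL_TERMS),
        "clinical": sorted(term for term in CLINICAL_TERMS if term in lower),
    }


def variants_by_pmid(abstracts: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each normalized variant to the set of PMIDs whose text mentions it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for pmid, text in abstracts:
        for variant in extract_variants(text):
            index[variant].add(pmid)
    return dict(index)
```

Indexing by variant rather than by article reflects how a set of PMIDs collapses into a set of distinct variants, each of which may be supported by several articles.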

Conclusions: Applying the pipeline will help support the interpretation of the pathogenicity of human mitochondrial variants and facilitate diagnosis for clinicians and researchers faced with this task.

Keywords: annotation; mitochondria; pathogenicity; variant.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • DNA, Mitochondrial / genetics*
  • Data Mining / methods*
  • Humans
  • Mitochondrial Diseases / genetics*
  • Mitochondrial Diseases / pathology
  • Molecular Sequence Annotation / methods*
  • Polymorphism, Genetic*
  • Software

Substances

  • DNA, Mitochondrial