Interactome of the hepatitis C virus: Literature mining with ANDSystem

Virus Res. 2016 Jun 15:218:40-8. doi: 10.1016/j.virusres.2015.12.003. Epub 2015 Dec 7.

Abstract

Studying the molecular genetic mechanisms of host-pathogen interactions is of paramount importance for developing drugs against viral diseases. The literature currently contains a huge amount of information describing interactions between HCV and human proteins. In addition, many factual databases contain experimentally verified data on HCV-host interactions. These databases draw on original data together with data manually extracted from the literature. However, manual analysis of scientific publications is time consuming, so databases built this way are often incomplete. One of the most promising ways to keep such information up to date and complete is text mining. Here, using a method previously developed by the authors with ANDSystem, information on interactions between HCV and human proteins was extracted automatically. PubMed abstracts and full-text articles served as the data source for text mining. Additionally, external factual databases were analyzed. On the basis of this analysis, a special version of ANDSystem, extended with the HCV interactome, was created. The HCV interactome contains information about interactions between 969 human and 11 HCV proteins. Among the 969 proteins, 153 'new' proteins were found that had not previously been listed in any external database of protein-protein interactions for HCV-host interactions. Thus, the extended ANDSystem describes HCV-host interactions more comprehensively than other existing databases. Interestingly, HCV proteins preferentially interact with human proteins that are already involved in a large number of protein-protein interactions, as well as with those associated with many diseases. The human proteins of the HCV interactome also include a large number of proteins regulated by microRNAs.
The results obtained for protein-protein interactions and microRNA regulation did not depend on how well the proteins were studied, whereas protein-disease associations appeared to depend on the level of study. In particular, the mean number of diseases linked to well-studied proteins from the HCV interactome (proteins were considered well-studied if they were mentioned in 50 or more PubMed publications) was 20.8, significantly exceeding the mean number of disease associations (10.1) for the total set of well-studied human proteins present in ANDSystem. For poorly-studied proteins from the HCV interactome (each referred to in fewer than 50 publications), the distribution of the number of associated diseases did not differ significantly from that for poorly-studied proteins in the total set of human proteins stored in ANDSystem. Here, the average numbers of disease associations for the HCV interactome and the total set of human proteins were 0.3 and 0.2, respectively. Thus, ANDSystem, extended with the HCV interactome, can be helpful in a wide range of tasks related to analyzing HCV-host interactions in the search for anti-HCV drug targets. The demo version of the extended ANDSystem described here, which contains only interactions between human proteins, genes, metabolites, diseases, miRNAs and molecular-genetic pathways, as well as interactions between human proteins/genes and HCV proteins, is freely available at the following web address: http://www-bionet.sscc.ru/psd/andhcv/.
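
The stratified comparison described above (splitting proteins at 50 PubMed publications and comparing mean disease counts between the HCV interactome and the full protein set) can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the function name, the record layout and the toy numbers are all hypothetical, and the abstract does not name the statistical test used, so none is applied here.

```python
from statistics import mean

# Threshold stated in the abstract: 50 or more PubMed publications = well-studied.
WELL_STUDIED_MIN_PUBS = 50


def mean_diseases_by_study_level(proteins, threshold=WELL_STUDIED_MIN_PUBS):
    """Return mean disease-association counts for each study-level group.

    `proteins` is an iterable of (publication_count, disease_count) pairs.
    This record layout is an assumption made for the sketch.
    A group with no proteins yields None.
    """
    well = [diseases for pubs, diseases in proteins if pubs >= threshold]
    poor = [diseases for pubs, diseases in proteins if pubs < threshold]
    return {
        "well_studied": mean(well) if well else None,
        "poorly_studied": mean(poor) if poor else None,
    }


# Invented toy records (publication_count, disease_count); NOT data from the study.
hcv_interactome = [(120, 25), (75, 15), (10, 1), (3, 0)]
all_human = [(120, 25), (75, 15), (60, 5), (200, 3),
             (10, 1), (3, 0), (20, 0), (1, 0)]

hcv_means = mean_diseases_by_study_level(hcv_interactome)
background_means = mean_diseases_by_study_level(all_human)

print("HCV interactome:", hcv_means)
print("All human proteins:", background_means)
```

Comparing the two groups separately, rather than pooling all proteins, keeps study bias from inflating the apparent disease load of the heavily investigated proteins that dominate the interactome.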

Keywords: ANDSystem; HCV interactome; HCV–host interactions; Hepatitis C virus; Protein–protein interactions; Text mining.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Data Mining / methods*
  • Data Mining / statistics & numerical data
  • Databases, Factual
  • Gene Expression Regulation
  • Hepacivirus / genetics*
  • Hepacivirus / metabolism
  • Hepatitis C / genetics*
  • Hepatitis C / metabolism
  • Hepatitis C / virology
  • Host-Pathogen Interactions
  • Humans
  • MicroRNAs / genetics
  • MicroRNAs / metabolism
  • Multiprotein Complexes / genetics
  • Multiprotein Complexes / metabolism
  • Protein Interaction Mapping
  • PubMed / statistics & numerical data
  • Receptors, Virus / genetics*
  • Receptors, Virus / metabolism
  • Viral Proteins / genetics*
  • Viral Proteins / metabolism

Substances

  • MicroRNAs
  • Multiprotein Complexes
  • Receptors, Virus
  • Viral Proteins