A New Weighted Imputed Neighborhood-Regularized Tri-Factorization One-Class Collaborative Filtering Algorithm: Application to Target Gene Prediction of Transcription Factors

IEEE/ACM Trans Comput Biol Bioinform. 2021 Jan-Feb;18(1):126-137. doi: 10.1109/TCBB.2020.2968442. Epub 2021 Feb 3.

Abstract

Identifying target genes of transcription factors (TFs) is crucial to understanding transcriptional regulation. However, our understanding of genome-wide TF targeting profiles is limited by the cost of large-scale experiments and the intrinsic complexity of gene regulation. Computational methods are therefore useful for predicting unobserved TF-gene associations. Here, we develop a new Weighted Imputed Neighborhood-regularized Tri-Factorization one-class collaborative filtering algorithm, WINTF. It predicts unobserved target genes of TFs from known but noisy, incomplete, and biased TF-gene associations together with protein-protein interaction networks. Our benchmark study shows that, for TF-gene association prediction, WINTF significantly outperforms counterpart matrix factorization-based algorithms as well as tri-factorization methods that do not include weighting, imputation, and neighborhood regularization. When evaluated on independent datasets, WINTF achieves an accuracy of 37.8 percent on the top 495 predicted associations, an enrichment factor of 4.19 over random guessing. Furthermore, many predicted novel associations are supported by evidence in the literature. Although we use only canonical TF-gene interaction data, WINTF can be applied directly to tissue-specific data when such data become available. Thus, WINTF provides a potentially useful framework for integrating multiple omics data to further improve TF-gene prediction, and for applications to other sparse and noisy biological data. The benchmark dataset and source code are freely available at https://github.com/XieResearchGroup/WINTF.
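To make the three ingredients named in the algorithm concrete (weighting, imputation, and neighborhood regularization on top of a tri-factorization R ≈ U S Vᵀ), the following is a minimal NumPy sketch under stated assumptions. It is not the authors' implementation, which is available at the repository above. The function name `wintf_like` and all parameters are hypothetical. The sketch assumes a squared-error loss, a single uniform weight `w_unobs` for unobserved cells, a constant imputed target `impute` for unobserved cells, L2 penalties on the factors, and a graph-Laplacian penalty tr(Uᵀ L U) applied only to the TF factors, where L is built from a TF-TF adjacency matrix (for example, one derived from a protein-protein interaction network). The optimizer is plain gradient descent with step halving, chosen only for simplicity.

```python
import numpy as np


def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A of a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj


def wintf_like(R, A_tf, k1=5, k2=5, w_unobs=0.1, impute=0.0,
               lam=0.01, alpha=0.1, lr=0.01, n_iter=300, seed=0):
    """Hypothetical weighted, imputed, neighborhood-regularized tri-factorization.

    R    : (m, n) binary TF-gene matrix, 1 = observed association.
    A_tf : (m, m) symmetric TF-TF adjacency (assumed to come from a PPI network).
    Returns U, S, V and the loss history; predicted scores are U @ S @ V.T.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = 0.1 * rng.random((m, k1))
    S = 0.1 * rng.random((k1, k2))
    V = 0.1 * rng.random((n, k2))

    observed = R > 0
    Y = np.where(observed, 1.0, impute)    # imputed target value for unobserved cells
    W = np.where(observed, 1.0, w_unobs)   # down-weight unobserved (uncertain) cells
    L = laplacian(A_tf)

    def loss(U, S, V):
        E = U @ S @ V.T - Y
        fit = np.sum(W * E**2)                                     # weighted fit
        reg = lam * (np.sum(U**2) + np.sum(S**2) + np.sum(V**2))   # L2 penalty
        smooth = alpha * np.trace(U.T @ L @ U)  # linked TFs get similar factors
        return fit + reg + smooth

    history = [loss(U, S, V)]
    for _ in range(n_iter):
        G = W * (U @ S @ V.T - Y)              # weighted residual
        gU = 2 * (G @ V @ S.T + lam * U + alpha * L @ U)
        gS = 2 * (U.T @ G @ V + lam * S)
        gV = 2 * (G.T @ U @ S + lam * V)
        U_new, S_new, V_new = U - lr * gU, S - lr * gS, V - lr * gV
        new_loss = loss(U_new, S_new, V_new)
        if new_loss <= history[-1]:
            U, S, V = U_new, S_new, V_new      # accept the step
            history.append(new_loss)
        else:
            lr *= 0.5                          # reject and retry with a smaller step
    return U, S, V, history


# Toy usage on random data (illustrative only, not the paper's benchmark).
rng = np.random.default_rng(1)
R = (rng.random((8, 12)) < 0.2).astype(float)          # toy TF x gene matrix
A = np.triu((rng.random((8, 8)) < 0.3).astype(float), 1)
A = A + A.T                                            # symmetric, no self-loops
U, S, V, hist = wintf_like(R, A, k1=3, k2=3)
scores = U @ S @ V.T
# Rank only currently unobserved pairs as candidate novel associations.
masked = np.where(R > 0, -np.inf, scores)
top = np.argsort(-masked, axis=None)[:5]
tf_idx, gene_idx = np.unravel_index(top, R.shape)
```

Because steps are accepted only when they do not increase the objective, the loss history is non-increasing. Ranking only the unobserved cells mirrors the one-class setting, where every unobserved TF-gene pair is a candidate rather than a confirmed negative.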

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Animals
  • Computational Biology / methods*
  • Gene Expression Regulation / genetics
  • Humans
  • Mice
  • Transcription Factors* / classification
  • Transcription Factors* / genetics
  • Transcription Factors* / metabolism
  • Transcriptome / genetics

Substances

  • Transcription Factors