Deleterious Non-Synonymous Single Nucleotide Polymorphism Predictions on Human Transcription Factors

IEEE/ACM Trans Comput Biol Bioinform. 2020 Jan-Feb;17(1):327-333. doi: 10.1109/TCBB.2018.2882548. Epub 2018 Nov 21.

Abstract

Transcription factors (TFs) are the major components of human gene regulation. In particular, they bind onto specific DNA sequences and regulate neighborhood genes in different tissues at different developmental stages. Non-synonymous single nucleotide polymorphisms on its protein-coding sequences could result in undesired consequences in human. Therefore, it is necessary to develop methods for predicting any abnormality among those non-synonymous single nucleotide polymorphisms. To address it, we have developed and compared different strategies to predict deleterious non-synonymous single nucleotide polymorphisms (also known as missense mutations) on the protein-coding sequences of human TFs. Taking advantage of evolutionary conservation signals, we have developed and compared different classifiers with different feature sets as computed from different evolutionarily related sequence collections. The results indicate that the classic ensemble algorithm, Adaboost with decision stumps, with orthologous sequence collection, has performed the best (namely, TFmedic). We have further compared TFmedic with other state-of-the-arts methods (i.e., PolyPhen-2 and SIFT) on PolyPhen-2's own datasets, demonstrating that TFmedic can outperform the others. As applications, we have further applied TFmedic to all possible missense mutations on all human transcription factors; the proteome-wide results reveal interesting insights, consistent with the existing physiochemical knowledge. A case study with the actual 3D structure is conducted, revealing how TFmedic can be contributed to protein-DNA binding complex studies.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology
  • Data Mining
  • Humans
  • Machine Learning*
  • Mutation, Missense / genetics
  • Polymorphism, Single Nucleotide / genetics*
  • Transcription Factors / genetics*

Substances

  • Transcription Factors