iTTCA-MVL: A multi-view learning model based on physicochemical information and sequence statistical information for tumor T cell antigens identification

Comput Biol Med. 2024 Mar:170:107941. doi: 10.1016/j.compbiomed.2024.107941. Epub 2024 Jan 1.

Abstract

Immunotherapy is an emerging treatment that activates the human immune system and relies on the body's own immune function to kill cancer cells and tumor tissue. It offers wide applicability and minimal side effects. Effective identification of tumor T cell antigens (TTCAs) helps researchers understand their functions and mechanisms and supports anti-tumor vaccine development. Because experimental identification of TTCAs is costly and time-consuming, robust bioinformatics tools are needed. Several machine learning models have been proposed for identifying TTCAs, but their performance leaves room for improvement. To build a TTCA predictor with better performance, we propose a model called iTTCA-MVL. We extracted three sets of features from two views, physicochemical information and sequence statistics: the distribution descriptor of composition, transition, and distribution (CTDD), TF-IDF, and LSA topic features. We then used least squares support vector machines (LSSVMs) as submodels and the Hilbert‒Schmidt independence criterion (HSIC) as a constraint to establish an independent and complementary multi-view learning model. On the independent test set, iTTCA-MVL achieves a prediction accuracy of 0.873 and a Matthews correlation coefficient of 0.747, both significantly better than those of existing methods. iTTCA-MVL is therefore a prediction tool that researchers can use to identify TTCAs accurately.
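As a rough illustration of two ideas named in the abstract, the sketch below computes a TF-IDF representation of peptides and the empirical HSIC between two feature views. It is not the authors' implementation. The k-mer tokenization, the smoothed IDF formula, the linear kernels, the toy peptide strings, and the function names (`kmer_tfidf`, `group_composition`, `hsic`) are all assumptions made for this example, and the hydrophobicity-group composition is only a simplified stand-in for a physicochemical view, not the CTDD descriptor.

```python
import numpy as np
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Three-class hydrophobicity grouping commonly used with CTD-style descriptors
# (polar, neutral, hydrophobic). Assumed here; the abstract does not list it.
HYDRO_GROUPS = ("RKEDQN", "GASTPHY", "CLVIMFW")


def kmer_tfidf(seqs, k=2):
    """TF-IDF over overlapping k-mers, treating each peptide as a document."""
    vocab = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}
    counts = np.zeros((len(seqs), len(vocab)))
    for row, seq in enumerate(seqs):
        for i in range(len(seq) - k + 1):
            col = vocab.get(seq[i:i + k])
            if col is not None:  # skip k-mers with non-standard residues
                counts[row, col] += 1
    # Term frequency: k-mer counts normalized per peptide.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    # Smoothed inverse document frequency (one common variant, assumed here).
    df = (counts > 0).sum(axis=0)
    idf = np.log((1 + len(seqs)) / (1 + df)) + 1
    return tf * idf


def group_composition(seqs):
    """Fraction of residues in each hydrophobicity group (simplified view)."""
    return np.array(
        [[sum(seq.count(a) for a in group) / len(seq) for group in HYDRO_GROUPS]
         for seq in seqs]
    )


def hsic(X, Y):
    """Biased empirical HSIC with linear kernels: tr(K H L H) / (n - 1)^2."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K, L = X @ X.T, Y @ Y.T              # linear Gram matrices of each view
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2


# Arbitrary toy peptides, not taken from any TTCA dataset.
peptides = ["ACDEFGHIK", "LLLLVVVII", "RKEDQNRKE", "GASTPHYGA"]
view_stats = kmer_tfidf(peptides, k=2)
view_phys = group_composition(peptides)
print("HSIC between views:", hsic(view_stats, view_phys))
```

A value near zero suggests the two views carry little shared (linearly detectable) information on these samples. How iTTCA-MVL incorporates HSIC into the LSSVM objective is described in the full paper and is not reproduced here.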

Keywords: Hilbert‒Schmidt independence criteria; Multi-view learning; Sequence statistics; TF-IDF; Tumor T-cell antigen.

MeSH terms

  • Computational Biology* / methods
  • Humans
  • Machine Learning*
  • T-Lymphocytes