A systematic review on the state-of-the-art strategies for protein representation

Comput Biol Med. 2023 Jan:152:106440. doi: 10.1016/j.compbiomed.2022.106440. Epub 2022 Dec 17.

Abstract

The study of drug-target protein interactions is a key step in drug research. In recent years, machine learning techniques have become attractive for research, including drug research, because of their automated nature, predictive power, and expected efficiency. Protein representation is a key step in studying drug-target protein interactions with machine learning, and it plays a fundamental role in obtaining accurate results. With the progress of machine learning, protein representation methods have attracted growing attention and have developed rapidly. In this review, we therefore systematically classify current protein representation methods, comprehensively review them, and discuss the latest advances of interest. According to their information extraction methods and information sources, these representation methods are generally divided into structure-based and sequence-based methods. Each primary class can be further divided into specific subcategories. The particular representation methods covered include both traditional and the latest approaches. This review provides a comprehensive assessment of the various methods, which researchers can use as a reference for their specific protein-related research requirements, including drug research.

Keywords: Drug research; Machine learning; Protein representation methods; Sequence-based descriptors; Structure-based descriptors.

Publication types

  • Systematic Review
  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Information Storage and Retrieval
  • Machine Learning*
  • Proteins*

Substances

  • Proteins