Accelerating bioactive peptide discovery via mutual information-based meta-learning

Wenjia He; Yi Jiang; Junru Jin; Zhongshen Li; Jiaojiao Zhao; Balachandran Manavalan; Ran Su; Xin Gao; Leyi Wei

doi:10.1093/bib/bbab499

Brief Bioinform. 2022 Jan 17;23(1):bbab499. doi: 10.1093/bib/bbab499.

Authors

Wenjia He¹ ² ³, Yi Jiang¹ ², Junru Jin¹ ², Zhongshen Li¹ ², Jiaojiao Zhao¹ ², Balachandran Manavalan⁴, Ran Su⁵, Xin Gao⁶, Leyi Wei¹ ²

Affiliations

¹ School of Software, Shandong University, Jinan, China.
² Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China.
³ BioMap, Beijing, China.
⁴ Department of Physiology, Ajou University School of Medicine, Republic of Korea.
⁵ College of Intelligence and Computing, Tianjin University, Tianjin, China.
⁶ King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, 23955-6900, Saudi Arabia.

PMID: 34882225
DOI: 10.1093/bib/bbab499

Abstract

Recently, machine learning methods have been developed to identify various peptide bioactivities. However, because experimentally validated peptides are scarce, these methods often cannot train a model sufficiently, which easily results in poor generalizability. Furthermore, there is no generic computational framework to predict the bioactivities of different peptides. Thus, a natural question is whether limited samples can be used to build an effective predictive model for different kinds of peptides. To address this question, we propose Mutual Information Maximization Meta-Learning (MIMML), a novel meta-learning-based predictive model for bioactive peptide discovery. Using few samples from various functional peptides, MIMML can sufficiently learn the discriminative information among various functions and characterize functional differences. Experimental results show excellent performance of MIMML despite using far fewer training samples than the state-of-the-art methods. We also decipher the latent relationships among different kinds of functions to understand what the meta-model learned to improve a specific task. In summary, this study is a pioneering work in the field of functional peptide mining and provides a first-of-its-kind solution for few-sample learning problems in biological sequence analysis, accelerating the discovery of new functional peptides. The source code and datasets are available at https://github.com/TearsWaiting/MIMML.
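
For readers unfamiliar with the few-sample setting, the sketch below shows how an N-way K-shot "episode" is commonly drawn in few-shot meta-learning: a handful of labeled support examples per class, plus held-out query examples from the same classes. This is a generic illustration, not the MIMML training procedure, which the abstract does not detail. The function name `sample_episode`, its parameters, and the toy sequences and labels are all hypothetical placeholders.

```python
import random
from collections import defaultdict


def sample_episode(dataset, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode from (sequence, label) pairs.

    Generic few-shot sampling sketch (an assumption for illustration),
    not the procedure used by MIMML.
    """
    # Group sequences by their functional label.
    by_label = defaultdict(list)
    for seq, label in dataset:
        by_label[label].append(seq)

    # Only classes with enough examples can fill both support and query sets.
    eligible = sorted(
        label for label, seqs in by_label.items()
        if len(seqs) >= k_shot + n_query
    )
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query examples")

    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for label in classes:
        # Sample without replacement so support and query never overlap.
        picked = rng.sample(by_label[label], k_shot + n_query)
        support += [(s, label) for s in picked[:k_shot]]
        query += [(s, label) for s in picked[k_shot:]]
    return support, query


# Toy placeholder data: made-up sequence IDs and labels, not real peptides.
toy = [(f"SEQ_{c}{i}", f"function_{c}") for c in "ABCD" for i in range(6)]
support, query = sample_episode(
    toy, n_way=3, k_shot=2, n_query=2, rng=random.Random(0)
)
```

A meta-learner would typically adapt on `support` and be evaluated on `query`, repeating this over many episodes so that it learns to generalize from only a few labeled examples per function.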

Keywords: few-shot learning; meta-learning; mutual information; peptide discovery; sequence analysis.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Machine Learning*
Peptides* / chemistry
Software

Substances

Peptides