Multi-Instance Metric Transfer Learning for Genome-Wide Protein Function Prediction

Yonghui Xu; Huaqing Min; Qingyao Wu; Hengjie Song; Bicui Ye

doi:10.1038/srep41831

Sci Rep. 2017 Feb 6:7:41831. doi: 10.1038/srep41831.

Authors

Yonghui Xu¹, Huaqing Min², Qingyao Wu²,³, Hengjie Song², Bicui Ye⁴

Affiliations

¹ School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China.
² School of Software Engineering, South China University of Technology, Guangzhou, 510006, China.
³ State Key Laboratory for Novel Software Technology, Nanjing University, China.
⁴ Wuzhou Red Cross Hospital, Wuzhou, 543002, China.

Abstract

Multi-Instance (MI) learning has proven effective for genome-wide protein function prediction problems, where each training example is associated with multiple instances. Many studies in the literature have sought an appropriate Multi-Instance Learning (MIL) method for genome-wide protein function prediction under the usual assumption that the underlying distribution of the testing data (target domain, TD) is the same as that of the training data (source domain, SD). However, this assumption may be violated in practice. To tackle this problem, we propose a Multi-Instance Metric Transfer Learning (MIMTL) approach for genome-wide protein function prediction. In MIMTL, we first use bag weights to transfer the source domain distribution to the target domain distribution. We then construct a distance metric learning method with the reweighted bags. Finally, we develop an alternating optimization scheme for MIMTL. Comprehensive experiments on seven real-world organisms verify the effectiveness and efficiency of the proposed MIMTL approach over several state-of-the-art methods.
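
The abstract does not give formulas, so the following is only an illustrative sketch of how bag weights might enter a metric-learning objective. It is not the authors' formulation. Every name here (`mahalanobis_sq`, `bag_distance`, `weighted_metric_loss`), the average-over-instance-pairs bag distance, the unit-margin hinge, and the single label per bag are assumptions made for the example; the bag weights are taken as given rather than estimated.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a metric matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    return sum(diff[i] * M[i][j] * diff[j] for i in range(n) for j in range(n))


def bag_distance(bag_a, bag_b, M):
    """Assumed bag-level distance: mean squared distance over all instance pairs."""
    pairs = [(x, y) for x in bag_a for y in bag_b]
    return sum(mahalanobis_sq(x, y, M) for x, y in pairs) / len(pairs)


def weighted_metric_loss(bags, labels, weights, M, margin=1.0):
    """Weighted pairwise loss over source bags (hypothetical objective).

    Same-label bag pairs are pulled together; different-label pairs are
    pushed at least `margin` apart. Each pair is scaled by the product of
    its bag weights, so bags that resemble the target domain count more.
    """
    loss = 0.0
    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            d = bag_distance(bags[i], bags[j], M)
            w = weights[i] * weights[j]
            if labels[i] == labels[j]:
                loss += w * d
            else:
                loss += w * max(0.0, margin - d)
    return loss


# Toy data: three bags of 2-D instances, identity metric.
identity = [[1.0, 0.0], [0.0, 1.0]]
bags = [[[0.0, 0.0]], [[1.0, 0.0]], [[3.0, 0.0]]]
labels = [1, 1, 0]
weights = [1.0, 2.0, 1.0]
loss = weighted_metric_loss(bags, labels, weights, identity)
```

In an alternating scheme of the kind the abstract mentions, one would presumably update the metric `M` with the weights held fixed and then update the weights with `M` held fixed. That loop is left out here because the abstract does not specify either update.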

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Genome*
Genome-Wide Association Study*
Genomics* / methods
Machine Learning*
Proteins / genetics*

Substances

Proteins