Multimodal representation learning for tourism recommendation with two-tower architecture

PLoS One. 2024 Feb 23;19(2):e0299370. doi: 10.1371/journal.pone.0299370. eCollection 2024.

Abstract

Personalized recommendation plays an important role in many online services. In tourism recommendation, attractions carry rich context and content information; these implicit features include not only text but also images and videos. To exploit such features, researchers typically introduce richer feature information or more efficient feature representation methods, but introducing large amounts of feature information without restraint inevitably degrades the performance of the recommendation system. We propose a novel heterogeneous multimodal representation learning method for tourism recommendation. The proposed model is based on a two-tower architecture in which the item tower handles multimodal latent features: a Bidirectional Long Short-Term Memory network (Bi-LSTM) extracts items' text features, an External Attention Transformer (EANet) extracts their image features, and these feature vectors are concatenated with item ID embeddings to enrich the item representation. To increase the model's expressiveness, we introduce a deep fully connected stack layer that fuses the multimodal feature vectors and captures the hidden relationships among them. Tested on three different datasets, our model outperforms the baseline models in NDCG and precision.
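The item tower described in the abstract can be sketched as follows. This is a minimal illustrative PyTorch implementation, not the authors' code: a Bi-LSTM encodes text tokens, a plain linear projection stands in for the EANet image encoder (EANet itself is a full attention network), the resulting vectors are concatenated with an item ID embedding, and a deep fully connected stack fuses them. All dimensions, vocabulary sizes, and layer widths are assumed for illustration.

```python
import torch
import torch.nn as nn

class ItemTower(nn.Module):
    """Illustrative item tower: Bi-LSTM text encoder + image-feature
    projection (stand-in for EANet) + item ID embedding, fused by a
    deep fully connected stack. All sizes are assumptions."""
    def __init__(self, vocab_size=10000, n_items=5000,
                 txt_dim=64, img_dim=512, id_dim=32, out_dim=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, txt_dim)
        # Bi-LSTM text encoder; we use the final hidden states
        # of both directions as the text representation.
        self.bilstm = nn.LSTM(txt_dim, txt_dim, batch_first=True,
                              bidirectional=True)
        # Hypothetical placeholder for EANet: project precomputed
        # image features into the shared space.
        self.img_proj = nn.Linear(img_dim, txt_dim)
        self.id_emb = nn.Embedding(n_items, id_dim)
        fused = 2 * txt_dim + txt_dim + id_dim
        # Deep fully connected stack fusing the modalities.
        self.mlp = nn.Sequential(
            nn.Linear(fused, 128), nn.ReLU(),
            nn.Linear(128, out_dim))

    def forward(self, text_ids, img_feat, item_ids):
        _, (h, _) = self.bilstm(self.tok_emb(text_ids))
        txt = torch.cat([h[0], h[1]], dim=-1)   # (B, 2*txt_dim)
        img = self.img_proj(img_feat)           # (B, txt_dim)
        iid = self.id_emb(item_ids)             # (B, id_dim)
        return self.mlp(torch.cat([txt, img, iid], dim=-1))

class UserTower(nn.Module):
    """Simple user tower over user ID embeddings."""
    def __init__(self, n_users=1000, id_dim=32, out_dim=64):
        super().__init__()
        self.id_emb = nn.Embedding(n_users, id_dim)
        self.mlp = nn.Sequential(nn.Linear(id_dim, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))
    def forward(self, user_ids):
        return self.mlp(self.id_emb(user_ids))

# Two-tower scoring: dot product of item and user embeddings.
item_t, user_t = ItemTower(), UserTower()
item_vec = item_t(torch.randint(0, 10000, (4, 20)),  # text token IDs
                  torch.randn(4, 512),               # image features
                  torch.randint(0, 5000, (4,)))      # item IDs
user_vec = user_t(torch.randint(0, 1000, (4,)))      # user IDs
scores = (item_vec * user_vec).sum(-1)               # shape (4,)
```

In a real two-tower retrieval setup, the two towers share no parameters, so item embeddings can be precomputed offline and matched against user embeddings with an approximate nearest-neighbor index at serving time.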

MeSH terms

  • Electric Power Supplies
  • Humans
  • Learning*
  • Memory, Long-Term
  • Research Personnel
  • Tourism*

Grants and funding

The work was supported by the FDCT Funding Scheme for Postdoctoral Researchers of Higher Education Institutions, grant number 0003/2021/APD, under the supervision of Prof. Shengbin Liang.