SMONAC: Supervised Multiobjective Negative Actor-Critic for Sequential Recommendation

IEEE Trans Neural Netw Learn Syst. 2023 Oct 3:PP. doi: 10.1109/TNNLS.2023.3317353. Online ahead of print.

Abstract

Recent research shows that optimizing for accuracy alone can lead to homogeneous, repetitive recommendations and harm long-term user engagement. Multiobjective reinforcement learning (RL) is a promising method for balancing multiple objectives, including accuracy, diversity, and novelty. However, existing approaches have two deficiencies: they neglect the updating of negative action Q values, and the RL Q-networks exert only limited regulation on the (self-)supervised learning recommendation network. To address these shortcomings, we develop the supervised multiobjective negative actor-critic (SMONAC) algorithm, which comprises a negative action update mechanism and a multiobjective actor-critic mechanism. In the negative action update mechanism, several negative actions are randomly sampled at each update step, and an offline RL approach is used to learn their Q values. In the multiobjective actor-critic mechanism, the accuracy, diversity, and novelty Q values are integrated into a scalarized Q value, which is used to criticize the supervised learning recommendation network. Comparative experiments on two real-world datasets demonstrate that SMONAC achieves substantial performance improvements, especially on the diversity and novelty metrics.
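The two mechanisms described above can be illustrated with a minimal sketch. All names, sizes, and the linear scalarization weights below are illustrative assumptions, not the paper's actual implementation; the abstract does not specify how the three Q values are combined or how many negative actions are sampled per step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 10 candidate items; per-objective Q values are
# stand-in arrays here, whereas SMONAC would obtain them from Q-networks.
num_items = 10
q_accuracy = rng.random(num_items)   # accuracy Q values per item
q_diversity = rng.random(num_items)  # diversity Q values per item
q_novelty = rng.random(num_items)    # novelty Q values per item

def scalarized_q(weights=(0.6, 0.2, 0.2)):
    """Linearly combine the three objective Q values into one scalar
    per item (an assumed scalarization; weights are illustrative)."""
    w_acc, w_div, w_nov = weights
    return w_acc * q_accuracy + w_div * q_diversity + w_nov * q_novelty

def sample_negative_actions(positive_item, k=3):
    """Randomly sample k negative actions, i.e., items other than the
    observed (positive) item, whose Q values would then be updated
    with an offline RL objective."""
    candidates = np.setdiff1d(np.arange(num_items), [positive_item])
    return rng.choice(candidates, size=k, replace=False)

q = scalarized_q()
negatives = sample_negative_actions(positive_item=4, k=3)
```

The scalarized Q value would serve as the critic signal for the supervised recommendation network, while the sampled negatives ensure that Q values of unchosen actions are also updated rather than left stale.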