A novel method for linguistic steganography by English translation using attention mechanism and probability distribution theory

PLoS One. 2024 Jan 2;19(1):e0295207. doi: 10.1371/journal.pone.0295207. eCollection 2024.

Abstract

To better model long-range semantic dependencies, we introduce a novel approach to linguistic steganography through English translation, termed NMT-stega (Neural Machine Translation steganography), which leverages attention mechanisms and probability distribution theory. Specifically, to optimize translation accuracy and make full use of the information in the source text, we employ an attention-based NMT model as the translation technique. To mitigate the degradation of text quality caused by embedding secret information, we devise a dynamic word-pick policy based on probability variance. This policy adaptively constructs a candidate set and dynamically adjusts the embedding capacity at each time step, guided by variance thresholds. Additionally, we incorporate prior knowledge into the model by introducing a hyper-parameter that balances the contributions of the source and target text when predicting the embedded words. Extensive ablation experiments and comparative analyses, conducted on a large-scale Chinese-English corpus, validate the effectiveness of the proposed method across several critical aspects, including embedding rate, text quality, anti-steganography, and semantic distance. Notably, our numerical results demonstrate that the NMT-stega method outperforms alternative approaches in anti-steganography tasks, achieving the highest scores against two steganalysis models, NFZ-WDA (with a score of 53) and LS-CNN (with a score of 56.4), underscoring its superiority under anti-steganography attack. Furthermore, even when generating longer sentences, with average lengths reaching 47 words, our method maintains strong semantic relationships, as evidenced by a semantic distance of 87.916. Moreover, we evaluate the proposed method with two metrics, Bilingual Evaluation Understudy (BLEU) and Perplexity, achieving scores of 42.103 and 23.592, respectively, which highlights its strong performance on the machine translation task.
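The abstract describes a variance-guided word-pick policy: at each decoding step the NMT decoder's next-word distribution is examined, a candidate set is built from the most probable words, and the number of secret bits embedded is reduced as the distribution becomes more peaked, so that text quality is preserved. The sketch below is a minimal illustration of that idea; the function name, the variance thresholds, and the bit-to-candidate mapping are assumptions for illustration only and are not taken from the paper.

```python
import numpy as np

def pick_stego_word(probs, secret_bits, var_thresholds=(1e-4, 1e-3, 1e-2), max_bits=3):
    """Variance-guided word selection at one decoding step (illustrative sketch).

    probs          -- 1-D array, the decoder's next-word probability distribution
    secret_bits    -- list of 0/1 bits still waiting to be embedded
    var_thresholds -- assumed cut-offs that shrink capacity as the distribution peaks
    max_bits       -- assumed upper bound on bits embedded per time step
    """
    # Rank vocabulary indices by probability, most likely first.
    order = np.argsort(probs)[::-1]

    # Variance over the largest possible candidate set: a flat distribution
    # (low variance) tolerates a bigger candidate set, a peaked one (high
    # variance) forces fewer or zero embedded bits.
    top_var = np.var(probs[order[: 2 ** max_bits]])
    bits = max_bits
    for threshold in sorted(var_thresholds):
        if top_var > threshold:
            bits -= 1
    bits = max(bits, 0)

    if bits == 0 or not secret_bits:
        # No embedding at this step: emit the most probable word unchanged.
        return order[0], secret_bits

    # Candidate set = top 2**bits words; the next `bits` secret bits pick one.
    k = 2 ** bits
    chunk = secret_bits[:bits]
    index = int("".join(map(str, chunk)), 2) % k
    return order[index], secret_bits[bits:]


# Toy usage: a flat distribution lets more bits through than a peaked one.
flat = np.full(1000, 1e-3)
word_id, remaining = pick_stego_word(flat, [1, 0, 1])
```

The design choice this sketch illustrates is the trade-off stated in the abstract: embedding capacity is spent only where the model is genuinely uncertain, so substituting a lower-ranked word costs little translation quality.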

Grants and funding

This work was supported by the General Research Project of Higher Education Teaching Reform under Grant SJGY20220659. The funders played a crucial role in data collection and analysis.