Hierarchical gated recurrent neural network with adversarial and virtual adversarial training on text classification

Hoon-Keng Poon; Wun-She Yap; Yee-Kai Tee; Wai-Kong Lee; Bok-Min Goi

doi:10.1016/j.neunet.2019.08.017

Neural Netw. 2019 Nov:119:299-312. doi: 10.1016/j.neunet.2019.08.017. Epub 2019 Sep 2.

Authors

Hoon-Keng Poon¹, Wun-She Yap², Yee-Kai Tee¹, Wai-Kong Lee³, Bok-Min Goi¹

Affiliations

¹ Lee Kong Chian Faculty of Engineering and Science, Universiti Tunku Abdul Rahman, Malaysia.
² Lee Kong Chian Faculty of Engineering and Science, Universiti Tunku Abdul Rahman, Malaysia. Electronic address: yapws@utar.edu.my.
³ Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, Malaysia.

PMID: 31499354
DOI: 10.1016/j.neunet.2019.08.017

Abstract

Document classification aims to assign one or more classes to a document, based on an understanding of its content, for ease of management. The hierarchical attention network (HAN) has been shown to be effective at classifying ambiguous documents. HAN parses information-intense documents into slices (i.e., words and sentences) such that each slice can be learned separately and in parallel before the classes are assigned. However, the hierarchical attention approach introduces redundant training parameters, which makes the model prone to overfitting. To mitigate overfitting, we propose a variant of the hierarchical attention network that applies adversarial and virtual adversarial perturbations to 1) the word representation, 2) the sentence representation, and 3) both the word and sentence representations. The proposed variant is tested on eight publicly available datasets. The results show that the proposed variant outperforms the hierarchical attention network both with and without random perturbation. More importantly, the proposed variant achieves state-of-the-art performance on multiple benchmark datasets. Visualizations and analysis are provided to show that perturbation can effectively alleviate overfitting and improve the performance of the hierarchical attention network.
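
The abstract does not give the perturbation formula, so the following is only a minimal sketch of one common way to build an adversarial perturbation on word representations: take the gradient of the loss with respect to the embeddings and rescale it to a fixed L2 norm. The mean-pooled logistic classifier is a toy stand-in for the hierarchical network, and every name here (`loss_and_grad`, `adversarial_perturbation`, `epsilon`, the example values) is an assumption for illustration, not the authors' implementation. The virtual adversarial variant, which does not use labels, is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(embeddings, weights, label):
    """Binary cross-entropy of a mean-pooled linear classifier (toy model),
    plus its gradient with respect to every word embedding."""
    n = len(embeddings)
    dim = len(weights)
    # Mean-pool the word embeddings into one document vector.
    pooled = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    p = sigmoid(sum(w * x for w, x in zip(weights, pooled)))
    loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    # Chain rule: d loss / d e_i[d] = (p - label) * w[d] / n for every word i.
    grad = [[(p - label) * w / n for w in weights] for _ in embeddings]
    return loss, grad


def adversarial_perturbation(grad, epsilon):
    """Rescale the gradient so the whole perturbation has L2 norm epsilon."""
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    if norm == 0.0:
        return [[0.0 for _ in row] for row in grad]
    return [[epsilon * g / norm for g in row] for row in grad]


def perturb(embeddings, r):
    """Add the perturbation r to the embeddings, element by element."""
    return [[x + dx for x, dx in zip(e, row)] for e, row in zip(embeddings, r)]


# Hypothetical two-word document with 2-dimensional embeddings.
embeddings = [[0.5, -0.2], [0.1, 0.3]]
weights = [1.0, -1.0]
label = 1
epsilon = 0.1

clean_loss, grad = loss_and_grad(embeddings, weights, label)
r_adv = adversarial_perturbation(grad, epsilon)
adv_loss, _ = loss_and_grad(perturb(embeddings, r_adv), weights, label)

# Adversarial training would minimise a combination of both losses.
print(clean_loss, adv_loss)
```

In this sketch the perturbed loss is larger than the clean loss, which is the point of the construction: training on both pushes the model to stay correct in a small neighbourhood of each input. Applying the same step to sentence vectors instead of word vectors would correspond to the sentence-level setting mentioned in the abstract, again under the assumptions stated above.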

Keywords: Adversarial training; Machine learning; Neural network; Small-scale datasets; Text classification.

MeSH terms

Algorithms
Humans
Machine Learning*
Neural Networks, Computer*
Software