Semi-supervised distributed representations of documents for sentiment analysis

Saerom Park; Jaewook Lee; Kyoungok Kim

doi:10.1016/j.neunet.2019.08.001

Neural Netw. 2019 Nov:119:139-150. doi: 10.1016/j.neunet.2019.08.001. Epub 2019 Aug 6.

Authors

Saerom Park¹, Jaewook Lee², Kyoungok Kim³

Affiliations

¹ Department of Convergence Security Engineering, Sungshin University, 2 Bomunro, 34Da-Gil, Seongbuk-gu, Seoul, 02844, Republic of Korea.
² Industrial Engineering, Seoul National University, 1 Gwanakro, Gwanak-gu, Seoul, 08826, Republic of Korea.
³ Information Technology Management Programme, International Fusion School, Seoul National University of Science & Technology (SeoulTech), 232 Gongreungno, Nowon-gu, Seoul, 01811, Republic of Korea. Electronic address: drsaerompark@gmail.com.

PMID: 31425854
DOI: 10.1016/j.neunet.2019.08.001

Abstract

Learning document representations is important when applying machine learning algorithms to sentiment analysis. Distributed representation learning models of words and documents, a class of neural language models, have overcome some limitations of vector space models such as the bag-of-words model and have been used successfully in many natural language processing tasks, including sentiment analysis. However, because such models learn embeddings with only a context-based objective, it is hard for the embeddings to reflect the sentiment of texts. In this research, we address this problem by introducing a semi-supervised sentiment-discriminative objective that uses partial sentiment information about documents. Our method not only reflects the partial sentiment information but also preserves the local structures induced by the original distributed representation learning objectives, because it considers sentiment relationships only between neighboring documents. Using real-world datasets, we validated the proposed method on sentiment visualization and classification tasks. The visualization results on the Amazon review datasets show improved separation of sentiment classes when document representations from our method are compared with those from other methods. Sentiment prediction from our representations also appears consistently superior to that from other representations on both the Amazon and Yelp datasets. This work can be extended to develop effective document embeddings for other discriminative tasks.
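
The abstract does not give the objective itself, so the following is only a rough sketch of what a neighbor-restricted, semi-supervised sentiment penalty might look like. Everything in it is an assumption made for illustration: the function name `neighbor_sentiment_loss`, the use of Euclidean k-nearest neighbors, the contrastive hinge form with a `margin`, and the convention that a label of -1 marks an unlabeled document. It is not the formula from the paper. Labeled neighbors with the same sentiment are pulled together, labeled neighbors with different sentiment are pushed apart up to the margin, and pairs involving an unlabeled document or a non-neighbor contribute nothing, which is one way to leave the local structure from the context-based objective largely untouched.

```python
import numpy as np


def neighbor_sentiment_loss(emb, labels, k=2, margin=1.0):
    """Hypothetical neighbor-restricted sentiment penalty (not the paper's formula).

    emb:    (n, d) array of document embeddings.
    labels: (n,) integer sentiment labels; -1 marks an unlabeled document.
    k:      number of nearest neighbors considered for each document.
    margin: distance beyond which differently labeled neighbors are not penalized.
    """
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels)
    n = emb.shape[0]
    k = min(k, n - 1)  # a document cannot have more than n - 1 neighbors

    # Pairwise Euclidean distances between all documents.
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a document is not its own neighbor

    total = 0.0
    for i in range(n):
        if labels[i] < 0:
            continue  # unlabeled documents do not anchor any penalty
        for j in np.argsort(dist[i])[:k]:
            if labels[j] < 0:
                continue  # skip pairs with an unlabeled neighbor
            if labels[i] == labels[j]:
                total += dist[i, j] ** 2  # pull same-sentiment neighbors together
            else:
                total += max(0.0, margin - dist[i, j]) ** 2  # push apart up to margin
    return total
```

In a training loop, a term like this would presumably be added to the context-based embedding loss with a weighting coefficient; how the paper combines the two objectives is not stated in the abstract.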

Keywords: Discriminative learning; Distributed representation; Natural language processing; Neural probabilistic language model; Semi-supervised representation learning; Sentiment analysis.

MeSH terms

Algorithms
Humans
Language
Machine Learning*
Natural Language Processing*