Semi-supervised distributed representations of documents for sentiment analysis

Neural Netw. 2019 Nov:119:139-150. doi: 10.1016/j.neunet.2019.08.001. Epub 2019 Aug 6.

Abstract

Learning document representations is important when applying machine learning algorithms to sentiment analysis. Distributed representation learning models of words and documents, a class of neural language models, have overcome some limits of vector space models such as the bag-of-words model and have been used successfully in many natural language processing tasks, including sentiment analysis. However, because such models learn the embeddings only with a context-based objective, it is hard for the embeddings to reflect the sentiment of texts. In this research, we address this problem by introducing a semi-supervised sentiment-discriminative objective that uses partial sentiment information of documents. Our method not only reflects the partial sentiment information, but also preserves the local structures induced by the original distributed representation learning objectives, because it considers only sentiment relationships between neighboring documents. Using real-world datasets, the proposed method has been validated on sentiment visualization and classification tasks. The visualization results on Amazon review datasets demonstrate improved sentiment class separation when document representations from our proposed method are compared to those from other methods. Sentiment prediction from our representations also appears to be consistently superior to that from other representations on both Amazon and Yelp datasets. This work can be extended to develop effective document embeddings for other discriminative tasks.
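
The idea of a discriminative term restricted to neighboring documents can be illustrated with a small sketch. The abstract does not give the actual objective, so the code below substitutes a generic contrastive-style penalty as a stand-in: labeled neighbors with the same sentiment are pulled together, labeled neighbors with different sentiment are pushed apart up to a margin, and unlabeled documents contribute nothing. The function name `neighbor_sentiment_penalty`, the parameters `k` and `margin`, and the use of `None` for missing labels are all assumptions made for illustration, not details from the paper.

```python
import math

def neighbor_sentiment_penalty(emb, labels, k=2, margin=1.0):
    """Hypothetical contrastive-style penalty over k-nearest-neighbor pairs.

    emb:    list of equal-length embedding vectors (lists of floats)
    labels: list of sentiment labels; None marks an unlabeled document
    This is an illustrative stand-in, not the objective from the paper.
    """
    n = len(emb)
    total = 0.0
    for i in range(n):
        if labels[i] is None:
            continue  # unlabeled documents add no supervised signal
        # Rank all other documents by Euclidean distance to document i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(emb[i], emb[j]),
        )
        for j in others[:k]:  # only the k nearest neighbors are considered
            if labels[j] is None:
                continue
            d = math.dist(emb[i], emb[j])
            if labels[i] == labels[j]:
                total += d ** 2  # same sentiment: penalize distance
            else:
                total += max(0.0, margin - d) ** 2  # different: enforce margin
    return total
```

In a full model, a term of this kind would presumably be added to the context-based embedding loss with a weighting coefficient. Because only nearby pairs are touched, distant documents keep the relative positions given by the original objective.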

Keywords: Discriminative learning; Distributed representation; Natural language processing; Neural probabilistic language model; Semi-supervised representation learning; Sentiment analysis.

MeSH terms

  • Algorithms
  • Humans
  • Language
  • Machine Learning*
  • Natural Language Processing*