Sentiment analysis is like a gateway to AI based text analysis. For any company or data scientist looking to extract meaning out of an unstructured text corpus, sentiment analysis is one of the first steps which gives a high RoI of additional insights with relatively low investment of time and efforts. With an explosion of text data available in digital formats, the need for sentiment analysis and other NLU techniques for analysing this data is growing rapidly. Sentiment Analysis looks relatively simple and works very well today, but we have reached here after significant efforts by researchers who have invented different approaches and tried numerous models.
In the chart above, we give a snapshot to the reader about the different approaches tried and their corresponding accuracy on the IMDB dataset. However, we highlight that the IMDB dataset is relatively small and this makes other methods seem competitive compared to Neural Networks. When doing this analysis on a larger dataset (like the Yelp Dataset with 5 output classes and 4 million+ reviews), the rankings change. For example, Very Deep CNNs deliver up-to 64.1% accuracy on the Yelp Dataset compared to 63% for FastText.
When we study sentiment analysis today, we have the advantage of standing on the shoulders of giants. With this blog post, we are adding our two cents by giving the readers an overview of how this text classification technique works under the hood. If you feel we have missed anything, we will eat our humble pie and edit the list to make it more complete. Read on and do let us know your thoughts in the comments.
RNN Based Models
Recurrent Neural Networks were developed in the 1980s. A lot of algorithms we’re going to discuss in this piece are based on RNNs. RNNs recursively apply the same function (the function it learns during training) on a combination of previous memory (called hidden unit gathered from time 0 through t-1) and new input (at time t) to get output at time t. General RNNs have problems like gradients becoming too large and too small when you try to train a sentiment model using them due to the recursive nature.
Long Short-Term Memory (LSTM) is gradient-based learning algorithm which can learn to bridge time intervals even in case of noisy, incompressible input sequences, without loss of short time lag capabilities. This is achieved by gradient-based algorithm which enforces a constant error flow through internal states of special LSTM units. The algorithm was presented by Sepp Hochrieter et al. in 1997. Essentially, the recursive application of function in RNN is controlled by converting it into an addition process.
Gated Feedback Recurrent Neural Network extends the existing approach of stacking multiple recurrent layers by allowing and controlling signals flowing from upper recurrent layers to lower layers using a global gating unit for each pair of layers. The recurrent signals exchanged between layers are gated adaptively based on the previous hidden states and the current input.
Both LSTM and GF-RNN weren’t written specifically focusing on sentiment analysis, but a lot of sentiment analysis models are based on these two highly cited papers.
Recursive Neural Tensor Network
RNTN was introduced in 2011-2012 by Richard Socher et al. from Standford’s NLP group. The authors introduced the Recursive Neural Tensor Network which was trained on a different kind of dataset, called the Standford Sentiment Treebank. The Stanford Sentiment Treebank was the first dataset with fully labelled parse trees that allows for a complete analysis of the compositional effects of sentiment and allows to analyze the intricacies of sentiment and to capture complex linguistic phenomena.
The model outperformed all previous methods on several metrics and pushed the state-of-the-art in single sentence positive/negative classification from 80% up to 85.4%. A more advanced version of this algorithm called Tree LSTM was proposed by Christopher Manning et al. in 2015. The Tree-LSTM model is a generalization of LSTMs to tree-structured network topologies. TreeLSTMs outperform all existing systems and strong LSTM baselines on sentiment classification on Stanford Sentiment Treebank dataset.
Attention-Based Neural Networks
Attention as a concept has given a lot of success in sequence modeling tasks, where input/output both are a sequence (like translation). Attention Mechanism can be viewed as a method for making the RNN work better by letting the network know where to look as it is performing its task. For a long time, attention based models did not fit for analyzing sentiment and related tasks, where the output is one attribute (+/-/neutral) for the entire sentence.
The invention of Self-attention has given a superb performance in certain NLP tasks, such as summarization and many other fields which were even deemed unsolvable.
This paper by Zhouhan Lin et al. proposed a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, the authors used a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. Self Attention is weighted information spread across different part of a sentence according to the global meaning.
The proposed sentence embedding model consists of two parts. The first part is a bidirectional LSTM, and the second part is the self-attention mechanism, which provides a set of summation weight vectors for the LSTM hidden states. These set of summation weight vectors are dotted with the LSTM hidden states, and the resulting weighted LSTM hidden states are considered as an embedding for the sentence.
Attention Is All You Need, a paper published by Ashish Vaswani et al in Jun 2017, presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. “Attention-only” based networks might finish RNN based networks (RNTN/RNN/LSTM/GRU) altogether for some NLP tasks(such as translation), but we still have to produce a strong case for sentiment analysis using these models.
Multi-Task Learning Based Models
Another method that can give major success is Multi-Task Learning (MTL). MTL is a sub-field of machine learning in which multiple learning tasks are solved at the same time while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models when compared to training the models separately.
At ParallelDots, we have an MTL dataset tagged by our own tagging team and our new sentiment model (launching soon) is an MTL model with Self Attention.