Announcing ParallelDots’ All New NLP Stack


Our journey at a glance

At ParallelDots, we envision a future where intelligent machines assist humans in a plethora of tasks that require cognitive understanding. On this journey, we are contributing by developing AI-first products to empower your business. ParallelDots AI APIs are currently used by 1,000+ developers across the world to enhance the textual and visual cognition of their products and services.

We introduced our first set of APIs in mid-2015, after we pivoted from our News Recommendation Engine business model to an AI-APIs-as-a-service business. We have been developing and deploying Deep Learning powered APIs ever since. Currently, our NLP stack has more than nine APIs for advanced text analysis, alongside AI-first products in sectors such as Healthcare, Market Research, and Consumer Applications. Let us walk you through the steps we took in deploying these improved algorithms.

Our New NLP Stack

Deep Learning has revolutionized NLP over the past year, and some of our algorithms were more than a year old, so we decided to revamp our APIs with the latest findings and more accurate Deep Learning models.

We started by enhancing our dataset tagging and then trained new algorithms on top of these enhanced datasets. We then pushed these APIs to production so that our users can incorporate the latest Deep Learning R&D into their technology stacks.

Aspects that drove the revamp

Based on user feedback and our internal analysis, we shaped the revamping process of the APIs according to these aspects:

  • Better performance of Sentiment Analysis algorithm on social media data
  • Better Sentiment Analysis in different languages
  • Faster and better Named Entity Recognition and Keyword Spotting
  • Better and more relevant Taxonomy tags
  • More grammatically coherent Semantic Similarity

Enhancing our datasets

The very first step was to enhance our datasets and improve our data tagging capacity as a whole. Our in-house data tagging team worked day and night to get the required datasets tagged for us. The team classified tweets by Sentiment, Intent, and Emotion, and tagged a short-text corpus for entities (person, place, and organization) and for keywords.

We dedicated three months to coming up with the enhanced, larger datasets, which are now continuously improving in both size and quality. In our years of building AI-first products, we have observed that to see the magic of Deep Learning in NLP, a dataset of fewer than 100,000 data points does not suffice. The entire drive enabled us to build an internal data tagging engine, which is now being used to solve even more complex problems.


Enhancements in NLP Algorithms

Multi-Task Learning (MTL) and Self Attention for Sentiment, Intent and Emotion Classification:

Generally, a machine learning model is trained for a single task. In MTL, on the other hand, the model is optimized for multiple tasks at the same time. To train such a model, we had the same set of texts manually annotated for the different tasks (sentiment, intent, and emotion being three of the annotations) to use as a training dataset. The model learns a common shared representation (in the shared part of the model) and classifies text for the different tasks (in the task-specific parts). This technique improves accuracy because the shared representation learns the more salient features of the data while cutting out the noise.
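As a minimal sketch (plain NumPy, illustrative only, not our production code), hard-parameter-sharing MTL routes one shared representation into separate task heads; the layer sizes and class counts below are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Shared encoder parameters: trained jointly by all tasks.
W_shared = rng.normal(size=(300, 128))
# Task-specific heads: sentiment (3 classes), intent (5), emotion (6).
heads = {
    "sentiment": rng.normal(size=(128, 3)),
    "intent":    rng.normal(size=(128, 5)),
    "emotion":   rng.normal(size=(128, 6)),
}

def mtl_forward(x):
    """x: (batch, 300) text features -> per-task class probabilities."""
    shared = np.tanh(x @ W_shared)            # common shared representation
    return {task: softmax(shared @ W) for task, W in heads.items()}

probs = mtl_forward(rng.normal(size=(2, 300)))
```

During training, gradients from all three task losses flow back into `W_shared`, which is what pushes it toward task-agnostic, salient features.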

MTL Architecture. Credits: Sebastian Ruder

We trained our MTL model to predict Sentiment, Intent, and Emotion simultaneously on our dataset, using self-attention on top of the LSTM hidden states. Self-attention has recently been used by many researchers to achieve state-of-the-art performance on multiple NLP tasks. Each task used a separate attention head followed by two fully connected layers. We used pre-trained word embeddings in the model; since we trained on a Twitter dataset, many words had no vector in the pre-trained embeddings, so we used a character-level LSTM to generate embeddings for unknown or mistyped words.
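A sketch of attention pooling over a sequence of LSTM hidden states (the LSTM itself is omitted; `H` stands in for its outputs, and the learned query vector `w` is an assumption of the example):

```python
import numpy as np

rng = np.random.default_rng(1)

def attention_pool(H, w):
    """H: (T, d) LSTM hidden states; w: (d,) learned query vector.
    Returns the attention-weighted sum of states and the per-timestep
    attention distribution."""
    scores = H @ w                               # (T,) one score per timestep
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over time
    context = weights @ H                        # (d,) pooled representation
    return context, weights

H = rng.normal(size=(12, 64))   # 12 timesteps of 64-dim hidden states
w = rng.normal(size=64)
context, weights = attention_pool(H, w)
```

Giving each task its own attention head simply means each task head owns its own `w`, so different tasks can attend to different parts of the same sentence.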


Stacked Character-Level and Word-Level LSTMs with Residual Connections for Keyword Spotting and NER:

Extracting word features with word embeddings. Credits: Quan Tran et al.

LSTM networks are successful feature extractors for the information present in a corpus, and stacking LSTMs (character- and word-level) is a useful way to add representational power. On top of this, we also put residual connections between the word embeddings and the hidden states of the LSTMs, so that low-level context is not lost in such a deep network. This resulted in a significant improvement in detecting the correct entities in the NER model and the right keywords in the Keyword model.
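The residual wiring can be sketched like this (NumPy; `lstm_layer` is a stand-in for a real character- or word-level LSTM, and the depth and sizes are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(2)

def lstm_layer(X, W):
    """Stand-in for one LSTM layer: any (T, d) -> (T, d) transform."""
    return np.tanh(X @ W)

T, d, depth = 10, 32, 4
E = rng.normal(size=(T, d))                      # word embeddings
Ws = [rng.normal(size=(d, d)) * 0.1 for _ in range(depth)]

H = E
for W in Ws:
    # Residual connection: the layer's input is added back to its
    # output, so earlier (lower-level) context survives the deep stack.
    H = lstm_layer(H, W) + H
out = H + E   # extra skip from the raw embeddings to the top
```

This sketch only shows the forward wiring; in practice the residual path also shortens the gradient route during training, which is what makes deep stacks trainable.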


LSTM Autoencoders for Semantic Similarity

An autoencoder is a data compression algorithm in which the compression and decompression functions are learned from the training examples; in our case, both are implemented with LSTMs. This unsupervised model not only tries to recreate its inputs but can also be forced to learn the order and semantics of sentences.

The information lost while reconstructing the original input is backpropagated via a loss function. The loss is the sum of the reconstruction loss and a regularizer, the Kullback-Leibler Divergence (KLD) between the source distribution and the modeled distribution. This distributes the semantic information across the latent space and makes the model learn the semantic similarity of a pair of sentences.
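Written out, the combined objective looks like this (NumPy sketch, assuming a Gaussian latent posterior N(mu, sigma^2) regularized toward a standard normal prior; the beta weight is an assumption of the example):

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return np.mean((x - x_hat) ** 2)

def kl_divergence(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))

def total_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction loss plus the KLD regularizer."""
    return reconstruction_loss(x, x_hat) + beta * kl_divergence(mu, log_var)

# A perfect reconstruction with a standard-normal posterior gives zero loss.
x = np.array([0.5, -1.0, 2.0])
loss = total_loss(x, x, np.zeros(8), np.zeros(8))
```

The KLD term is what keeps the latent space smooth, so that nearby codes decode to semantically similar sentences.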


Deepmoji-based Sentiment Classification in Spanish and Portuguese

We used a stacked LSTM with skip connections and an attention layer to detect the sentiment of a sentence, ending up with our own implementation of the well-known deepmoji model. Though stacked LSTMs perform well at analyzing the sentiment of a corpus, on their own they struggle with certain edge cases, such as when a positive phrase disambiguates an otherwise ambiguous sentence or tempers an otherwise negative text.

Deepmoji Model. Credits: Bjarke Felbo et al.

The attention layer makes the LSTMs automatically determine which words and phrases matter most for the sentiment of the text.
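The distinctive feature of the deepmoji design is that the attention layer sees the embeddings and both LSTM layers' outputs concatenated together, so no level of representation is hidden from it. A sketch with stand-in layers (all sizes here are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(3)

def layer(X, W):
    """Stand-in for one bidirectional-LSTM layer."""
    return np.tanh(X @ W)

T, d = 8, 16
E = rng.normal(size=(T, d))               # word embeddings
H1 = layer(E, rng.normal(size=(d, d)))    # first LSTM layer
H2 = layer(H1, rng.normal(size=(d, d)))   # second LSTM layer

# Skip connections: attention attends over [E; H1; H2] per timestep.
features = np.concatenate([E, H1, H2], axis=1)   # (T, 3d)
w = rng.normal(size=3 * d)
scores = features @ w
weights = np.exp(scores - scores.max())
weights /= weights.sum()                          # attention distribution
sentence_vec = weights @ features                 # (3d,) pooled sentence vector
```

The pooled `sentence_vec` is then fed to a small classifier head; inspecting `weights` is also a cheap way to see which words the model judged decisive.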


Convolutional Neural Network for Sentiment Detection in Chinese:

CNNs were responsible for major breakthroughs in Image Classification and are at the core of most Computer Vision systems today. They have also performed well on NLP tasks. Their major advantage is learning good representations automatically, without the need to represent the whole vocabulary.

The CNN architecture for Chinese Sentiment. Credits: Lei Zhang et al.

Since Sentiment Analysis is a supervised task, a large amount of data is needed to learn it well. For Chinese sentiment prediction, however, we had a comparatively small dataset (and, therefore, a small vocabulary). To counter this, we trained a model with a CNN layer followed by a max-over-time pooling layer on Chinese text at the character level: while Chinese words are numerous and sparse, the base hanzi characters they are composed of form a far smaller, more manageable set, which is what lets us train a Chinese sentiment classifier.
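A sketch of the convolution-plus-max-over-time step on character embeddings (NumPy; filter count, width, and embedding size are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(4)

def conv_maxpool(X, filters, width):
    """1-D convolution over time followed by max-over-time pooling.
    X: (T, d) character embeddings; filters: (n_filters, width * d).
    Returns one feature per filter, regardless of the text length T."""
    T, d = X.shape
    # Each window is `width` consecutive character embeddings, flattened.
    windows = np.stack([X[t:t + width].ravel() for t in range(T - width + 1)])
    feature_maps = windows @ filters.T          # (T - width + 1, n_filters)
    return feature_maps.max(axis=0)             # max over time

T, d, n_filters, width = 20, 32, 64, 3
chars = rng.normal(size=(T, d))                 # embedded hanzi characters
filters = rng.normal(size=(n_filters, width * d))
features = conv_maxpool(chars, filters, width)  # fed to a classifier head
```

Max-over-time pooling is what makes the output size independent of sentence length: each filter reports only its strongest match anywhere in the text.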


Our new Zero-Shot Learning algorithm for Taxonomy:

Zero-Shot Learners are algorithms that don’t need to be trained on a new dataset in order to be deployed. We are introducing a brand-new text categorization technique built on a RelationNet module: we used a massive amount of loosely tagged data to train a class-agnostic model that can predict which class a sentence belongs to from a set of Taxonomy classes decided at run time, after training.

For data preparation, we crawled website headlines and their SEO tags and built triplets of (sentence, tag, relatedness), where relatedness is a binary label. For half of the sentences we used a random tag with relatedness 0; for the other half we kept the true tag.

Our model is an LSTM-based RelationNet followed by two fully connected layers; it checks whether a tag and a sentence are related. We concatenate the sentence’s word embeddings with the tag’s embedding as input to the LSTM acting as the RelationNet, and use the LSTM’s last hidden state for the binary prediction.
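At inference time, zero-shot classification reduces to scoring the sentence against each candidate tag and taking the best match. In the sketch below, a cosine-similarity scorer over toy embeddings stands in for the trained RelationNet; every vector and token here is illustrative, not real data:

```python
import numpy as np

# Toy embeddings standing in for real word vectors (illustrative only).
emb = {
    "goal":     np.array([0.9, 0.1, 0.0]),
    "match":    np.array([0.8, 0.2, 0.1]),
    "sports":   np.array([0.9, 0.2, 0.0]),
    "economy":  np.array([0.0, 0.1, 0.9]),
    "politics": np.array([0.1, 0.9, 0.2]),
}

def relation_score(sentence_tokens, tag):
    """Stand-in for the trained RelationNet: cosine similarity between
    the mean sentence embedding and the tag embedding."""
    s = np.mean([emb[t] for t in sentence_tokens], axis=0)
    t = emb[tag]
    return float(s @ t / (np.linalg.norm(s) * np.linalg.norm(t)))

def zero_shot_classify(sentence_tokens, candidate_tags):
    # The candidate taxonomy is chosen at run time, after training.
    return max(candidate_tags, key=lambda tag: relation_score(sentence_tokens, tag))

label = zero_shot_classify(["goal", "match"], ["sports", "economy", "politics"])
```

Because the tag set is just an argument to the scoring loop, swapping in a completely new taxonomy requires no retraining.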


The Path Forward

We have deployed the new algorithms to production, and you can check out the interactive plug-and-play demos here. We are in the process of developing and deploying other NLP and Image Processing algorithms to automate and speed up tasks that solve real-world problems. We’ll be announcing the launch of new AI-first products soon, so please watch this space for more details. As an applied AI research group, ParallelDots is always looking to contribute to this era of advanced machines and artificial intelligence.

