Dig Out Relevant Text Elements with Entity Extraction API

Named Entity Recognition (NER), also known as entity extraction, classifies the named entities present in a text into predefined categories such as individuals, companies, organizations, places and cities. Entity extraction is a subtask of information extraction and one of the basic starting points for using natural language processing techniques to augment your content. Extracting key entities such as person names, locations, dates, specialized terms and product terminology from raw text not only improves keyword search but also paves the way for semantic search, targeted search and document repurposing. Entity extraction can add a wealth of semantic knowledge to your content, which helps you quickly understand the subject of any given text.
Our Named Entity Recognition API uses deep learning to learn representations of character groupings and, with high accuracy, discovers the most relevant entities in your textual content. Try our Entity Extraction demo.

How our entity extraction API works

Our API uses deep learning technology. Below, you can find a brief description of our technology:

  • Word embeddings are trained on a huge text corpus that our extensive crawling infrastructure collects from the open web. These embeddings can be trained with either the GloVe or the Word2Vec algorithm; we use GloVe embeddings in production. The algorithm converts each word into a dense 100-dimensional vector. The neural network we train takes these embeddings as input instead of the words themselves.
  • Our internal data tagging team annotated a huge dataset of entities present in the data we have crawled. So for example, the sentence “This is a house that Jack built” is annotated with (Jack, Person) and “Ram and Shyam are going to Delhi” is annotated with (Ram, Person), (Shyam, Person) and (Delhi, Place). Our internal dataset has over 200,000 such annotated sentences.
  • We then train a bidirectional LSTM for sequence labeling on the tagged dataset mentioned above to predict whether each word in a sentence is an entity or not. An LSTM, or Long Short-Term Memory network, is an improved RNN that mitigates vanishing gradients by replacing the purely multiplicative recurrence of a plain RNN with an additive cell-state update.
  • We also tried adding an attention layer to the LSTM to see whether it helps identify the properties of a sentence that mark a word as an entity. We are still refining the attention model; the model in production is an LSTM without attention.
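
To make the sequence-labeling setup more concrete, here is a minimal sketch of how an annotated sentence could be turned into one label per word, which is the kind of target a sequence-labeling model predicts. The function name `label_tokens`, the "O" label for non-entity words and the whitespace tokenization are assumptions made for illustration; they are not a description of our internal tooling, and the sketch only handles single-word entities.

```python
from typing import List, Tuple


def label_tokens(sentence: str, annotations: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair every whitespace-separated word with its entity label, or "O" if it has none.

    Hypothetical helper: assumes each annotated entity is a single word.
    """
    lookup = {entity: label for entity, label in annotations}
    labeled = []
    for token in sentence.split():
        word = token.strip(".,!?\"'")  # drop simple surrounding punctuation
        labeled.append((word, lookup.get(word, "O")))
    return labeled


# The annotated sentence used as an example in the list above.
example = label_tokens(
    "Ram and Shyam are going to Delhi",
    [("Ram", "Person"), ("Shyam", "Person"), ("Delhi", "Place")],
)
for word, label in example:
    print(f"{word}\t{label}")
```

In a setup like this, each word would be replaced by its embedding vector before being fed to the network, and the labels would serve as the per-word training targets.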

Of the total input data, 10% was held out for testing the system and the remaining 90% was used for training it. Our neural network model attains over 90% accuracy in extracting entities.
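
As a rough illustration of such a hold-out evaluation, the sketch below splits a dataset 90/10 and computes a per-word accuracy. The names `split_train_test` and `token_accuracy`, the random shuffle with a fixed seed and the choice of per-word accuracy as the metric are all assumptions; the exact split procedure and metric behind the figure above are not specified here.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(items: Sequence, test_fraction: float = 0.1, seed: int = 0) -> Tuple[list, list]:
    """Shuffle the items and hold out `test_fraction` of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def token_accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of words whose predicted label matches the annotated label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Sentence ids stand in for the annotated sentences themselves.
train, test = split_train_test(range(200_000))
print(len(train), len(test))

# Three of the four labels agree in this toy comparison.
print(token_accuracy(["Person", "O", "O", "Place"], ["Person", "O", "Place", "Place"]))
```

Per-word accuracy can look high simply because most words are not entities, so entity-level precision and recall are often reported alongside it.
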
For a better understanding of how the entities are extracted from a piece of text, here is an example:

Example

Input

In 2015 Harry Styles tweeted nonchalantly about Monopoly and we noticed the RRP of One Direction’s official Monopoly game sky-rocket by 125%.
Forbes estimates that Kim Kardashian West has made $51m from her enormous social media following through sponsorship deals. She is quoted: “There’s a lot of value in social media, and people really get that.” We tend to agree, Kim. The “all publicity is good publicity” mantra probably doesn’t work when the President-Elect of the United States is tweeting about canceling an order from your company worth millions of dollars.
We’re living in unprecedented times with an unprecedented President-Elect.

Output

If you deal with a massive corpus on a daily basis, entity extraction can work wonders for you. Here are several ways in which it can help with content-related problems:

  • Automatically generated metadata for your content can be used to improve SEO.
  • Identify the trends associated with your brand, product or service and group them by person, place or location to improve your overall social listening.
  • Extract key entities in user queries, such as product names and service requests, to analyze the most frequently used terms. This is called intent analysis.
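
The grouping and frequency analysis described in the list above can be sketched in a few lines once entities have been extracted. The input format (a list of `(entity, category)` pairs per query), the function name `count_entities_by_category` and the sample data are hypothetical and chosen only for illustration; they do not reflect the actual response format of the API.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_entities_by_category(
    extracted: Iterable[List[Tuple[str, str]]],
) -> Dict[str, Counter]:
    """Count how often each entity appears, grouped by its category."""
    counts: Dict[str, Counter] = {}
    for entities in extracted:
        for name, category in entities:
            # Lowercase so that "iPhone" and "iphone" are counted together.
            counts.setdefault(category, Counter())[name.lower()] += 1
    return counts


# Hypothetical entities extracted from three user queries.
queries_entities = [
    [("iPhone", "Product"), ("Delhi", "Place")],
    [("iphone", "Product")],
    [("Delhi", "Place"), ("Jack", "Person")],
]

counts = count_entities_by_category(queries_entities)
for category, counter in counts.items():
    print(category, counter.most_common(3))
```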

Publishing organizations make the most significant use of entity extraction, as the media industry is switching fast to semantic publishing. Learn more about semantic publishing here.

Feedback

Tell us what you think of our Entity Extraction API. We would love to have your feedback.
Leave a comment and share your thoughts.


ParallelDots AI APIs are a deep learning powered web service by ParallelDots Inc. that can comprehend huge amounts of unstructured text and visual content to empower your products. You can check out some of our text analysis APIs and reach out to us by filling out this form here or by writing to us at apis@paralleldots.com.
