Salience 5.2 Walkthrough: Entity Extraction

One of Salience’s many text analysis capabilities is Named Entity Extraction. Named entities are companies, people, places, products, dates, URLs, hashtags, @mentions, phone numbers, currency amounts, and more – in fact, the great thing about named entities is that they can be whatever you need them to be.

Named entities are useful because of the immediate associations you can make with them. Named entities can be associated with sentiment, themes, topics, and summaries. They’re the “who” and “where” of your content, and attached to them are the “when” and “why”.

To demonstrate, I pulled the text from a random BBC News article titled “China suspends McDonald’s and KFC’s meat supplier” and, without reading it at all, plugged it into the Salience demo app (provided with the install – you can do something similar using our online demo).

Right out of the box, the Salience demo presents me with several types of named entity extracted from the article: Companies, People, Places, and Quotes. Moreover, Salience also displays the sentiment score associated with each entity. This means that I can quickly sort the list of entities by their type (Label), sentiment, or even by how many times they appear in the text (Count).
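
If you export the entity list and want the same sorting outside the demo app, a minimal Python sketch might look like the following. The `Entity` class and its field names are my own assumptions for illustration, not the Salience API, and the sentiment and count values are placeholders rather than the demo's actual output.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class Entity:
    """Hypothetical record for one extracted entity (not a Salience type)."""
    name: str
    label: str        # entity type, e.g. "Company" or "Place"
    sentiment: float  # sentiment score attached to the entity
    count: int        # number of mentions in the text


# Placeholder values for illustration only; not real Salience results.
entities = [
    Entity("OSI Group", "Company", -0.4, 5),
    Entity("McDonald's", "Company", -0.1, 7),
    Entity("KFC", "Company", -0.2, 3),
    Entity("China", "Place", 0.0, 4),
]

# Most-mentioned entities first.
by_count = sorted(entities, key=attrgetter("count"), reverse=True)

# Most negative sentiment first.
by_sentiment = sorted(entities, key=attrgetter("sentiment"))

# Grouped by type, then alphabetically within each type.
by_label = sorted(entities, key=attrgetter("label", "name"))
```

Each sort leaves the original list untouched, so the same extracted entities can be viewed by mention count, sentiment, or type without re-running the analysis.
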
Here I see a reference to “OSI Group” and want to know more: who is OSI Group? How are they involved in this story? To find out, I double-click on the OSI Group entity. This shows me more detailed results for the entity: an entity-specific summary, themes, relevant sentiment phrases, and the associated topics (categories or tags, depending on what you want to call them).

First, Salience gives me a summary of OSI Group’s mentions in the text. This is incredibly useful – right away, I know:

  • OSI Group is a United States-based food supplier
  • They began working with McDonald’s China in 1992
  • OSI is under fire for “alleged use of expired raw food material production and the processing of it in food”

But that’s not all that Salience tells me about OSI Group.

The Themes tab displays which themes an entity is associated with; in OSI Group’s case, it is strongly associated with the “raw food material production” and “US-based food supplier” themes. Themes are contextually important noun phrases – taken across a corpus of content, they give you a feel for where the “buzz” is. They are scored for their importance in the text, so you can see which themes are most relevant.
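
To make the “across a corpus” idea concrete, here is a rough Python sketch that adds up theme scores over several documents and ranks the totals. It assumes each document's themes are available as (theme, score) pairs; that shape, the `top_themes` helper, the “food safety” theme, and all of the scores are hypothetical and only there for illustration.

```python
from collections import defaultdict

# One list of (theme, score) pairs per document.
# Placeholder data for illustration; not real Salience results.
documents = [
    [("raw food material production", 3.0), ("US-based food supplier", 2.0)],
    [("raw food material production", 1.5), ("food safety", 2.5)],
]


def top_themes(docs, n=3):
    """Sum each theme's score across all documents and return the top n."""
    totals = defaultdict(float)
    for themes in docs:
        for theme, score in themes:
            totals[theme] += score
    # Highest combined score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


buzz = top_themes(documents)
```

Summing is only one way to combine scores. Counting the number of documents a theme appears in, or averaging its score, would answer slightly different questions about where the buzz is.
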

Finally, the Topics tab shows me the different topics within the document and how the entity is perceived with regard to each topic. Topics are pre-configured categories, so you can see the context in which the entity is mentioned.

In this case, the news article’s short length means that OSI Group is only mentioned with regard to one topic, health, and the article’s sentiment when referring to OSI Group in the context of health is neutral. A user could easily configure more topics if they wanted to get more specific analysis.
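
For readers who want to work with per-topic results programmatically, the sketch below shows one way to turn a numeric sentiment score into a label such as “neutral”. The 0.1 threshold, the `polarity` and `describe` helpers, and the dictionary layout are all assumptions of mine, and the 0.0 score is a placeholder. Salience's own cutoffs and output format may differ.

```python
def polarity(score, threshold=0.1):
    """Bucket a numeric sentiment score into a coarse label.

    The threshold is an assumed value, not a Salience default.
    """
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Hypothetical per-topic sentiment for each entity; 0.0 is a placeholder.
topics_by_entity = {
    "OSI Group": {"health": 0.0},
}


def describe(entity, data):
    """Return a topic -> polarity label mapping for one entity."""
    return {topic: polarity(score) for topic, score in data[entity].items()}


osi_summary = describe("OSI Group", topics_by_entity)
```

With more topics configured, the inner dictionary would simply gain more keys, and `describe` would report a label for each one.
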

For this example I used zero customization. This is pure out-of-the-box functionality, using just the basic features found in our demo version.
Try it out for yourself – to make it even easier, you can get results delivered via the Salience API, the Semantria SaaS service, or the Microsoft Excel plugin.