In my last post about Text Analytics, I described the classification and concept-oriented pieces of the technology. In this post, I’m going to outline the pieces that most people think of when they think about Text Analytics: Entity Extraction and Sentiment Analysis.
Let’s start with the piece that Lexalytics is best known for - Sentiment. The measurement of sentiment in content seems to be all the rage these days, but in spite of this, very few of our prospects really understand what the technology will and won’t do for them. So, I’ll start at the beginning.
What does it mean to measure sentiment?
That depends entirely on the intentions of the user and the content being measured. If you’re looking at review data (let’s say hotel reviews in this case), then you’re probably thinking about the overall sentiment of each review for the hotel being discussed, which would be an example of document sentiment. If, however, you’re reading a publication like Consumer Reports, then you’re probably thinking more about how the different hotels stack up against one another. In this case, the overall document sentiment is pretty much useless. The document will have some good and some bad content. In fact, what the reader cares about in this kind of content is the tone for each specific hotel that’s being described in a sub-section of the document. This is known as entity-level sentiment. Lexalytics’ sentiment analysis can provide both document and entity-level sentiment, so you’re covered in either scenario.
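To make the distinction concrete, here is a minimal Python sketch. It is not how Lexalytics' engine works; it assumes a tiny made-up word list (`LEXICON`), two hypothetical hotel names, and a naive rule that credits each sentence's score to any hotel named in that sentence. It only illustrates why a comparison-style document can score neutral overall while the individual hotels score very differently.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon; a real system would use a far larger, weighted one.
LEXICON = {"great": 1, "clean": 1, "friendly": 1,
           "dirty": -1, "rude": -1, "noisy": -1}


def score(text):
    """Sum the lexicon weights of the words found in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def document_sentiment(doc):
    """Document-level sentiment: one score for the whole text."""
    return score(doc)


def entity_sentiment(doc, entities):
    """Entity-level sentiment: credit each sentence's score to every
    listed entity mentioned in that sentence (a naive assumption)."""
    totals = defaultdict(int)
    for sentence in re.split(r"(?<=[.!?])\s+", doc):
        sentence_score = score(sentence)
        for entity in entities:
            if entity.lower() in sentence.lower():
                totals[entity] += sentence_score
    return dict(totals)


# Hypothetical comparison-style text covering two made-up hotels.
doc = ("The Grand Hotel was clean and the staff were friendly. "
       "The Plaza Inn was dirty and the staff were rude.")

doc_score = document_sentiment(doc)
entity_scores = entity_sentiment(doc, ["Grand Hotel", "Plaza Inn"])
print(doc_score)      # 0: the good and bad content cancel out
print(entity_scores)  # {'Grand Hotel': 2, 'Plaza Inn': -2}
```

The document score of zero says nothing useful about either hotel, while the per-entity scores separate them cleanly, which is the point of entity-level sentiment in comparison-style content.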
What really matters in sentiment analysis?
The overall accuracy within the application is important. Technology-based solutions really shine in financial services, where users are most interested in the trends across a collection of stories. Financial Services is definitely one of the up-and-coming industry uses of sentiment because the technology tends to perform better than humans at processing collections of content. Reputation Management is another industry where automated sentiment analysis shines. It could be said that automated sentiment analysis was born in this space, invented because of the amount of time people spent measuring the tone around products and brands by hand. While Reputation Management is currently the biggest market for the technology, it’s probably not the best example of accuracy. It’s hard enough to get humans to agree with one another on the tone of a specific story; getting people to agree with a computer is even harder. I bring up these two contrasting uses because it’s important for people to think about their specific needs and requirements before they jump into using any vendor’s solution. Make sure the solution you’re looking at is well-suited for the problem you’re trying to solve.
What is entity extraction?
While sentiment scoring is the “hot” topic in our space these days, entity extraction is the meat-and-potatoes feature that every text analytics vendor needs to provide. Entity extraction is simply the process of extracting well-understood types of proper nouns (People, Companies, Places, for example) from a block of text and labeling them with their appropriate type (John Smith as a person, for example). What makes this topic more interesting these days is that a number of vendors, Lexalytics included, have significantly improved their entity recognition technology in recent months by using techniques like “grammatical parsing” and “Max Ent” (maximum entropy) models to do a better job of extracting entities. I did a complete post a little over a month ago about our new Entity Management Toolkit, which explains how users can now build their own entity recognizers. We aren’t the only ones pushing hard on entity extraction; other companies are working on this as well, especially on grammatical parsing with anaphora resolution, where “John Smith” and “He” are recognized as the same entity.
I hope this quick overview provides you with a bit of background on the basic technology and uses for Text Analytics. I will, from time to time, write a new post on some of the up-and-coming additions to the space, like relationship extraction, fact extraction and short document (think Twitter) processing.