This will be the final piece on the basics of Text Analytics. I’ve covered the basics of categorization/classification and sentiment analysis, and finally I’ll spend some time on entity extraction.
As I posted in Part 2, entity extraction is simply the process of extracting well-understood types of proper nouns (people, companies and places, for example) from a block of text and labeling each with its appropriate type (John Smith as a person, for example). But what makes entity extraction useful is not necessarily the “what” that is extracted, but what you then do with that information: building new libraries of information, or appending it to create a better search solution or application. So, think of it this way. A document is a body of text that can be parsed by its grammar into nouns, verbs, adjectives, pronouns and so on.
Without going too deep into the process of parsing out this information, text analytics is able to identify pieces of the text that you may not have known existed within the documents (or blogs or whatever your source may be). By recognizing people, companies, places and even themes, you are able to find the value within the information without having to know what you were looking for in the first place.
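To make the "extract and label" idea concrete, here is a minimal sketch in Python. It assumes a tiny, hand-built lookup table of known names (the `GAZETTEER` dictionary and the `extract_entities` function are hypothetical names invented for this illustration). Production extraction engines rely on grammatical parsing and statistical models rather than a fixed list, so treat this as a toy picture of the output, not of how any real product works.

```python
import re

# Hypothetical lookup table (a "gazetteer"), made up for this example.
# Real systems use grammar and statistics, not just a fixed list.
GAZETTEER = {
    "John Smith": "PERSON",
    "Financial Times Group": "COMPANY",
    "London": "PLACE",
}

def extract_entities(text):
    """Return (entity, type, position) tuples for each known name, in text order."""
    found = []
    for name, label in GAZETTEER.items():
        # \b stops "London" from matching inside a longer word like "Londonderry"
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.group(), label, match.start()))
    # Sort by position so entities come back in the order they appear
    return sorted(found, key=lambda item: item[2])

sample = "John Smith joined the Financial Times Group in London."
entities = extract_entities(sample)
for name, label, _position in entities:
    print(f"{label:8} {name}")
```

The useful part is the shape of the result: a list of names, each tagged with a type, that you can store alongside the document and build on later.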
We are working with The Financial Times Group on their Newssift site, which is the PERFECT example of how entity extraction can complement a service or application. We provide them with thematic extraction across the corpus of data they have flowing into their system. So, in their case, when you start with a simple search based on keywords, you get a certain number of results for those keywords. What you also get are suggestions for ways to dig deeper into the content based on themes and other extracted information.
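The "search, then suggest ways to dig deeper" pattern can be sketched in a few lines of Python. This is not how Newssift is implemented; it is a simplified illustration that assumes each document already carries a set of extracted entities. The sample documents, company names, and the `search` and `suggest_refinements` functions are all made up for the example.

```python
from collections import Counter

# Hypothetical documents, each with the entities an extractor
# might already have attached to it.
DOCUMENTS = [
    {"text": "Oil prices rise as Acme Corp expands in Texas",
     "entities": {"Acme Corp", "Texas"}},
    {"text": "Acme Corp reports record oil profits",
     "entities": {"Acme Corp"}},
    {"text": "Texas farmers face drought",
     "entities": {"Texas"}},
    {"text": "Globex enters the oil market in Norway",
     "entities": {"Globex", "Norway"}},
]

def search(keyword):
    """Plain keyword search: keep documents whose text contains the keyword."""
    return [doc for doc in DOCUMENTS if keyword.lower() in doc["text"].lower()]

def suggest_refinements(results, limit=3):
    """Count the entities found in the results and return the most common."""
    counts = Counter()
    for doc in results:
        counts.update(doc["entities"])
    # Most frequent first; ties broken alphabetically for a stable order
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:limit]

results = search("oil")
suggestions = suggest_refinements(results)
print(len(results), "results; dig deeper by:", suggestions)
```

The keyword search only answers the question you asked. The entity counts surface the companies and places that keep showing up in those results, which are things you may not have known to search for.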
So the idea that text analytics can pull out that information is nice, but what you do with that information is what makes it really valuable. In Newssift’s case, they make a news site even more useful by offering up suggestions beyond the original search criteria. That is where the power lies. As entity extraction techniques mature and improve, we should expect to see more creative and unique ways to analyze and process the data. Micro-blogging and messaging systems are changing the way we think about text, and that shift will prove to be an influential factor in the text analytics space.