Entity extraction is the foundation of text analytics: identify the people, places, companies and themes in a piece of content, then use them to better understand it.
There are two types of entity recognizers that we have used in Salience 4 with much success:
Model-based entities: this process analyzes the language, using factors such as parts of speech, to identify and extract the relevant entity.
Customer-driven lists: this process matches the text against a list provided by the customer and extracts the relevant entity.
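To make the list-driven approach concrete, here is a minimal Python sketch. It is not the Salience API; the list contents, the `CUSTOMER_LIST` name and the `extract_list_entities` function are hypothetical, and the whole-word, case-insensitive matching is an assumption made for illustration.

```python
import re

# Hypothetical customer-supplied list: entity label -> surface forms to look for.
CUSTOMER_LIST = {
    "Acme": ["Acme", "Acme Corp"],
    "Globex": ["Globex"],
}

def extract_list_entities(text, entity_list):
    """Return the sorted labels whose surface forms appear in text as whole words."""
    found = set()
    for label, forms in entity_list.items():
        for form in forms:
            # Lookarounds stop a form from matching inside a longer word.
            pattern = r"(?<!\w)" + re.escape(form) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(label)
                break  # one matching form is enough for this label
    return sorted(found)

print(extract_list_entities("Acme Corp. hired a new CEO.", CUSTOMER_LIST))
```

A list like this only ever finds what the customer has put in it, which is why it pairs well with the model-based recognizer.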
Our soon-to-be-released Salience 4.3 introduces a third type: query-based entities. This process matches on a combination of words and extracts the entity only when the text satisfies that query.
For example, there is more than one Senator Udall in the United States. Mark Udall is a Senator from Colorado. Tom Udall is a Senator from New Mexico. To build a query-based entity recognizer for "Tom Udall", you would create a query that requires both "Senator Udall" and "New Mexico", so that a match must refer to Senator Tom Udall and not Senator Mark Udall.
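The Udall example can be sketched in a few lines of Python. This is only an illustration of the idea, not Salience's query syntax: the `QUERIES` table, the function names and the simple "all terms must appear" rule are assumptions, and the "Mark Udall" query is added by analogy with the one described above.

```python
# Hypothetical query table: entity -> terms that must ALL appear in the text.
QUERIES = {
    "Tom Udall": ["Senator Udall", "New Mexico"],
    "Mark Udall": ["Senator Udall", "Colorado"],
}

def matches_query(text, required_terms):
    """True only if every required term occurs in text (case-insensitive)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in required_terms)

def extract_query_entities(text, queries):
    """Return the entities whose queries the text satisfies, in table order."""
    return [entity for entity, terms in queries.items()
            if matches_query(text, terms)]

doc = "Senator Udall spoke in Santa Fe about water rights in New Mexico."
print(extract_query_entities(doc, QUERIES))
```

Note that the result is a plain yes or no per entity. A document that mentions "Senator Udall" without either state matches neither query, so no entity is returned.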
Query-based entities are often compared with confidence-based entities, but they do not rely on a confidence score about the language. The match is absolute: the text either satisfies the query or it does not.