Sentiment is one of the cornerstones of the Lexalytics business; it’s what brings us to the dance, if you will. To this end, I’m always interested in ways that we can improve the perceived accuracy and ease of use of the system, and in the forthcoming 4.1 release of Salience we are introducing a couple of new things that help with this.
Since the very early days we’ve had the ability for customers to add their own sentiment phrases so that they can customise the results to match what they would expect. But as customers have become increasingly au fait with the idea of machine-generated sentiment, they have asked for the ability to capture more complex concepts than a simple phrase can provide. To this end, 4.1 supports Sentiment Concepts, which are basically a way of defining a concept via a search query and applying a sentiment score to the document if that query is satisfied. This enables you to do two things:
- Bias a document score based on the type of document it is. For example, reduce the scores of press releases, since even bad news tends to be described in glowingly positive terms
- Capture a concept, such as the phone never being answered, by using a simple query such as (phone NEAR "not answered") OR (phone NEAR "rang off")
This fits in with our existing model very well, but it requires you to have a pretty intimate knowledge of the data and the concepts included in it.
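To make the idea concrete, here is a rough sketch of how a Sentiment Concept could behave: a query is evaluated against the document, and a score adjustment is applied if it matches. This is not Salience code. The function names, the concept structure, the 5-token NEAR window, and the -0.5 adjustment are all assumptions made up for illustration; only the example query itself comes from the text above.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def near(tokens, term, phrase, window=5):
    """True if `term` occurs within `window` tokens of `phrase`.

    The window size is an assumption; the real NEAR semantics may differ.
    """
    phrase_tokens = phrase.lower().split()
    n = len(phrase_tokens)
    term_positions = [i for i, t in enumerate(tokens) if t == term]
    phrase_starts = [i for i in range(len(tokens) - n + 1)
                     if tokens[i:i + n] == phrase_tokens]
    for tp in term_positions:
        for ps in phrase_starts:
            # Distance from the term to the nearest end of the phrase.
            if min(abs(tp - ps), abs(tp - (ps + n - 1))) <= window:
                return True
    return False

# Hypothetical concept mirroring the example query:
# (phone NEAR "not answered") OR (phone NEAR "rang off")
PHONE_NOT_ANSWERED = {
    "name": "phone never answered",
    "score": -0.5,  # made-up adjustment, for illustration only
    "any_of": [("phone", "not answered"), ("phone", "rang off")],
}

def apply_concepts(text, base_score, concepts, window=5):
    """Add each matching concept's score to the document's base score."""
    tokens = tokenize(text)
    score = base_score
    for concept in concepts:
        if any(near(tokens, term, phrase, window)
               for term, phrase in concept["any_of"]):
            score += concept["score"]
    return score
```

Calling `apply_concepts("The phone was not answered all day", 0.0, [PHONE_NOT_ANSWERED])` would lower the document score by the concept's adjustment, while a document that never mentions the phone keeps its original score.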
So the second new feature we are introducing is a model-based sentiment system. This basically means you can train a system on documents that you have hand-classified and use it to generate machine-based sentiment going forward.
This, of course, is a pretty common sentiment technique and is not without its drawbacks (it requires you to hand-score documents), but if you have those already then it does enable you to get up and running pretty quickly with sentiment that is focused on your specific domain and the types of documents within it.
For the 4.1 release we are going to be marking this as Experimental, as I’m still playing around with the underlying model (which is Max Ent based) to determine the best feature sets etc., but it is something that we are going to carry on with going forward. I also intend to extend the technique to allow you to do model-based entity sentiment as well, but that’s further off.
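For readers who haven’t met the technique, here is a minimal sketch of what training on hand-scored documents looks like. It is a generic binary Max Ent classifier (equivalent to logistic regression) trained by stochastic gradient ascent on bag-of-words features. It is not the Salience implementation: the feature set, the training settings, the function names, and the four toy documents are all assumptions chosen for illustration.

```python
import math
import re
from collections import defaultdict

def features(text):
    """Bag-of-words features: the set of lowercase words in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(docs, epochs=200, lr=0.5):
    """Fit a binary Max Ent model on (text, label) pairs.

    Labels are 1 (positive) or 0 (negative). Epoch count and learning
    rate are arbitrary values for this toy example.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in docs:
            feats = features(text)
            p = sigmoid(bias + sum(weights[f] for f in feats))
            err = label - p  # gradient of the log-likelihood
            bias += lr * err
            for f in feats:
                weights[f] += lr * err
    return weights, bias

def predict(model, text):
    """Return the modelled probability that the text is positive."""
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in features(text)))

# Toy hand-scored documents (invented for this sketch).
HAND_SCORED = [
    ("the support team was helpful and quick", 1),
    ("great product, works perfectly", 1),
    ("the phone was never answered", 0),
    ("terrible service and slow delivery", 0),
]

model = train(HAND_SCORED)
```

With a model trained this way, `predict(model, "helpful and quick support")` leans positive and `predict(model, "terrible slow service")` leans negative. A real system needs far more hand-scored documents, and the choice of features is exactly the part that is still being tuned.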
Well, that’s probably enough for this post. If you’ve got any observations on the direction we are going in, or requests for new functionality, then feel free to leave a comment.