LexaBlog: Our Sentiment about Text Analytics and Social Media
Automated doesn't always mean perfect, but it doesn't always mean wrong either.
I am fascinated by the naysayers who repeatedly claim that automated sentiment software, web sites and services are garbage. According to them, the software almost always gets it wrong; it doesn't interpret sarcasm; it's not 100% accurate. The school of thought is that you can't extract true sentiment unless you have a staff of humans reading and tagging every potential story about your company as negative, neutral or positive.
Really? I mean can you honestly say that automated sentiment is useless and wrong ALL of the time? There are certainly some examples out there of solutions that are far from useful, but is that a reflection of the entire industry?
I started to think of some other automated things that I rely on in my daily life and wondered if they are always perfect, if they always interpret things the way I do. Or perhaps they simply make my job easier, so that as a human I can focus on the tasks that matter and filter out items that would otherwise take up time and distract me.
Here's what I came up with:
Spam filter software: Wow. Wouldn't I love this to be 100% accurate and catch every spam email that hits my desktop. But while it doesn't always filter out every offensive message, it is much more efficient than opening and reading every email to determine for myself if it's legit or not.
Search engines: How come when I type "best pizza in Boston" into Google Search I get a list of 7 establishments scattered around the city? Can't there only be one "best"? Doesn't it understand that? But luckily I agree with all the yummy choices it returns so I'd say it did a pretty good job.
Spell checker: I make typos. All the time. Every day. Luckily, there are these little squiggly lines that show up under my mis-spelled words to let me know there may be something wrong. But sometimes the suggestion the software makes is wrong or the word I wanted isn't even listed as a suggestion. However, most of the time, when I type "teh" and mean "the", it gets caught.
Obviously I'm having fun with this post, and I certainly don't mean to imply that any of the above need to be perfect. I'm happy they work most of the time. But the bigger picture is that these examples all have to do with words: how we express them, and how we input or receive them.
If you are vocal in your expectation that text-based sentiment software needs to be right 100% of the time, you are bound to be disappointed more often than not.
If you are processing a lot of information and need to streamline the process by concentrating on the extremes, then explore what automated systems can do for you. It seems easier at times to focus on what they can't do instead of what they can.
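To make "concentrating on the extremes" a little more concrete, here is a minimal sketch in Python. It assumes some sentiment tool has already given each item a score between -1.0 and 1.0. The `Scored` type, the `triage` function and the 0.6 threshold are made up for illustration and don't come from any particular product.

```python
from typing import List, NamedTuple, Tuple


class Scored(NamedTuple):
    text: str
    score: float  # hypothetical sentiment score in [-1.0, 1.0]


def triage(
    items: List[Scored], threshold: float = 0.6
) -> Tuple[List[Scored], List[Scored], List[Scored]]:
    """Split items into strong negatives, strong positives, and everything else.

    The threshold is an assumed cut-off; tune it against your own data.
    """
    negative, positive, unsure = [], [], []
    for item in items:
        if item.score <= -threshold:
            negative.append(item)   # clearly unhappy: look at these first
        elif item.score >= threshold:
            positive.append(item)   # clearly happy
        else:
            unsure.append(item)     # middle ground: leave for a human skim
    return negative, positive, unsure


if __name__ == "__main__":
    sample = [
        Scored("Worst service I have ever had.", -0.9),
        Scored("Absolutely love the new release!", 0.8),
        Scored("Oh great, another update.", 0.1),
    ]
    neg, pos, unsure = triage(sample)
    print(len(neg), "negative,", len(pos), "positive,", len(unsure), "to review")
```

The software doesn't have to be right about the murky middle. It only has to surface the loudest items so a person can spend reading time where it counts.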
And don't believe claims of absolute accuracy, especially with sentiment and text analytics. Computers can only process so much from text, because text is made up of typed words and, as the spell checker example above shows, typed words are by no means perfect. At least mine aren't.
- Christine Sierra's blog


Comments
I'm not sure your analogies w/other examples are appropriate. E.g., you don't have to read every word of a document to be pretty sure the spell checker didn't miss a misspelling--you just examine the ones that have red, squiggly lines under them. But it's hard to know which text passages an automated text analysis package might get wrong unless you at least skim through them all. And currently the accuracy rate tends to be no more than about 80% on the easiest documents. Throw in slang/idiom, sarcasm, and extreme abbreviations and gross misspellings, and things can deteriorate quickly--especially when multiple things are being evaluated in the same sentence.
Larry,
I agree completely. Also, throw in run-on sentences. Closed captioning, for example, is in all CAPS when translated, and most analysis tools rely on grammar and capitalization to make assessments.
Our opinion has always been that sentiment and text analysis will work just fine with certain sets of data, and not so well with others. If you are considering options, you should always ask for a proof of concept with your data set to be sure it is worth the investment. Thanks for your comment.
Christine