LexaBlog

Our Sentiment about Text Analytics and Social Media

Impressions From Text Analytics World Boston
Submitted by Carl Lambrecht on Wed, 2012-10-24 19:50

I recently attended Text Analytics World, an annual text analytics conference held in Boston two weeks ago.

The two main themes were “big data” and “social media”. There was some discussion of sentiment analysis of social media content, and several presenting vendors and analysts made the same point: the accuracy of automated sentiment analysis, particularly on social media content, is difficult to measure, and accuracy figures are sometimes useless. In fact, we heard several times over the course of the conference that a hyperfocus on accuracy leads people away from defining and solving the real business problem they need to address.

There was a lot of discussion of taxonomies, ontologies, categorization, and in particular automated categorization. Companies that have large repositories of internal unstructured data are struggling to get a handle on that content and organize it so they can make more efficient use of it. The work we are doing with the Concept Matrix for our next release will be a useful tool for helping companies bootstrap their taxonomies and then refine them from that machine-generated starting point.

Multilingual capability was also a hot topic, which is good news for us given the language support we’ve developed over the last couple of years. Analysts pointed to an increasing need to analyze non-English text and noted that machine translation doesn’t work well enough, particularly when its errors compound the error factors already inherent in analyzing English content. Meta Brown’s presentation reiterated these needs, along with concerns about current approaches that rely on machine translation. And a vendor presentation on the analysis of Spanish social media content showed the insights you can extract when you analyze the language in its native form, even accounting for (and gaining additional information from) regional slang.

In terms of “semantic processing”, I think our Concept Matrix is a great strength, and we will continue capitalizing on it. One of the two sponsors was Expert System, represented by Bryan Bell. Bryan pushed the message of using semantic understanding of language to aid other areas of the text analytics stack, mainly categorization, but also entity extraction and sentiment analysis. Interestingly, Expert System claims to support several non-English languages, but does so via internal machine translation followed by analysis of the resulting English, which directly conflicts with what we had heard elsewhere at the conference about handling multiple languages.