Our Sentiment about Text Analytics and Social Media

Submitted by Mekkin Bjarnadottir on Fri, 2013-06-21 16:36

This is probably the coolest new thing that we have in our toolbox.  All of our customers like the idea of “magic” functionality.  We have a long tradition of providing functionality that is available and worthwhile straight out of the box (sentiment, themes, summarization), all the while allowing you to configure and tune the content to meet your application’s needs.  The one area where we didn’t have any magic was in categorization/classification.   When we introduced Concept Topics, they made the work of configuring classifiers way easier, because you could deal with large topic areas without the labor of training a classifier or building a large taxonomy.  However, we only gave you a sample set of categories, and you were then supposed to go off and build your own.

And, lo, there was a demand for more!  It turns out that Wikipedia has some very nice category areas of its own, and so we’ve produced a new Salience call that will automatically categorize your documents into a set of about 4,000 categories which are rolled up into 125 high level categories.  These categories each match a single Wikipedia page of reasonable breadth, for good coverage of the Wikipedia knowledge base.

For example, take the following text:

"We're seeing a new revolution in artificial intelligence known as deep learning: algorithms modeled after the brain have made amazing strides and have been consistently winning both industrial and academic data competitions with minimal effort. 'Basically, it involves building neural networks — networks that mimic the behavior of the human brain. Much like the brain, these multi-layered computer networks can gather information and react to it. They can build up an understanding of what objects look or sound like. In an effort to recreate human vision, for example, you might build a basic layer of artificial neurons that can detect simple things like the edges of a particular shape. The next layer could then piece together these edges to identify the larger shape, and then the shapes could be strung together to understand an object. The key here is that the software does all this on its own — a big advantage over older AI models, which required engineers to massage the visual or auditory data so that it could be digested by the machine-learning algorithm.' Are we ready to blur the line between hardware and wetware?" 

This text is automatically classified into the following top level and second level categories:

  • Mind (1)
    • Computational neuroscience (.75)
    • Cognitive neuroscience (.58)
    • Cerebrum (.56)
    • Neural engineering (.55)
    • Neuroprosthetics (.51)
    • Neurotechnology (.51)
  • Computer Science (1)
    • Philosophy of artificial intelligence (.61)
    • Machine learning algorithms (.54)
    • Computers (.73)
  • IT (.71)
  • Robotics (.63)
  • Life (.57)
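
To make the shape of that result concrete, here is a minimal sketch in plain Python. It is not the Salience API; the dictionary layout and the `confident_categories` helper are assumptions made up for illustration, and only the category names and scores come from the example above.

```python
# Hypothetical in-memory picture of the two-level result listed above.
# The layout is an assumption for illustration, not Salience output format.
result = {
    "Mind": {"score": 1.0, "children": {
        "Computational neuroscience": 0.75,
        "Cognitive neuroscience": 0.58,
        "Cerebrum": 0.56,
        "Neural engineering": 0.55,
        "Neuroprosthetics": 0.51,
        "Neurotechnology": 0.51,
    }},
    "Computer Science": {"score": 1.0, "children": {
        "Philosophy of artificial intelligence": 0.61,
        "Machine learning algorithms": 0.54,
        "Computers": 0.73,
    }},
    "IT": {"score": 0.71, "children": {}},
    "Robotics": {"score": 0.63, "children": {}},
    "Life": {"score": 0.57, "children": {}},
}


def confident_categories(result, threshold):
    """Return (top, second, score) triples scoring at or above threshold,
    sorted from highest to lowest score."""
    rows = []
    for top, info in result.items():
        for second, score in info["children"].items():
            if score >= threshold:
                rows.append((top, second, score))
    return sorted(rows, key=lambda row: row[2], reverse=True)


for top, second, score in confident_categories(result, 0.6):
    print(f"{top} > {second} ({score})")
```

A cutoff like this is one way an application might keep only the second-level categories it trusts; the right threshold depends on your content.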

Since we’re Lexalytics, we couldn’t just leave it at that. We know that you’re not going to be utterly and completely satisfied unless the taxonomy completely matches the one in your head.  Two new files in the data directory allow you to a) configure new categories or override the categories that we already have, and b) configure a multi-level taxonomy of your categories.

Submitted by Seth Redmore on Fri, 2013-06-21 16:22

Hot off the presses...

Lexalytics is pleased to announce the release of Lexalytics Salience Engine version 5.1.1. While the version number may not mean a lot, the functionality certainly will. Lexalytics' Salience Engine supports many different text analytics functions. These functions include named entity recognition, sentiment analysis, text categorization or classification, summarization, theme extraction and more. These functions help derive the meaning of the content that is being analyzed - the "who", "what", "where", and "tone" of it.

We're going to have blog posts over the coming week or two that dive a bit deeper into the new, nifty functionality available in 5.1.1. 

Submitted by Mekkin Bjarnadottir on Fri, 2013-05-31 23:55

As I said in Part 1 of this series, the introduction of our latest Language Pack for our Salience text analysis engine has us interested in issues surrounding Chinese language and analysis. Last time I spoke about the controversies surrounding the Windows 8 ads released in Asia and the extreme dialectical differences in spoken Chinese. This time, I'll be discussing written Chinese and the unique challenges we've faced while developing Salience for Chinese.

Somewhat counterintuitively, the dialectical differences of the Chinese language family aren't the biggest obstacle to written text analysis. Although there are many phonetic differences between topolects, the enforcement of a written standard means that they all share the same written characters, which is incredibly helpful for text analytics. Even so, slang and vocabulary do change from place to place, but that is a common feature of most widespread languages, including English. Carl Lambrecht has already discussed how Salience very easily overcomes these differences in other languages with contextual part-of-speech tagging and the versatile customization of the software's lexicon. 

The biggest difficulties with Chinese text and sentiment analysis are those that surround the unique way that a language such as Chinese is constructed. We’ll take you through the three biggest challenges and how we’re working to solve them:

  1. Simplified vs. Traditional Characters: While Simplified characters are the mainland standard, Traditional characters are still widely used in places such as Hong Kong and Taiwan. This problem was one of the easiest to deal with, using a basic one-to-one mapping to account for the presence of both.
  2. Named Entity Extraction: With languages that we’ve worked with before, capitalization has functioned as a useful way of identifying entities. This is something that isn’t present in Chinese writing, making it harder for the software to recognize entities. While our Named Entities for Chinese is still in development, we’re looking at an approach that uses machine learning algorithms alongside rules to get the most accurate results.
  3. Word Segmentation: Chinese is comprised of distinct characters, each one of which may represent a word by itself, or in conjunction with other characters. To tackle this, we’ve used statistical machine learning algorithms in a process called "tokenization".  An interesting side effect of this is that we're able to deal with multi-word hashtags in other languages (e.g. #ilovelexalytics). 
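
The first and third items lend themselves to a small sketch. The code below is not how Salience does it: the character table is a tiny hand-picked sample, and `segment` uses greedy dictionary matching as a simple stand-in for the statistical machine learning approach described above. The vocabularies are made up for illustration.

```python
# Tiny hand-picked Traditional -> Simplified table (illustrative sample only;
# a real table would cover thousands of characters).
T2S = str.maketrans({"漢": "汉", "語": "语", "學": "学"})


def to_simplified(text):
    """Map Traditional characters to Simplified with a one-to-one table."""
    return text.translate(T2S)


def segment(text, vocab):
    """Greedy forward maximum matching: at each position take the longest
    vocabulary entry that matches, falling back to a single character.
    This is a dictionary-based stand-in, not a statistical segmenter."""
    max_len = max((len(word) for word in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


print(to_simplified("漢語"))
print(segment("我爱北京", {"我", "爱", "北京"}))
print(segment("ilovelexalytics", {"i", "love", "lexalytics"}))
```

The last call shows why segmentation carries over to hashtags: once the text is treated as an unbroken run of characters, the same routine splits a hashtag with no spaces in it.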

Developing the Chinese language pack for Salience was a long, involved process, and it gives us a broad toolset that we can use with other problems and other languages.  We've gotten great results with it, and are looking forward to releasing it in August!

Submitted by Mekkin Bjarnadottir on Fri, 2013-05-31 23:27

The new Windows 8 advertisements released in Asia have caused quite a bit of a stir, a result of their bizarre content and the garbled language that’s being used. Native speakers of Korean, Japanese, Cantonese, Mandarin, and other Chinese dialects all declare the commercials completely unintelligible, leading many to conclude that Microsoft has made up its own faux-Asian language, either to avoid alienating any particular market or simply to create more mystery around the eccentric advertisements. 

Victor Mair, in his article, The enigmatic language of the new Windows 8 ads, details the responses from many native and non-native speakers, none of whom can place it. Most interesting is the fact that non-native speakers seem to be able to pick out more Chinese than native speakers, and that several suggestions are made about the possibility of the advertisements being in a somewhat obscure topolect, such as Wu (a topolect being a family of related dialects). 

While I have no personal insight into what Microsoft may have been thinking, the conversation brings up some interesting points about Chinese and the difficulties involved in analyzing it, both spoken and written. We recently released the beta version of our own Salience Text Analysis for Chinese, the sixth of our Natural Language Packs, and have taken it out for a very promising test drive, so this particular topic has been at the forefront of our minds.

Mair’s article focuses on the vastly different topolects spoken across China, over sixteen of which are distinct enough to have their own Wikipedia page. The standard language in China is Beijing Mandarin but, as our own Chinese language expert Elizabeth Baran tells us, “All of these [local dialects] still have some sort of prevalence in their own respective areas,” although, she says, the introduction of an official written and spoken Chinese standard has served to decrease this prevalence. 

As to the garbled Windows 8 advertisements, she adds, “There are so many political and cultural implications, as well as sensitivities,  surrounding the use of certain dialects, in what context, and by whom…maybe Microsoft is speaking to the theme of unity.” That would certainly be a step closer to understanding the mystery of the Windows 8 language.

While we still don't know exactly what motivated Microsoft's creative decisions with regard to the commercials, the general consensus seems to be that the language spoken in the advertisements is either made up or modulated out of recognition with sound software in post production. The one thing we do know for certain is that it has sparked an incredibly interesting conversation about language differences within Asia and the unique complexity involved in choosing a language when marketing within that region.

Stay tuned for Part 2, where we discuss the interesting challenges that we faced in developing our Chinese Native Language Pack.

Submitted by Seth Redmore on Wed, 2013-05-29 13:39

We've posted a new entry to our development blog that may be of more general interest to our readers.  Salience has good power under the hood for dealing with lots of interesting cases, and Tim Mohler, our VP of Professional Services, has written up an interesting case on how one of our customers has configured Salience to handle entities that sometimes aren't tagged as proper nouns.

Salience does a good job at picking up company names right out of the box, but some company names can give Salience trouble. Company names made up of common words and phrases like Best Buy or Glad might not get picked up since Salience will not always tag them as proper nouns.  If you have certain company names that are giving you trouble, you may need to configure Salience to ensure they are always recognized. This article provides examples of configuration steps to address advanced entity recognition.
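
As a rough illustration of the idea (not Salience configuration, which the linked article covers), here is a sketch of a plain lookup list that always recognizes the company names you care about. The function name and the sample sentence are made up for this example.

```python
import re


def find_known_companies(text, names):
    """Return (name, offset) pairs for every listed company name in text.

    Matching is case-sensitive and on word boundaries, so the brand
    "Glad" is found but the ordinary adjective "glad" is not.
    """
    # Longest names first, so a longer name wins over a shorter prefix of it.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b")
    return [(match.group(0), match.start()) for match in pattern.finditer(text)]


sample = "I was glad to find Glad bags at Best Buy."
print(find_known_companies(sample, ["Best Buy", "Glad"]))
```

A fixed list like this trades recall for certainty: it only ever finds names you have told it about, and a sentence that happens to begin with "Glad" would still be a false hit.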

Submitted by Seth Redmore on Tue, 2013-05-28 18:49

Facebook’s new Graph Search feature has been a lightning rod for grand predictions since it was announced earlier this year. The new social search function has been described as everything from revolutionary to completely doomed. The truth, as in most cases, lies somewhere in the middle. We’re going to throw our hat in the ring on this one and say that, while the new search is cool, without more work on understanding what's in the content itself, the new search isn't going to change much. 

Before Graph Search, Facebook search was limited to finding people and pages that match particular keywords. It didn’t leverage any of the oodles of information they have on what people were “liking” or the content of their status updates.   Graph Search is Facebook’s first attempt at actually taking advantage of all the information that we’re happily giving them in the form of interaction with our friends and with various organizations on Facebook.

What is especially exciting about Facebook’s claims for Graph Search is the integration of natural language processing allowing for searches that understand far more than just keywords and providing better results based on profile and page information. 

This has been no small endeavor. Facebook Graph Search uses Facebook’s Unicorn search system to cross reference different “nodes”. Currently these nodes include people, pages, events, applications, groups, places, check-ins, and objects with location information attached to them. This cross-referencing uses your Facebook friends as its source of data, and can even narrow down search results based on friends’ attributes. For instance, searching “my friends who like The Who and who are under 35 and who live in California” shows you only friends who meet those criteria (a disappointing six friends, in my case).

The promise of natural language processing is in enabling the search system to understand a variety of inputs, ascertaining the user’s needs, and outputting results that meet those needs. The order in which you place the different search parameters doesn’t matter, and the search engine understands several different ways of phrasing the same request. “My friends under 35 who live in California and who like The Who” will generate the same results as the first example, although the search in the search bar will correct to “My friends who are younger than 35 and like The Who and live in California”, in order to help the user acclimate to the most efficient language to use when searching.

There are drawbacks, however. Facebook Graph Search might, for many, be less immediately intuitive than it sounds. The proximity to the way that we normally speak might even be frustrating when the specific way we phrase a search is not understood by Graph Search.  In other words, while Facebook is claiming that this is a “natural language system”, it’s really not.  They recognize certain additional operators, but it’s not terribly flexible.

They’ve also taken the wise step of backing up their search system with Microsoft’s Bing, so that queries that the system doesn’t know how to handle return search results from the Bing search system instead.

Take the following examples of truly “natural language” queries:

  • “I want to go to a restaurant near me.”   This query should take your desire into account, and optimally, should take into account restaurants that are open right now, as well as the restaurants that your friends have “liked”.  This search fails over to Bing web search, and gives a list of sites where you can actually find a restaurant near you.
  • “I need a haircut” or “haircut” also goes straight to Bing.
  • “What bars are the best in San Francisco?”  You guessed it.  Straight to Bing.

Let’s talk more about what does work, and how we at Lexalytics would classify the Graph Search system.

Here are the results of a search for “people who like Dubstep”.  If you’re not familiar, it’s a genre of electronic music that’s characterized by a “wobble” bass line (filters with an LFO) and typically has some sort of “drop” where people freak out on the dance floor.

A few of the people who pop up in a search for "People who like Dubstep"

The first clue to what Graph Search actually “is” is on the right side, where the filtering pane is located.  These are the various filters that you can further apply to narrow down your search.  This looks much more like a “faceted” search than a true natural language search.  I would classify it as keyword search with additional operators, some synonym processing, and search facets for filtering. That is a pretty long ways from real “natural language search”.  

I’m not hating on Facebook, I love it and use it, but as a marketing professional in the natural language space, I take great pride in correctly characterizing our products.   However, I’m probably a bit too close to this, and “keyword search with additional operators” doesn’t have quite the same ring to it as “natural language search”, does it?

Enough about the front end, let’s talk about the data. Search systems are deeply dependent on the data that they have access to.  Facebook has been struggling with issues around privacy and sharing almost since their birth.   Graph searches rely completely on access to shared information.  If users don’t share their info and likes, Facebook has nothing to work with.  As the short-lived Tumblr Actual Facebook Graph Searches so succinctly points out, those who are not careful or aware of Facebook privacy settings leave themselves vulnerable to searches that can range from humorous to politically dangerous to, in some cases, just plain creepy. But if privacy settings among Facebook users tighten, the amount of data available will shrink and limit Graph Search’s usefulness tremendously.

Facebook is relying heavily on the “like” button to provide information for Graph Search. Up until now, “liking” pages and checking in to places served no real, practical purpose, rendering them only a form of self-expression. The new utility of these categories will have to create a different understanding and use of those tools, or be rendered ineffective. The problem is that “likes” do not accurately represent sentiment about people, groups, and pages. For instance, the Facebook page for Amy’s Baking Company, the restaurant made infamous both by their appearance on Kitchen Nightmares and by their widely publicized social media meltdown, gained tens of thousands of “likes” in the days after they gained internet popularity while, simultaneously, their Yelp rating tanked to 1.4 stars. “Likes” can’t really be used as a measure of how a user feels about a certain topic, and when corporations are actually in the business of purchasing “likes”, as Nicholas Carlson explains in Facebook's Search Is Based On A 'Con', it’s possible that Graph Search is indeed destined to be useless and inaccurate. 

Different pages also present obstacles to the new search feature. Because nodes are based on pages that are user created, one basic idea might have upwards of five or six slightly different pages available to “like”. This specificity was ideal for those looking to define themselves in some way through their “like” activity, but provides a very real problem for Graph Search. If I want to create a running group by searching nearby friends who like “Running”, I may miss all of my friends who liked the “Marathons” page instead.  It is unclear how much semantic understanding (even at the level of synonym relationships) is present in Graph Search.

We also believe that there is a lot of opportunity inherent in the status updates that people are making and in the text on the pages that are being liked or content that’s being shared that currently isn’t being utilized.  Our customer Bitly is an excellent example of what can be done with text analysis in the context of content sharing.  They collect terabytes of data from URLs that users shorten and share on Twitter.  They are able to tell their customers about the content of those web pages, from “who” is being discussed, “what” is the context, and any sentiment towards the context or entities.  A similar system would be highly useful for Graph Search – one that looked at the meaning of status updates to try and ascertain whether someone likes something without the “like” tag, or maybe even if someone has “liked” something not because they really like it, but instead because they want to follow the train wreck (we’re looking at you, Amy’s Baking Company).

Expanding a bit on the dubstep example, let’s see what the Facebook page “Dubstep” tells us:

Facebook's response to a search for "Dubstep"

This is actually just from Wikipedia, but let’s treat it as regular text for right now, and throw it into Salience.

Our newly released automatic document categorization does a 2-level classification, with about 125 top level categories, and about 4,000 second level categories.

Dubstep classifies like this (in the parens are scores)

  • Music (1)
    • Electronica (.57)
    • Indie_rock (.55)
    • Hardcore_techno (.52)
  • Dance (.57)

I’m dubious about the “indie_rock” sub category, but it’s certainly close enough with “dance” and “music” as top level categories.

Taking a different example, let’s look at classical music:

About Classical Music

Classical music classifies as:

  • Music (1)
    • Music (.42)
    • Music_cognition (.36)
    • Ornamentation (.35)
    • Organology (.33)
    • Baroque_Instruments (.30)


Note the “baroque_instruments” bit, and the lack of a classification for “dance”.   Classical is generally not considered “dance” music.

This is really useful information.  Imagine a Graph Search where you can ask for “friends who like dance music”.   And, yes, I know that there is a “dance music” page, but this is where some of the crafty text analysis comes in.  People who have “liked” the dance music page are not necessarily a super-set of people who have liked dubstep or trance or psytrance or house or gabber or happy hardcore.   These are all dance music genres, and I want to find out who likes dance music.   A good classification system (like what we have in Salience) can give you this sort of association between members of a set, and this is where just relying on what’s been “liked” falls short.
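
Here is a minimal sketch of that kind of rollup in plain Python. The genre-to-category map, the friend names, and the `friends_who_like` helper are all made up for illustration; in practice the map would come from a classifier rather than being typed in by hand.

```python
# Hypothetical genre -> parent category map, using the genres named above.
PARENT = {
    "dubstep": "dance music",
    "trance": "dance music",
    "psytrance": "dance music",
    "house": "dance music",
    "gabber": "dance music",
    "happy hardcore": "dance music",
}

# Made-up friends and the pages they have "liked".
likes = {
    "Ana": {"dubstep", "knitting"},
    "Ben": {"classical music"},
    "Cy": {"dance music"},
    "Dee": {"gabber"},
}


def friends_who_like(category, likes, parent):
    """Friends who liked the category page itself or any page rolled up under it."""
    matched = []
    for friend, pages in likes.items():
        if any(page == category or parent.get(page) == category for page in pages):
            matched.append(friend)
    return sorted(matched)


print(friends_who_like("dance music", likes, PARENT))
```

A search that only checks the literal "dance music" page finds Cy alone; rolling the genres up to their parent also finds Ana and Dee.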

Also, where’s the dislike?   I understand why Facebook chose not to have this, but, I certainly have “friends” who say things like “I hate electronic music!”  (Well, ok, I don’t really, but work with me here.)    Well, now, we can see negative sentiment in the status update, along with classifications of “music, electronica” – that’s good information as to who not to invite over to your house music marathon.  Or “I hated Amy’s Coffee House”, or “Isn’t this article stupid?” – there’s explicit sentiment and context information there that would be useful for graph search.   Even if you wanted to stick to the positive, a simple “I love this place!” is useful.  I personally don’t “like” things very often, but I will comment on them, and that’s information that Graph Search can use to make better recommendations.

Graph Search may prove a boon to marketers, who can use the new features to create far more specific target demographics, and so tailor advertisements even further. The ability to search for places “nearby” will undoubtedly boost local advertising, and make Facebook a bigger marketing priority for local business owners. However, it’s not really “natural language search”, and there’s a lot of value that can be added to it with something like Lexalytics Salience Engine to pull out more details and similarities of pages and status updates. 


**Coauthored by Mekkin Bjarnadottir

Submitted by Seth Redmore on Wed, 2013-05-15 23:54

Well, that time of the year has come and gone, and we're all better people for it.

The place:  New York City

The event:  The Lexalytics User Group, 2013

We had a nice crowd, both in terms of size and the general niceness of people :).  I had my mad MC skillz on the mic all day, introducing a number of great speakers and wrapping things up with a view into our new software release: Salience 5.1.1 (details forthcoming).

Videos will be posted as soon as I get 'em, but for now, use your imagination.  

Anna Smith from Bitly led off, with a rousing presentation on what you can do with a view of every page that people are referencing on Twitter.  Not just reliant on the very short 140 characters, they have a great view of the content behind the shortened links.   They have hundreds of terabytes of content stored, and are adding several terabytes of new content each month.   As she demonstrated, there's incredible power in having this view - you can see what people find important to share, and give advice back to content providers and enterprises as to where they're being successful.  Also, cats.

Oleg Rogynskyy from Semantria (our beloved SaaS text analysis partner) showed how they've built a very flexible, scalable system on Salience - bringing our heavy duty text analysis platform within reach of those pesky surveys and small jobs that just aren't worth building a whole system.  Semantria is a great option for those companies with low volume projects that need access to high-capability text analysis.  Sentiment analysis, categorization, facets, themes - the whole nine yards wrapped up in a nice SaaS platform, and presented as an Excel plugin for the software that everybody uses.

Craig Golightly of Software Technology Group enraptured the group with his low-tech flipchart presentation.  I would argue that he was the most engaging speaker (other than myself, of course) of the day.  He used a fantastic analogy with a "dirty brownie" (no, really, he had brownies up there in front of the group) to show how you can carve out useful data from the messiest dataset.   Craig has lots of experience with voice-to-text, an important feeder technology for text mining.  He's a huge proponent of Voci's system, and after his presentation, we can understand why.

Elizabeth Baran of Lexalytics went over all the work that has gone into our Chinese language pack - from tokenizing words (no word boundaries in the printed text, you know) to handling Chinese idiom, she did a marvelous job of showing just why you can't rely on translation to get the most out of your text.  

Brandon Kane of Angoss showed their predictive analytics system on a #bigdata (I just thought I'd throw in a random hashtag here) set of well over a million tweets concerning various cellphone brands.  He showed interesting correlations between sentiment, Klout scores, gender, and issues like "returns".   This is information that's priceless (well, it does have a price, but work with me here) for marketing people looking to make better decisions about where to put their resources.

Russell Couterier of Cybertap had by far the most questions and really grabbed the imagination of the group.  They've built a product that's in use by a number of sooper-seekret organizations to help ferret out cybercrime, as well as being well tuned for compliance.  They can capture all the packets on very high speed links, completely reassemble the sessions (down to showing you the web pages, emails, instant messages, and the like), and then they pass it through Salience and our text mining capabilities - giving a rich view of just what is being discussed - and popping out all the anomalous and illegal behavior with a nice bright highlight.

Tim Mohler of Lexalytics displayed a few ways that our customers can use Salience to get more out of the text analysis that we provide.  From rolling up discovered themes into coherent buckets to giving an "unlimited dive-in", he showed different ways that our base "set of Legos(tm)" can be used to provide very rich information through multiple processing and aggregation steps.   He also showed some even more experimental work (well, this was all experimental) that uses NHibernate to take a complete snapshot of Salience output and place it in a database so that our customers can root around and experiment with the output.  This is important because it can often be hard to decide what to extract and store as part of a big processing pipeline. By capturing everything from a dataset, he showed that you can surf around in that dataset and make better decisions around what you should show to everybody.

And then there was me, talking about Salience 5.1.1 (released on the day of the LUG), but that's a topic for another blog post.

Thanks again to all the attendees and thanks very much to the speakers for making our day a success! 

Submitted by Seth Redmore on Wed, 2013-04-24 20:39

On Saturday April 27th, viafoura is hosting a hack-a-thon to see what interesting ideas developers, product designers, and data scientists can come up with when given large data sets.

Hackers are armed with 10 years' worth of news stories from the Guardian (one of the UK’s largest news publications), computing power from Amazon ($250 credit for each person), a natural language processing tool (Lexalytics-Semantria) and space donated by Ryerson’s Digital Media Zone. Mix all this together and you have all the building blocks to come up with some amazing big data projects.

Amaze us!  Wow us!  Amuse us!  Oleg Rogynskyy, the CEO of Semantria, and I will judge and are both super excited to see what you can come up with!

Submitted by Seth Redmore on Fri, 2013-04-12 00:45

I happened upon (well, really, was fed via LinkedIn) a blog post by Tom Anderson over at OdinText.  I've seen some of his stuff before, and he seems like a reasonable gentleman.

I was kinda surprised by this post:  Is Social Media Worthy of Text Analytics, and thought it would be worth responding to.

To outline what he's saying (it's short, go check it out), it comes down to this:  

  • Coca Cola found that they can't use social media to predict short term revenue.
  • Twitter is lagging, not leading
  • Not many people tweet, so, you're getting a distorted sample
  • And those that are tweeting are trying to sell you on their expertise in managing social media, because, really - who tweets about Coke?  To wit:  "The fact that Twitter even scores as many mentions as it does for products like “Coca-Cola”, which most regular consumers would be unlikely to ever think about any given week, is that there are so many want to be social media marketing guru’s on Twitter and blogs trying to analyze others marketing campaigns – further proving what a peculiar sample blogs and twitter is."

Well, I haven't read the original source about how they were trying to predict revenue, so, I can't really comment on that first bullet.  I'm not sure that using it to "predict short term revenue" is as interesting as using twitter to "find places and events at which people are drinking coke" and market to those folks.

I disagree with the second bullet - it really depends on what you're looking at.  If you're looking for a reaction, sure, it's obviously lagging.  But, what if you're looking for second order  or future effects (like people talking about what they're going to do this weekend).  Brand mentions might be mostly lagging, but I'm not even sure about that.

I totally agree with the third bullet - Twitter is a self-selecting group with a large set of biases, I'm sure.

The fourth bullet was the one that I took real exception to.   So, here's what I did - I collected 24 hours' worth of English tweets about Coke or Coca Cola, using the system over at Datasift.

26k tweets.  Ok, it was more like 23 hours, but I was impatient and kinda lazy, and just wanted to do this.

Let's look at what people are talking about... This is a really quick and dirty look at the top themes (important noun phrases), and the sentiment of the themes themselves over time.  This is completely unfiltered.  Color is sentiment, size is number of tweets, and no, you don't get a legend because you've been very bad.



overall themes from coke

I'm not seeing anything in there about marketing.  Even delving into the verbatims (so to speak) doesn't show much about marketing or social media monitoring (except for those "crowdsourcing" and "photo booth" tie-up bits). But I do see a lot of people talking about coke.  :)

Let's do a bit more digging...  Here are the top hashtags:



coke hashtags

That "IAmSoMiddleClass" bit is an Indian thing, apparently.  The more you know...  We'll get back to the #addicted in a minute.
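
For the curious, a hashtag tally like the one above takes only a few lines. This is a sketch, not the tooling actually used for the charts; the tweets below are invented stand-ins, not rows from the collected data.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=10):
    """Count hashtags across tweets, ignoring case, and return the n most common."""
    counts = Counter(
        tag.lower() for tweet in tweets for tag in HASHTAG.findall(tweet)
    )
    return counts.most_common(n)


# Invented example tweets, for illustration only.
tweets = [
    "Diet Coke again #addicted",
    "#Addicted to coke zero #IAmSoMiddleClass",
    "lunch + coke #addicted",
]
print(top_hashtags(tweets, 2))
```

Lowercasing before counting merges variants such as "#Addicted" and "#addicted", which matters when you want one bar per hashtag.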

Datasift has some tech that lets me get gender for some tweets.  Cool.  Let's use that!  Not enough of the tweets are demographically classified to make a pretty picture, so, let's look at charts...

Here's the ladies:


female themes coke

(Maybe they're retail dudes, too, but it seems more likely that they're ladies.  No hating on the choice of pronouns, k?)

coke gigi hill

Speaking of dudes, what do they have to say?



Yes, more references to drugs, and, well the dudes like to talk about "sausage".  Hm.  Note that we could probably do some word sense disambiguation between the "coke" that is referred to as Bolivian Marching Powder, and the coke that my son likes, but, this is quick and dirty, like I said.

And for completeness' sake, let's do the same thing for hashtags, ladies first:



female coke hashtags

Note the fact that there are 2 instances of "addict" on there.  (And, yes, they're talking about Diet Coke, not nose candy.)

Let's look at the men:


male coke hashtags

Check that out, not even a word about "addiction", and the whole "IAmSoMiddleClass" thing is concentrated with men.

So, my point, if I were to have one, is that real people are really talking about real stuff on Twitter.  Even for brands like Coca-Cola, someone is talking, right now, about where/how/when/why they're drinking one.  And in that information comes the possibility to learn/understand/market/connect/sell.






Submitted by Carl Lambrecht on Fri, 2013-03-15 16:11

Here at Lexalytics, we’re excited to be in beta with Salience Text Analysis for Chinese.

There are many features in our toolkit – sentiment, topic detection, summarization, theme extraction – but sentiment is what we’ve been best known for. With our beta release of text analytics of Chinese content, we decided to measure our document-level sentiment results against human annotations of sentiment, and compare to another public engine recently released that also provides automated sentiment analysis of Chinese.

What started as a basic measurement of precision and recall turned into a deeper effort to quantitatively determine how closely our sentiment analysis matches the sentiment judgment of multiple humans.



We gathered 109 pieces of Chinese content from Weibo and blog discussion forums, each of which we annotated as positive or negative. The content was filtered to items that were clearly positive or negative to a human. Our intent was not to measure the ability to detect subtle cases of sentiment.

Even though we marked content as only positive or negative, Salience, being phrase-based, can return a neutral result if it is unable to detect any sentiment at all. We think this is a valid approach – sometimes there really isn’t any sentiment.

The other sentiment engine that was tested, Chatterbox, was not observed to return a neutral result; it appears that all content is categorized as positive or negative, with an associated strength score.  It could be argued that a polar result with a small strength score could be considered "neutral".

Additionally, the phrase-based approach developed for Salience to assess sentiment in Chinese was developed from longer form text, but shorter text was needed for this test to accommodate a content length constraint imposed by the Chatterbox API.


Precision, Recall, and F1

The table below gives the precision, recall, and F1 score (the harmonic mean of precision and recall) for positive and negative sentiment within the set. The F1 score for positive sentiment and F1 score for negative sentiment are combined to calculate an overall accuracy measure.

precision recall accuracy comparison chatterbox salience

These scores are quite good for both engines, which can be attributed in part to the polarity of the test content selected. Salience performs comparably to Chatterbox in terms of positive sentiment, with slightly better performance on the detection and identification of negative sentiment.
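For readers who want to reproduce this kind of measurement on their own data, the per-class scores can be sketched as follows. This is a minimal illustration, not the script used for the table: the function name and the sample labels are hypothetical, and combining the two F1 scores by a simple average is an assumption, since the exact combination is not spelled out here.

```python
def precision_recall_f1(gold, predicted, label):
    """Precision, recall, and F1 for one sentiment label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical annotations. "neutral" is an engine finding no sentiment,
# which counts as a miss for whichever polar label the human assigned.
gold      = ["pos", "pos", "pos",     "neg", "neg", "neg"]
predicted = ["pos", "pos", "neutral", "neg", "pos", "neg"]

pos_scores = precision_recall_f1(gold, predicted, "pos")
neg_scores = precision_recall_f1(gold, predicted, "neg")

# Assumption: the overall measure is the plain average of the two F1 scores.
overall = (pos_scores[2] + neg_scores[2]) / 2
print(pos_scores, neg_scores, overall)
```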

As we have developed support for non-English languages, we have done so at a core level of deconstructing the language, and developing the support needed to handle the language natively. We feel this is a much better approach to NLP of non-English languages than using machine translation to apply techniques developed for English to translated text. In order to test this, we ran the Chinese content through the Google Translate API, and then ran the translated text through Salience's standard distribution for English.


precision recall comparison with translation

As you can see from the table above, a machine translation approach suffers from any translation issues, particularly when working with phrase-based detection of sentiment where other linguistic modifiers such as negations and intensifiers are taken into account. A model-based sentiment approach may be less affected by these issues, but will be less flexible to use across varied content domains and require more technical tuning effort.


Inter-rater agreement

To me, this is the most interesting part of the experiment. Automated sentiment analysis is often compared to human sentiment analysis through precision and recall tests. But that assumes that across humans there would be 100% agreement. In reality, there are discrepancies in the sentiment annotation across multiple humans, and in many cases the same human can mark the same set of documents with slight differences from one day to the next. So we want to measure the consistency of multiple human annotations of the same content, and calculate the consistency of the two automated sentiment approaches with human judgment. After all, if you can’t agree with another human, why expect the machine to agree with you?

The same content was also annotated by an external contractor, a native Mandarin speaker located in China. Two inter-rater agreement measures were calculated, Krippendorff’s alpha and Cohen’s kappa. These two measures were also featured in an analysis of inter-rater agreement presented by Maritz Research at the Text Analytics World seminar in the fall of 2012.

For our dataset, both Krippendorff's alpha and Cohen's kappa indicated an agreement of 94% across the two humans, showing that even for a relatively small set of very polar content there is not absolute agreement between two humans.

In one particular example, we marked an article that was very prejudicial against Japan and pro-Chinese as negative, because of the emphasis of the prejudice. The contractor based in China, however, considered this to be positive content. Judging sentiment can be tricky, even for humans.

So how did the computers do?

Calculating Cohen’s kappa requires the results to be fully annotated, so for cases in which Salience returns an inconclusive result we considered what agreement would be if that result were taken to be a positive or negative result.  In other words, "0=neg" means that if Salience returned a result of 0 (one that we would normally consider to be neutral), we will consider this result to be "negative".


kappa scores

Cohen’s kappa also only allows us to calculate agreement between two raters. The conclusion we can draw from this calculation is that humans agree 94% of the time on their ratings of the content, while Salience agrees between 74% and 84% of the time, slightly better with one human than another and slightly better when inconclusive results are considered negative (or not-positive). Chatterbox fares worse, with about 62 to 64% agreement with humans.
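The forced-label kappa comparison described above can be sketched roughly like this. It is an illustration only: the helper names, the "pos"/"neg" label strings, and the sample scores are hypothetical, and the actual Salience output format may differ.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each labelled every item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters pick the same label independently.
    expected = sum(counts_a[l] * counts_b[l] for l in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

def force_neutral(scores, neutral_as):
    """Turn numeric scores into labels, mapping a 0 score to neutral_as."""
    return [neutral_as if s == 0 else ("pos" if s > 0 else "neg")
            for s in scores]

# Hypothetical data: one human's labels and an engine's raw scores.
human  = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos"]
scores = [0.6, -0.4, -0.4, 0.0, 0.3, 0.0, -0.5, 0.7]

kappa_zero_neg = cohens_kappa(human, force_neutral(scores, "neg"))  # "0=neg"
kappa_zero_pos = cohens_kappa(human, force_neutral(scores, "pos"))  # "0=pos"
print(kappa_zero_neg, kappa_zero_pos)
```

In this made-up sample the neutral results fall on items the human marked negative, so the "0=neg" reading scores higher, which is the same kind of effect seen in the table.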

The calculation of Krippendorff’s alpha is more flexible, allowing for gaps in the annotations which accommodate cases in which Salience did not detect sentiment and allowing for determining sentiment across a group of more than two raters.
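A minimal sketch of Krippendorff's alpha for nominal labels, using the standard coincidence-matrix formulation, shows how gaps and more than two raters are handled. The function name and sample ratings are hypothetical, `None` is used here as an assumed marker for "no sentiment detected", and this is not the code behind the figures reported in this post.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """units: one list of labels per item, one entry per rater.
    None marks a missing rating and is simply left out."""
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # an item rated by fewer than two raters is unpairable
        # Each ordered pair of ratings from different raters adds 1/(m-1).
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1 / (m - 1)
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("not enough pairable ratings")
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        return 1.0  # only one label was ever used
    return 1 - observed / expected

# Hypothetical ratings: [human 1, human 2, engine]; None = no result.
ratings = [
    ["pos", "pos", "pos"],
    ["neg", "neg", "neg"],
    ["pos", "pos", None],
    ["neg", "neg", None],
    ["pos", "neg", "pos"],
    ["neg", "neg", "neg"],
]

alpha = krippendorff_alpha_nominal(ratings)
print(round(alpha, 4))
```

Because the missing ratings are skipped rather than forced into a label, the engine is not penalized for items where it found no sentiment.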

alpha inter rater agreement

The most interesting chart is below - so, for multiple raters (more than two), what are the best combinations?  Because we're not forcing a "0" from Salience into either positive or negative, the agreement numbers end up better.  We're at about 75% agreement across humans + Salience + Chatterbox.  (Which says good things about the state of sentiment analysis!)  We're, of course, happiest with the results between two humans and Salience, where we're pushing 90%.

alpha scores with c greater than 2

These results show that Salience’s analysis of sentiment in Chinese content correlates well with human judgment. Perhaps not quite well enough to satisfy a Turing condition of generating results which are indistinguishable from those of a human, but certainly close enough to serve as a good starting point from which further phrase-based sentiment tuning can refine results.  



We think these results validate our approach of native, phrase-based NLP sentiment analysis for Chinese over a machine translation approach or classification model, and show that on tonal content, Salience is a compelling option. At present, our attention is focused on including named entity recognition for our general release of support for Chinese. We’re pleased with the results of this assessment of our initial document sentiment analysis, and looking forward to bringing the full product to market.