The first step in getting computers to understand the sentiment of text is getting them to understand the sentiment behind individual words and phrases. Understanding sentiment happens on multiple levels: at the most basic, a sentiment analysis engine needs to understand that words and phrases like “appalling”, “morally bankrupt”, and “terrible” are negative, and words like “stellar”, “amazing”, and “perfect” are positive. Unfortunately, emotion and sentiment are rarely presented so clearly.
Take the word “good” as an example. It must be positive, right? Well, not quite. “Good” is such a useful word that we use it even when we aren’t trying to express a judgment on something. I say “Good morning” to my coworkers even on dreary, rainy mornings. “This is a good time to break for lunch” seems far too neutral to attribute any sentiment. “Lend me a few bucks? I’m good for it” does try to claim a basic decency of character, but is certainly not in the same class of sentiment as “What a good person you are!” or “This will create good for all mankind.”
Positive and negative language is sprinkled throughout even the most mundane phrases. Across a longer piece of text, positive and negative sentiment tends to average out toward neutral and does not skew statistical inferences too far. When analyzing short content (such as tweets), however, it’s important to separate genuine sentiment from basic, factual everyday phrases.
At Lexalytics, our approach has been to hand-categorize content into polar and non-polar groups. Polar words and phrases are usually associated with strong or clearly defined sentiment, while non-polar refers to words and phrases typically used in everyday, mundane language. Once content has been hand-sorted, we set our algorithms free to learn to differentiate the two. For this article, I went through the model to see some of the clues the engine learned.
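To make the idea of "hand-sort, then let an algorithm learn the difference" concrete, here is a minimal sketch in Python. It is not our production model: it swaps in a tiny Naive Bayes classifier, the labeled phrases are invented for illustration, and the names `LABELED`, `train`, `word_log_odds`, and `classify` are hypothetical. It shows how per-word "clues" could fall out of hand-labeled polar and non-polar examples.

```python
import math
from collections import Counter

# Hypothetical hand-labeled phrases, invented for this sketch (not real training data).
LABELED = [
    ("good morning everyone", "non_polar"),
    ("this is a good time to break for lunch", "non_polar"),
    ("have a good night", "non_polar"),
    ("happy birthday to you", "non_polar"),
    ("what a good person you are", "polar"),
    ("the weather is terrible today", "polar"),
    ("that breakfast was amazing", "polar"),
    ("worst weekend ever", "polar"),
]


def train(examples):
    """Count words per class; returns (word_counts, doc_counts, vocab)."""
    word_counts = {"polar": Counter(), "non_polar": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["polar"]) | set(word_counts["non_polar"])
    return word_counts, doc_counts, vocab


def word_log_odds(word, model):
    """Log-odds that a word signals polar usage, with add-one smoothing.

    Positive values lean polar, negative values lean non-polar.
    """
    word_counts, _, vocab = model

    def prob(label):
        total = sum(word_counts[label].values())
        return (word_counts[label][word] + 1) / (total + len(vocab))

    return math.log(prob("polar") / prob("non_polar"))


def classify(text, model):
    """Naive Bayes decision: class prior plus the summed per-word log-odds."""
    _, doc_counts, vocab = model
    score = math.log(doc_counts["polar"] / doc_counts["non_polar"])
    # Words never seen in training carry no evidence, so they are skipped.
    score += sum(word_log_odds(w, model) for w in text.lower().split() if w in vocab)
    return "polar" if score > 0 else "non_polar"


model = train(LABELED)
for word in ("weather", "morning", "good"):
    print(word, round(word_log_odds(word, model), 3))
print(classify("the weather was terrible", model))
print(classify("good morning", model))
```

With this toy data, "good" leans non-polar simply because it shows up in more everyday phrases than opinionated ones. A real model would need far more labeled content, and richer features than single words, before its clues meant anything.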
So, then, what did the engine find? For one thing, when we talk about “day” or “night”, we’re probably just being pleasant. If we mention “weather”, though, we’re almost certainly expressing an opinion. Birthday? We’re being pleasant. The weekend, our sleep, or our breakfast? We want you to know exactly how we felt about them.
In addition, a superlative (as in “best” or “worst”) is more likely to be used without judgment than a comparative (“better”). Also, we use more adjectives, adverbs, and interjections when we’re trying to express an opinion. On the other hand, some findings were less intuitive. Most intriguingly, the engine has decided that the use of the word “velociraptor” makes it slightly more likely that you’re expressing a genuine sentiment.