Emoticons and emoji have quickly become standard indicators of tone in casual text conversations. On mediums like Twitter, where users only have 140 characters to express themselves, these symbols can convey a lot about a writer’s attitude towards a particular subject. If sentiment analysis software doesn’t take this into account, it can vastly skew the data.
But calculating the sentiment of emojis isn’t as easy as “smiley face=good, frowny face=bad”. In our quest to integrate emoji recognition into Salience text analysis software, we came up against a few technical challenges.
Emoji vs. Emoticon
Similar names, similar functions, but different issues and origins. So what’s the real difference between the two?
The invention of the modern emoticon is largely credited to Scott E. Fahlman, who proposed the use of “:-)” as a way to denote a joke on an online bulletin board. From these humble beginnings sprang a multitude of different expressions composed solely of the existing characters available on a QWERTY keyboard.
The emoticon presents a couple of challenges to sentiment analysis. Because emoticons are composed of different combinations of existing characters, there is always the possibility that a combination of characters not meant to be an emoticon, whether it is a typo or something else, will be interpreted as one.
Was Abraham Lincoln a fan of the winky face? I’d like to think so.
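One way to cut down on these false positives is to accept an emoticon only when it stands alone as a whitespace-delimited token. The sketch below is a minimal illustration of that idea, not how Salience does it: the pattern, the tiny set of faces it covers, and the name find_emoticons are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately small pattern: eyes, optional nose, mouth.
# The lookarounds require whitespace (or the string edge) on both sides,
# so punctuation glued to a word is not treated as a face.
EMOTICON = re.compile(r"(?<!\S)[:;=]-?[)(DP](?!\S)")


def find_emoticons(text):
    """Return emoticons that appear as standalone tokens in text."""
    return EMOTICON.findall(text)


print(find_emoticons("Great game today :-)"))
print(find_emoticons("(applause and laughter;) he continued"))
```

The trade-off is recall: an emoticon typed directly against a word, as in "great:)", is skipped by this rule, so a real system would likely need something more nuanced.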
The other issue is the lack of standardization. While human brains, hardwired for facial and pattern recognition, interpret :), :-), and (^-^) all as smiling faces, text analysis software couldn’t be expected to automatically do the same.
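Since software will not make that connection on its own, the variants have to be mapped explicitly. A minimal sketch of such a mapping follows; the table entries, the labels, and the scores are illustrative assumptions, not values taken from Salience.

```python
# Hypothetical lookup: many spellings collapse to one canonical label.
EMOTICON_LABELS = {
    ":)": "smile", ":-)": "smile", "(^-^)": "smile", "=)": "smile",
    ":(": "frown", ":-(": "frown",
    ";)": "wink", ";-)": "wink",
}

# Illustrative scores only; a real system would tune these.
LABEL_SENTIMENT = {"smile": 1.0, "wink": 0.5, "frown": -1.0}


def emoticon_sentiment(token):
    """Return a sentiment score for a known emoticon, or None otherwise."""
    label = EMOTICON_LABELS.get(token)
    if label is None:
        return None
    return LABEL_SENTIMENT[label]


for face in (":)", ":-)", "(^-^)"):
    print(face, emoticon_sentiment(face))
```

Scoring the label rather than the raw string means a newly observed variant needs only one new table entry.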
The emoji is a 12×12 pixel symbol designed for much the same effect as the emoticon. Originating in Japan, emojis offer a wide array of highly specific characters, from faces to food. Many of these symbols are culturally specific to Japan.
Unlike the emoticon, the emoji has the benefit of being fairly standardized. However, the Japanese symbols are not universally supported. Some emoji sets have, within the past few years, been incorporated into Unicode and can be accessed on Windows Phone 7, Mac OS X, Gmail, and other platforms.
Using the Unicode bank as the standard, assigning sentiment to emoji is easier than assigning it to emoticons, where the same sentiment can apply to several character combinations that differ only minutely. But the lack of universal support for the emoji is a big stumbling block for both human and machine sentiment analysis. A person rating the sentiment of a tweet where an emoji is used may not, depending on how they are viewing the tweet, be able to see the character, which could change the tone of the entire message.
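Because each of these emoji has a single Unicode code point, a sentiment table needs only one entry per symbol. The sketch below shows the idea; the three entries, their scores, and the function name emoji_scores are assumptions for illustration, and the character-by-character loop would not handle emoji built from multiple code points.

```python
# Hypothetical sentiment table keyed by Unicode character.
EMOJI_SENTIMENT = {
    "\U0001F600": 1.0,   # GRINNING FACE
    "\U0001F609": 0.5,   # WINKING FACE
    "\U0001F61E": -1.0,  # DISAPPOINTED FACE
}


def emoji_scores(text):
    """Return (code point, score) pairs for each known emoji in text."""
    return [
        (f"U+{ord(ch):04X}", EMOJI_SENTIMENT[ch])
        for ch in text
        if ch in EMOJI_SENTIMENT
    ]


print(emoji_scores("Loved it \U0001F600"))
```

Reporting the code point alongside the score gives a human reviewer something to look up even when their device cannot render the character.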
In addition to sentiment, emoji can also carry other meaning: for example, a slice of pizza or a picture of a camera. It’s important for classification purposes to be able to assign these to the right buckets (like “food” or “photography”).
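One simple way to approximate this bucketing is to look at the official Unicode character name of each emoji and match keywords in it. The sketch below uses Python's standard unicodedata module; the keyword-to-bucket table and the name emoji_buckets are hypothetical and far smaller than anything a production classifier would need.

```python
import unicodedata

# Hypothetical keyword table: a word in the Unicode name implies a bucket.
KEYWORD_BUCKETS = {
    "PIZZA": "food",
    "HAMBURGER": "food",
    "CAMERA": "photography",
}


def emoji_buckets(text):
    """Return the set of topic buckets suggested by symbols in text."""
    buckets = set()
    for ch in text:
        # Characters without a Unicode name yield an empty string.
        words = unicodedata.name(ch, "").split()
        for keyword, bucket in KEYWORD_BUCKETS.items():
            if keyword in words:
                buckets.add(bucket)
    return buckets


print(emoji_buckets("Lunch break \U0001F355 \U0001F4F7"))
```

Here U+1F355 is named SLICE OF PIZZA and U+1F4F7 is named CAMERA, so the sample text lands in both the food and photography buckets.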
Emoticons and emoji aren’t going anywhere in the near future. The symbols have only increased in popularity as an effective way to compensate for the lack of inflection in online written communication. That’s why, even with the challenges they present, it was important for us to be able to offer emoji support in Salience 5.1.1.