Hello, my name is Carl

Whenever family and friends ask me to describe what I do (which is all the time, because I’m a man of mystery), my answer begins simply: ‘I teach computers to understand human language,’ I tell them, and then inevitably clarify that I’m referring not to spoken language, but to the stuff we write online. Every day, I explain to my bewildered in-laws, consumers like you and me produce enormous quantities of text content that we post to social media platforms, online review sites, blogs, forums, and other websites.

To put it in perspective with an overused Eric Schmidt quote, “there were five exabytes of data created between the dawn of civilization through 2003. But that much information is now created every two days.” And as the internet continues to mature and expand, this volume of content grows with it. ‘What I do,’ I continue over my audience’s groans (just kidding, they’re totally riveted by this point), ‘is help computer systems make sense of all that noise.’

The generation of consumer-made text content follows a trend. It starts with a fledgling community of individuals voicing thoughts, recommendations, and experiences in a subject they know inside and out. These conversations are usually eloquent and fairly structured – that is to say, easy to analyze and process. Over time, however, the community grows to include more and more voices, all clamoring for attention (e.g. “bruh dat hotel room was sick as hell”). These are all important voices to hear. But as much as this larger volume of content represents an opportunity for more and greater insights, it also represents an enormous challenge.

Consider the hospitality industry. Historically, hotel managers and restaurateurs combed through customer reviews by hand to figure out what they were doing right and wrong. This task was arduous enough when it was limited to physical comment cards. Now, in the age of social media and online review sites, the sheer volume of content that needs processing is enough to make any data analyst cower in fear. The costs (in labor, in time, and in other resources) to the business owner are tremendous. And the fact is, you can never be sure of the accuracy of your overworked employees’ analyses. Nor can you be sure each employee attaches the same sentiment to any number of ambiguous terms, like ‘sick.’ All of this adds up to inconsistent results.

You need something to cut through the noise and direct your attention at the important information.

You need automated text analytics (alternatively known as text mining).

My company, Lexalytics, began by analyzing live newsfeeds to extract the names of people and companies (named entity extraction), and then figuring out whether they were being mentioned in a positive, negative, or neutral manner (sentiment analysis). When we launched our on-premises text analytics solution, Salience Engine, back in 2004, our technology was (and, thanks to our refinements, remains) groundbreaking. The thing is, news articles and professional reviews are usually structured in a predictable manner, and usually follow the regular rules of spelling and grammar (keyword: usually).
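To make those two steps a little more concrete, here’s a deliberately tiny Python sketch of the pipeline I just described: find entity mentions, then score the surrounding text for tone. The entity dictionary, sentiment lexicon, and scoring rule below are all invented stand-ins for illustration; the real models inside Salience are far more sophisticated.

```python
# Toy illustration of the two steps described above:
# (1) named entity extraction, (2) sentiment analysis.
# The gazetteer and lexicon are hypothetical stand-ins, not Salience's.

KNOWN_ENTITIES = {"Eric Schmidt": "PERSON", "Lexalytics": "COMPANY"}
SENTIMENT_LEXICON = {"groundbreaking": 1.0, "arduous": -1.0, "cower": -1.0}

def extract_entities(text):
    """Return (mention, type) pairs found via simple dictionary lookup."""
    return [(name, kind) for name, kind in KNOWN_ENTITIES.items() if name in text]

def score_sentiment(text):
    """Sum per-word lexicon scores and map the total to a polarity label."""
    total = sum(SENTIMENT_LEXICON.get(tok.strip(".,!?").lower(), 0.0)
                for tok in text.split())
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"

article = "Analysts called the Lexalytics engine groundbreaking."
print(extract_entities(article))  # [('Lexalytics', 'COMPANY')]
print(score_sentiment(article))   # positive
```

On clean, well-edited news copy, even a crude lookup like this gets you surprisingly far, which is exactly why the early systems worked. It’s the messy content in the next paragraph that breaks it.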

When Facebook, Twitter, Yelp and their ilk came into their own, the rules of the game changed entirely. This kind of content, written in what we call “natural language,” represents a very different challenge. After all, when was the last time you read a grammatically correct tweet? Both Salience and our in-the-cloud solution, Semantria, understand social media. What’s more, as our customers’ needs have continued to expand, Salience and Semantria have learned to natively understand more than 20 languages and dialects, colloquialisms and all.
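To see why slang trips up a naive system, revisit that hotel review from earlier. Here’s a hypothetical sketch of the problem: the same word carries opposite polarity depending on register, so a general-purpose lexicon misreads it unless you layer slang-aware overrides on top. Both lexicons here are invented examples, not anything shipped in our products.

```python
# Hypothetical sketch: the same word can flip polarity with register.
# A single general-purpose lexicon misreads slang; layering domain-specific
# overrides on top corrects the reading. Both lexicons are invented examples.

GENERAL_LEXICON = {"sick": -1.0, "dirty": -1.0, "great": 1.0}
SLANG_OVERRIDES = {"sick": 1.0}  # in casual reviews, "sick" is praise

def score(text, slang=False):
    """Score text against the general lexicon, optionally with slang overrides."""
    lexicon = {**GENERAL_LEXICON, **SLANG_OVERRIDES} if slang else GENERAL_LEXICON
    total = sum(lexicon.get(tok.strip(".,!").lower(), 0.0) for tok in text.split())
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"

review = "bruh dat hotel room was sick as hell"
print(score(review))              # negative -- the naive reading
print(score(review, slang=True))  # positive -- the slang-aware reading
```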

Alright, I’ve just thrown a text wall at you. But it’s only a small portion of what my job involves. Still with me, and want to learn more? Check out our website and resources collection and plumb the depths of modern text analytics. And be sure to get in touch with any more questions! I love to talk about what I do.
