Our Sentiment about Text Analytics and Social Media

Submitted by Jeff Catlin on Fri, 2009-11-06 05:00

Historically, Lexalytics has been very focused on the enterprise software market, building products that are easy to install, configure, and get running. (Don't worry, we're not abandoning that model.) However, over the last six months or so we've been focused on building out a number of web-based services aimed at extending our reach. In doing so, we made the decision to use Amazon's cloud services (EC2 and S3). The power and cost efficiency of these services has allowed us to build not just one, but two new services that will open up our abilities to a whole new group of users.

But this post isn't about our new services (we'll share more about those in the coming week); it's really about the cloud. I'm a big-time convert to cloud computing, and I believe Larry Ellison got this one wrong. For businesses, deploying new web-based services just got a whole lot easier and a whole lot less expensive. For example, we will soon roll out a web API to our core Salience Engine, and we've been able to do it without having to fight the "co-lo wars". We also saved by not adding IT staff to the payroll to keep the machines up, configured, and maintained. Wrapping up Salience for Amazon's cloud and rolling it out took less than a month and has cost us very little. What this means for prospective clients is that they can access our API through the web at a price that will be very attractive, particularly to smaller companies that aren't trying to "boil the ocean" of data. Simply put, this quick turnaround was possible because Amazon handles all the complicated machine maintenance and offers machine cycles and storage at a very appealing price.

Now that I'm a cloud convert, I expect us to roll out more cloud-based products in the next year. What experiences have you had with the cloud? I'd love to hear your feedback on how it has worked, or hasn't worked, for your business.

Submitted by Christine Sierra on Fri, 2009-10-23 04:00

Recently, Jeff spent some time speaking with Jason Falls (@jasonfalls) of www.socialmediaexplorer.com and they discussed natural language processing, social media monitoring, Lexalytics' next product line Lexascope and more. Check it out here to see and hear more: http://www.socialmediaexplorer.com/2009/10/23/understanding-natural-language-processing-for-social-media-monitoring/

Submitted by Christine Sierra on Tue, 2009-10-06 04:00

If you've got tweets, we've got sentiment. And themes. And most-mentioned people. And spam lists. In fact, the only issue we've run into is that Twitter won't give us all the data to analyze. All 100 gazillion tweets would be fascinating to analyze automatically, but they just don't seem to be there yet. Or, perhaps, they are building out their revenue model to sell us the data. Either way, don't fret. Just as in reputation management, where analyzing every single document can be both time-consuming and incredibly inefficient, the same holds true for tweets. The aggregate sentiment is often more telling than any individual tweet. As our CEO Jeff Catlin mentioned recently on ZDNet:

"Sentiment measurement is at the forefront of much business analysis these days, but in some ways Twitter seems as if it was designed from the ground up to defeat any automated sentiment engine. For instance, there isn't much sentence structure in tweets, and what's there is often wrong. And many of the tweets are just tinyurl or bit.ly links with absolutely no content contained in the URL itself. Given these challenges, is monitoring and measuring sentiment in Twitter a hopeless chore? Fortunately the answer is No. Even though there are some challenges to automated scoring of Twitter content, there are also some advantages to processing tweets and in particular the tone within Twitter. The beauty of Twitter is that there is very little grey area in tweets. You're either posting some source of information, posting an opinion you have, or replying to another informative or opinion-oriented tweet."

With the volumes of online data growing at an unbelievable rate, decreasing processing time and implementing automation become key to getting the job done. And from that automation comes incredible value, such as all the concepts and themes associated with a particular topic, not just the ones with the most hashtags. Who is talking about those topics? And who else is mentioned with them? The value is not always in the number of mentions, though in some respects that is helpful, but in the context surrounding the tweets and how businesses can use it.

In the coming days we will be attending the Inbound Marketing Summit in Boston, where we'll have a demo of our Twitter topic tracking system available. We aren't formally releasing a site or promoting a new product, but we are open to conversations about Twitter topics and what is useful versus what is just fluff. Text analytics doesn't have to be only about processing Word documents and research reports; it is just as helpful for tweets, customer comments, and other smaller documents.
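To make the "aggregate over individual" point concrete, here is a minimal sketch in Python. The `score_tweet` keyword scorer is a toy stand-in made up for illustration; it is not how Salience or any real sentiment engine scores text:

```python
# Sketch: the aggregate sentiment matters more than any single tweet.
# score_tweet() is a toy keyword scorer, purely for illustration.

POSITIVE = {"love", "great", "awesome", "fast"}
NEGATIVE = {"hate", "broken", "slow", "awful"}

def score_tweet(text):
    """Return a crude sentiment score in [-1, 1] for one tweet."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def aggregate_sentiment(tweets):
    """Average the per-tweet scores: the trend, not the single data point."""
    scores = [score_tweet(t) for t in tweets]
    return sum(scores) / len(scores) if scores else 0.0

tweets = [
    "love the new release, awesome work",
    "the app is slow today",
    "great support, fast reply",
]
print(round(aggregate_sentiment(tweets), 2))  # → 0.33
```

Any single tweet here reads as fully positive or fully negative; the average is what tells you the conversation leans positive overall.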

Submitted by Christine Sierra on Fri, 2009-10-02 04:00

It seems lately that there are more and more companies offering sentiment solutions to a variety of markets. Everything from health care to customer service to financial services and reputation management. But in spite of this, very few prospects seem to really understand what the technology will and won't do for them. Let's start with some basic questions to help you understand more about sentiment:

What does it mean to measure sentiment? How do I know if I really need to use it? That depends entirely on the intentions of the user and the content being measured. If you're looking at customer review data (let's say hotel reviews in this case), then you may be interested in the sentiment of each review for the hotel. Were people happy with their stay at this hotel? This would be an example of document sentiment. It would tell you whether the overall review was good or bad, and offer little insight into the details of each review. In this case, processing large amounts of data about the same topic works well.

If, however, you're reading a publication like Consumer Reports, then you're probably thinking more about how the different hotels stack up against one another. You'd like to do some comparison. In this case, the overall document sentiment wouldn't be of much help, because the document will have some good and some bad content mixed within it. What the reader really cares about in this kind of content is the tone for each specific hotel described in the document, and the reasons why. Were the beds comfy? How was the shower pressure? Is the staff friendly? In some cases the beds may have been comfortable but the staff rude, which can sway the sentiment of a review. Depending on what is important to you, you'd want to extract the sentiment of each entity. This is known as entity-level sentiment.
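To make the document-level vs. entity-level distinction concrete, here's a toy sketch in Python. The cue words, the clause splitting, and the function names are all illustrative assumptions, not Lexalytics' actual API or algorithm:

```python
# Toy illustration of document-level vs. entity-level sentiment.
# The cue list and comma-based clause splitting are deliberately naive;
# a real engine uses parsing and learned models.

REVIEW = "The beds at Hotel Alpha were comfy, but the staff were rude."

CUES = {"comfy": 1, "friendly": 1, "rude": -1, "dirty": -1}

def document_sentiment(text):
    """One score for the whole document: sum the cues everywhere."""
    words = text.lower().replace(",", "").replace(".", "").split()
    return sum(CUES.get(w, 0) for w in words)

def entity_sentiment(text, entities):
    """Score each entity from the cues in its own clause."""
    scores = {}
    for clause in text.replace(".", "").split(","):
        words = clause.lower().split()
        score = sum(CUES.get(w, 0) for w in words)
        for e in entities:
            if e.lower() in clause.lower():
                scores[e] = scores.get(e, 0) + score
    return scores

print(document_sentiment(REVIEW))                   # → 0: good and bad cancel
print(entity_sentiment(REVIEW, ["beds", "staff"]))  # → {'beds': 1, 'staff': -1}
```

The document score washes out to neutral, while the entity scores preserve exactly the comparison a reader cares about: comfy beds, rude staff.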

What really matters in sentiment analysis? Is it the accuracy or the automation? Again, it depends on your needs and goals for using sentiment analysis. An example we often use where a technology-based automated solution really shines is financial services, where the trends across a collection of stories are what users are most interested in. They care less about the accuracy of every document detail, and more about the sentiment across a corpus of data that needs to be processed quickly. Financial services is definitely one of the up-and-coming industries for sentiment, because the technology tends to perform better than humans at processing large collections of content.

Reputation management is another industry where automated sentiment analysis shines, but where accuracy comes under more scrutiny. It could be said that automated sentiment analysis was born in this space, invented because of the amount of time people spent hand-measuring the tone around products and brands. While reputation management is currently the biggest market for the technology, it's probably not the best example of accuracy. It's hard enough to get humans to agree with humans on the tone of a specific story; getting people to agree with a computer is even harder.

I bring up these two contrasting uses because it's important for people to think about their specific needs and requirements before they jump into any vendor's solution. Make sure the solution you're looking at is well suited to the problem you're trying to solve. So while there are more claims of sentiment analysis hitting the market, and after six years as a company processing unstructured text and watching online content take hold, it's interesting to see how sentiment appears to be becoming somewhat of a commodity. That challenges all the providers to do a better job in all aspects of the technology. However, it's a fact that analysis of good, bad, and neutral isn't as easy as 1, 2, 3. Ask for a proof of concept before making a decision, and make sure the solution is right for you and your business.

Submitted by Jeff Catlin on Tue, 2009-09-08 04:00

Ever since the New York Times article about sentiment scoring, published a couple of weeks ago, there has been a pretty constant stream of people jumping in and demonizing automated sentiment or peddling its eventual takeover of the free world. It felt a lot like political media coverage to me: lots of opinions, but very few taking an honest look at the real problems and real solutions. Kudos to Nathan Gilliatt for putting out the list of many of these posts (see the list here). It's obvious that we all have axes to grind and software or services to sell, but focusing on the accuracy of automated sentiment is the wrong place to go. To remove any doubts, let me state for the record that a machine-based system will never score any random piece of social content as well as a human will. People simply have too much context in their brains, and there is no way a machine is going to match that.

Given the preceding, one may ask whether I believe that automated sentiment is doomed to failure. The answer is no: we need to use automated sentiment in ways that humans can't provide. I'll illustrate my point by focusing on one of the recent posts about sentiment, "Sentiment analysis for online content: Honest?" from CyTRAP Labs. The post wasn't particularly favorable to automated sentiment, but it was one of the first posts I've read that asked the right question: Is this story relevant? If you're monitoring social media sources, then what you really want to know is: What's happening that I need to worry about? Text analytics and automated sentiment are very good at answering these "trend spotting" questions. In fact, machines are an essential piece of trend spotting. Sentiment analysis has made rapid inroads in the financial services industry because users don't care about the tone of each story; they care about the effect of a bunch of stories on the market as a whole.

A number of our financial services customers are making lots of money trading equities based in part on sentiment trends, and that should say something about its validity. If your job is to monitor a brand in social media, then the trends and patterns are what you should be worried about, and automated analysis is great for this. If an automated system is only 70% accurate, it's still going to get the overall trend (up or down) correct for a given brand, and then humans should step in and provide the detailed analysis of that trend, including identifying and correcting the posts where the machine got it wrong. Let automated sentiment point the way, but trust humans to provide the detailed analysis that requires a few neurons. Automated systems will never beat humans on a story-by-story basis, so let's stop worrying about that and use them to provide services that humans can't afford to do.
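That "70% accurate but the trend still holds" claim is easy to sanity-check with a toy simulation. All the numbers here (accuracy, volume, the 60/40 split) are made up for illustration:

```python
import random

# Toy simulation: a classifier that is right only 70% of the time on
# individual posts still recovers the overall trend direction, because
# its errors wash out over volume.

random.seed(42)  # fixed seed so the run is reproducible

def noisy_label(true_label, accuracy=0.7):
    """Return the true label with probability `accuracy`, else flip it."""
    return true_label if random.random() < accuracy else -true_label

# Ground truth: 60% of 10,000 posts about the brand are positive (+1).
truth = [1] * 6000 + [-1] * 4000
measured = [noisy_label(t) for t in truth]

true_trend = sum(truth) / len(truth)        # +0.20 by construction
measured_trend = sum(measured) / len(measured)

# Per-post error rate is 30%, yet both trends point the same way.
assert (true_trend > 0) == (measured_trend > 0)
print(f"true trend {true_trend:+.2f}, measured trend {measured_trend:+.2f}")
```

The measured trend is attenuated (errors shrink it toward zero) but its sign matches the truth, which is exactly the "point the way, then let humans dig in" workflow described above.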

Submitted by Christine Sierra on Wed, 2009-08-26 04:00

Learn how Search and Text Analytics fit together in the enterprise, courtesy of Network World Video Library - NetworkWorld.tv

Submitted by Christine Sierra on Thu, 2009-08-20 04:00

Earlier this year Jeff outlined a few things you should consider when investigating a reputation management solution. Since we often get asked for our opinion on this topic, I thought it would be good to outline those "questions to ask" again.

1. Where does the content come from? Good analysis starts with the content. Please, please, please don't get wowed by the pretty pictures. Real insight comes from looking for patterns and trends in large and varied content sets. Make sure your vendor can tell you how they acquire their mainstream, blog, and social media content. Ask the hard questions: Where does their data come from? Are there any potential copyright issues that could alter access to this information in the future? Do they have any agreements in place to go after content from the likes of Facebook or MySpace?

2. Can they customize for your industry? Nowadays, it's not enough to just monitor the passing mentions of you or your competitors. Insight comes from digging deeper to figure out what people are actually commenting on or worried about, not what your marketing folks think they *might* be worried about. Whether a solution is based on a search engine or a text analytics engine, make sure it can discover what's driving the discussion about your industry. You need to go beyond measuring the penetration of your marketing message, because after some analysis, that may not be what people online are talking about.

3. What's the sentiment of my brand? Sentiment: it's the new beige (yes, this is good for us, since we have a sentiment engine). We've noticed in the last 12 months that sentiment has become one of those checklist items in brand and social media monitoring. I suspect this has come about due to the economic ups and downs and the ever-increasing reach of consumer-generated content. Companies have to know if they're getting trashed in cyberspace, and because of the volumes, the only way to do this is with an automated sentiment engine. Your vendor may not use our engine, but whatever they use, make sure they can measure sentiment at the item (company, brand, product) level. Measuring sentiment at the document level is fine, and may provide some of the needed insight, but if the content is comparing two brands, then you want to go beyond the document level and into the actual comparisons. And, yes, we do recommend humans play a role in the sentiment analysis process; automation is an added benefit.

4. Can I touch it? The first generation of reputation management systems tended to have large account management teams behind them to build out and manage customers' reports. The customer couldn't go in and adjust the reports themselves because the systems weren't exactly user-friendly. This is fine if you have deep pockets for all the services work, but in today's world that doesn't seem to be the norm. Many of the newer solutions that are available, or are being built, allow the customer to build and manage their own reports. This is no small undertaking, but it does give users a cost-effective way to gain the insight that something like Google Reader can't provide.

Naturally, when selecting a provider, it's not as simple as asking these 4 questions and making your decision. There are many other questions about integration, update frequency, and test-driving the solution. But if you can answer these 4 to your satisfaction, then chances are the solution you're considering can at least help get you started with reputation management.

Submitted by Jeff Catlin on Wed, 2009-08-12 04:00

So, it's been a while since I penned a blog post, and in this case that's a good thing, because it's been a pretty busy summer. As I haven't blogged in a while, I thought it would be a good time for a "State of the State" sort of post, so without further delay...

Historically, Lexalytics has tried to cast a pretty wide net into the OEM world, and most of our success has come in the reputation management space, but I'm happy to report that this appears to be changing. We're still doing very well in rep mgmt, but we are seeing an ever-increasing percentage of our leads in other areas like financial services and customer satisfaction. The really good news is that this appears to be due to the maturing of the industry, and not some specific marketing program we're running. More and more prospects are showing up at our door with specific "text analytics" needs, and this bodes well for the future. In spite of a tough environment, it looks like we'll grow the business at least 10% this year, which in the new world order of "flat is the new up" is a pretty solid performance.

On the technical side, we're rolling out a number of interesting new things this fall, the most important of which is a web services layer and SaaS version of our Salience engine for smaller companies that need high-end text processing capabilities but don't have the budget to bring the engine in house. We're using this SaaS service ourselves to roll out a new Excel plug-in that will bring lightweight text analytics (entities, themes, sentiment) to anyone with Excel, and at a price (<$100/month) that just about anyone can afford.

Submitted by Christine Sierra on Tue, 2009-07-21 04:00

Sentiment is usually categorized into three buckets: positive, negative, and neutral. It often gets presented looking something like this:


Sounds pretty simple, right? If content has good words in it, then it's positive. And if the words aren't so nice... well, that can be bad. However, when applying sentiment to any set of content, there is always the chance that what you may think of as good could, in fact, be bad. How about when you automate that process?

When we're asked about the "absolute" of automated sentiment, we often use an example from one of our technology customers: they had content that our sentiment engine thought should have been tagged as negative, but that was actually positive for them. They were applying sentiment to product-related content, and several of the documents included the words "Error Message". In a traditional sentiment situation, anything relating to an "error" would be considered negative, so the engine tagged it as such. After the results were presented, the analysts reviewing them disagreed with the sentiment engine and concluded that the documents containing "Error Message" were positive. How could that be? Had our automated sentiment gotten it wrong? No. Our software was fine, but since this client believed it to be a good thing that an "Error Message" was thrown when their product failed, they thought of this content as positive. If nothing had been presented in product-failure situations, then they would have believed it to be a negative thing. So, something perceived as bad by the software was in fact good to the client.

This is a rare instance, but we use it to show that sentiment can be subjective, depending on the situation and the content being analyzed. And while automated sentiment helps to expedite processing time, can be over 80% accurate, and is good in situations where you are weeding out the bulk of neutral content, it is often up to the individual company to dictate how to apply the spectrum of positive - neutral - negative.
If you are thinking about applying sentiment to the content you are analyzing, you should know that Lexalytics provides you with both a sentiment score and a confidence score. That is important because it allows you to determine where the good and bad thresholds fall in your world - AND - we let you know how confident we are about our assessment of the good, the bad, and the neutral. When considering sentiment solutions, be wary of the simple red, yellow, green methodology. Without some freedom to move those scales, you may find your analysis will be at the mercy of the technology and you may not always agree with the results.
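The score-plus-confidence idea can be sketched as a simple bucketing rule. The threshold values below are placeholders you would tune for your own content, not Lexalytics defaults:

```python
# Sketch: combine a sentiment score with a confidence score.
# Items the engine is unsure about get routed to a human instead of
# being forced into a bucket. All thresholds are illustrative.

def bucket(score, confidence, pos_cut=0.2, neg_cut=-0.2, min_conf=0.6):
    """Map (score, confidence) to a red/yellow/green style bucket."""
    if confidence < min_conf:
        return "review"          # low confidence: route to a human
    if score >= pos_cut:
        return "positive"
    if score <= neg_cut:
        return "negative"
    return "neutral"

results = [(0.55, 0.9), (-0.40, 0.8), (0.05, 0.95), (0.70, 0.3)]
print([bucket(s, c) for s, c in results])
# → ['positive', 'negative', 'neutral', 'review']
```

Moving `pos_cut` and `neg_cut` is exactly the "freedom to move those scales" point: the "Error Message" client above would simply place the cut so that content lands on the positive side for them.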

Submitted by Christine Sierra on Wed, 2009-07-08 04:00

Jeff Catlin provided ZDNet's Jennifer Leggio a guest post on why companies need to be thinking about twitter speak and how it can be analyzed. Check it out here.