Posted by Christine Sierra on Mon, Feb 08, 2010
Jeff recently shared his thoughts on Text Analytics Market Growth with Seth Grimes for his report on B-Eye Network: Text Analytics Opportunities and Challenges for 2010 (free registration required).
It's a well-written report, outlining the thoughts and expectations for 2010 from leaders and innovators in the Text Analytics industry.
Here is Jeff's excerpt, providing some insight on how Text Analytics and Sentiment will play out in 2010:
Market Growth
Lexalytics CEO Jeff Catlin affirms other respondents' themes. He sees search as a particular growth area and makes other points regarding text-analytics market growth:
• Text analytics [TA] will become a mainstream feature set in enterprise search applications (though not by name). We've seen a steady march toward this in 2009, and it's most notable in how accepted TA features are by the general public. When I'm at a party now and tell someone what we do, they say "Oh yeah, I read something about that sort of stuff last month" as opposed to the "Huh???" that I used to get. The effect of this is that there are a lot more opportunities for TA in enterprise applications, and I suspect it will mean that one or two of the players may get picked up by a big company.
• Sentiment will complete its transition to a "checklist" feature that everyone who works in this space will have to provide. All of the vendors (big and small) will claim to have sentiment. The consumers of this technology will also get a bit more educated - we're seeing this in RFP requests for particular capabilities of sentiment - which will help separate the wheat from the chaff. Unfortunately for us, sentiment won't be a totally differentiating feature that you can hang a business on anymore, as there will be lots of competition on the sentiment front.
• The [differentiation between] larger TA players and the niche players will become even more obvious. The bigger players will integrate a number of useful and useable semantic features into their engines which will help with things like ad hoc classification, concept roll-up, and relationship [extraction].
• On the business side, we expect 2010 to be a "Home Run" year for all the TA vendors with growth rates of 75% to 200% not out of the norm. This is partly due to the mainstreaming of the technology, which is opening up a lot of additional verticals.
Posted by Christine Sierra on Fri, Jan 22, 2010
About a year ago, our CTO Mike Marshall did some accuracy testing on sentiment using our software. This wasn't so much to showcase Lexalytics capabilities as it was to show that accuracy using automated sentiment can be helpful in the business process if done correctly.
One thing we do know for sure is that computers don't change their minds about the sentiment for a certain piece of text. If you run the same piece of text through the software 100 times, it will come back with the same results every time. Humans, on the other hand, have the capacity to change their minds - and disagree with each other - on the same piece of text. But that's okay.
At Lexalytics we've never suggested you take human analysis out of the equation when it comes to analyzing unstructured content. In fact, our hope has always been to help the humans be more productive. Removing the neutral content is the goal, so the focus can be on the extremes within the content - the really positive or the really negative.
I was surprised by a recent statement from Forrester Principal Analyst Suresh Vittal that "in talking to clients who have deployed some form of sentiment analysis, accuracy rests at about 50 percent."
If this were to be true in our client base, we'd sadly be out of business. I hope as more and more companies enter into the sentiment analysis arena that they continue to test and retest their models.
Below is Mike's analysis from earlier in 2009:
Experience has also shown us that human analysts tend to agree about 80% of the time, which means that you are always going to find documents that you disagree with the machine on.
However, having said all that, customers still like to be told a baseline number. It's human nature, after all, to want to know how something will perform, so I thought I would do a little test using the new model-based system on a known set of data. As recommended on the Text Analytics mailing list, I used the Movie Review Data put together by Pang and Lee for their various sentiment papers.
This data consists of 2000 documents (1000 positive, 1000 negative) and I sliced it into a training set consisting of 1800 documents (900 positive and 900 negative) and a test set consisting of the remaining 200. It took about 45 seconds to train the model and then I ran the test set against it (using a quick PHP script). Now bearing in mind this is still experimental and that we plan to make more tweaks to the model, I was pleasantly surprised (ok I was more than pleasantly surprised) at the results. Our overall accuracy was 81.5% with 81 of the positive documents being correctly identified and 82 of the negative ones. This is right in the magic space for human agreement.
For fun, I then ran the same 200 test set documents against our phrase based sentiment system, expecting a far lower score, but again we performed better than I thought scoring 70.5% accuracy. With a domain specific dictionary I'm sure that that score could be pushed up towards 80% as well.
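The arithmetic behind those two figures is easy to check. Here is a minimal sketch in Python (not the PHP script Mike used); the function and variable names are made up for illustration, and the 100/100 split of the test set is inferred from the 900/900 training split described above.

```python
def accuracy(correct_by_class, total_by_class):
    """Overall accuracy: correctly labelled documents divided by all documents."""
    correct = sum(correct_by_class.values())
    total = sum(total_by_class.values())
    return correct / total


# Counts reported above for the model-based system.
model_correct = {"positive": 81, "negative": 82}
# Assumption: the 200 held-out documents are 100 positive and 100 negative.
test_set = {"positive": 100, "negative": 100}

model_accuracy = accuracy(model_correct, test_set)
print(f"model-based accuracy: {model_accuracy:.1%}")

# Per-class view: share of each class that was labelled correctly.
for label, total in test_set.items():
    print(f"{label}: {model_correct[label] / total:.0%} correct")

# The phrase-based score of 70.5% on the same 200 documents implies this
# many correct documents (the per-class split was not reported).
phrase_correct_docs = round(0.705 * sum(test_set.values()))
print(f"phrase-based: {phrase_correct_docs} of {sum(test_set.values())} correct")
```

Looking at the per-class numbers alongside the overall score is a cheap way to confirm that a model is not simply favoring one label.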
So what does all that tell us? Well, it tells us that for specific domain sets you can get very high accuracy levels, though if you ran say, financial content against the movie trained database the results would be far different.
It also tells us that the phrase based sentiment technique produces good results even in its base state against a wide range of content sources (we normally are processing news-related data after all).
So, would you agree?
Posted by Christine Sierra on Wed, Dec 16, 2009
Wow, it has been a while since we blogged. That's bad. Sorry about that, but we've had some new developments coming out of the company, literally.
First, we helped launch a new subscription-based product called Lexascope - www.lexascope.com - and it is powered by Lexalytics' Salience technology. We previewed this in October at the Inbound Marketing Summit and are pleased to have it available for download.
It's still in beta, but we are encouraging everyone to take a look and sign up for the 15 day FREE trial. Let us know what you think. The first desktop application is geared towards PR professionals who want to discover entities, themes and sentiment from RSS feeds or online content.
In addition, we're working on a new product called Lexalytics Cascade.
Cascade attacks a class of problems not addressed by our Salience product, namely stateful content processing. It provides users the ability to look across a collection of content and perform tasks like content filtering, document similarity and collection level theme rollup. In the initial release, we focused on providing a high performance filtering engine to bucket content via user-defined queries, and to provide a scalable document similarity engine capable of measuring the similarity of terabytes of documents.
If you're thinking, "exactly what is a filtering engine?", the easiest way to imagine it is as a search engine standing on its head. In a filtering engine, the queries are indexed and the documents flow across the engine and act more like queries, where the documents are "filtered" into buckets represented by the indexed queries.
Filtering engines are designed to operate against live flows of content like newsfeeds or Twitter streams. The advantage of a filtering engine over a search engine to bucket content is simple: PERFORMANCE. A filtering engine can filter hundreds of thousands of documents per hour. This capability, combined with the engine's similarity capabilities, means that you can process large flows of content with Cascade, and identify duplicates and/or syndicated documents in your document stream.
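To make the "search engine standing on its head" idea concrete, here is a toy sketch in Python. It is not how Cascade is implemented; the class name, the whitespace tokenizer and the rule that a query matches when all of its terms appear in the document are all assumptions made purely for illustration.

```python
from collections import defaultdict


class FilteringEngine:
    """Toy filtering engine: the queries are indexed, the documents are the probes."""

    def __init__(self):
        self.query_terms = {}                     # bucket name -> set of required terms
        self.term_to_buckets = defaultdict(set)   # inverted index built over the queries

    def add_query(self, bucket, terms):
        """Index a user-defined query (assumed here to be an AND of terms)."""
        required = {t.lower() for t in terms}
        self.query_terms[bucket] = required
        for term in required:
            self.term_to_buckets[term].add(bucket)

    def filter(self, document):
        """Return the buckets whose query is fully satisfied by the document."""
        tokens = set(document.lower().split())
        # Only buckets sharing at least one term with the document are candidates.
        candidates = set()
        for token in tokens:
            candidates |= self.term_to_buckets.get(token, set())
        # A bucket matches when every one of its required terms occurs in the document.
        return sorted(b for b in candidates if self.query_terms[b] <= tokens)


engine = FilteringEngine()
engine.add_query("cloud", ["amazon", "ec2"])
engine.add_query("sentiment", ["sentiment"])

buckets = engine.filter("Lexalytics runs sentiment scoring on Amazon EC2")
print(buckets)
```

Because the index is built over the queries, each incoming document only touches the buckets that share a term with it, which is where the speed on live streams comes from in this simplified picture.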
Throughout 2010 Lexalytics will further enhance Cascade with a series of new releases focusing on the aggregation and rollup of concepts across whole collections of content.
We hope you forgive us for being gone for so long. As we approach the holidays we want to wish you and your families all the best for a healthy and happy holiday, and look forward to more (frequent) posts in 2010.
~The team at Lexalytics
Posted by Jeff Catlin on Fri, Nov 06, 2009
Historically, Lexalytics has been very focused on the enterprise software market by building products that are easy to install, configure and get running. (Don't worry, we're not abandoning that model.)
However, over the last 6 months or so we've been focused on the idea of building out a number of web based services aimed at extending our reach. In doing so, we made the decision to use Amazon's cloud services (EC2 and S3).
The power and cost efficiency of these services has allowed us to build not just one, but two new services that will open up our abilities to a whole new group of users. But this post isn't about our new services (we'll share more about that in the coming week), it's really about the cloud. I'm a big-time convert to cloud computing and believe Larry Ellison got this one wrong.
For businesses, deploying new web based services just got a whole lot easier, and a whole lot less expensive. For example, we will roll out a web API to our core Salience Engine soon and we've been able to do it without having to fight the "co-lo wars". In addition, we saved by not adding IT staff to the payroll to make sure the machines are up, configured and maintained.
Wrapping up Salience for Amazon's cloud and rolling it out took less than a month and has cost us very little. What this means for prospective clients is that they can access our API through the web at a price that will be very attractive, particularly to smaller companies that aren't trying to "boil the ocean" of data.
Simply put, this quick turnaround was all possible because Amazon is handling all the complicated machine maintenance and has offered machine cycles and storage at a very appealing price.
Now that I'm a cloud convert, I expect us to roll out more cloud based products in the next year.
What experiences have you had with the cloud? I'd love to hear your feedback on how it has worked, or hasn't worked, for your business.
Posted by Christine Sierra on Fri, Oct 23, 2009
Recently, Jeff spent some time speaking with Jason Falls (@jasonfalls) of www.socialmediaexplorer.com and they discussed natural language processing, social media monitoring, Lexalytics' next product line Lexascope and more.
Check it out here to see and hear more: http://www.socialmediaexplorer.com/2009/10/23/understanding-natural-language-processing-for-social-media-monitoring/
Posted by Christine Sierra on Tue, Oct 06, 2009
If you've got tweets, we've got sentiment. And themes. And most mentioned people. And spam lists. In fact, the only issue we've run into is that Twitter won't give us all the data to analyze. All 100 gazillion tweets would be fascinating to analyze automatically, but they just don't seem to be there yet. Or, perhaps, they are building out their revenue model to sell us the data. Either way, don't fret. Just like in reputation management where analyzing every single document can be both time consuming and incredibly inefficient, the same holds true for tweets. The average of the sentiment is often greater than the individual tweet.
As our CEO Jeff Catlin mentioned recently on ZDNet:
"Sentiment measurement is at the forefront of much business analysis these days, but in some ways Twitter seems as if it was designed from the ground up to defeat any automated sentiment engine. For instance, there isn't much sentence structure in tweets, and what's there is often wrong. And many of the tweets are just tinyurl or bit.ly links with absolutely no content contained in the URL itself.
Given these challenges, is monitoring and measuring sentiment in Twitter a hopeless chore? Fortunately the answer is No. Even though there are some challenges to automated scoring of Twitter content, there are also some advantages to processing tweets and in particular the tone within Twitter.
The beauty of Twitter is that there is very little grey area in tweets. You're either posting some source of information, posting an opinion you have, or replying to another informative or opinion-oriented tweet."
With the volumes of online data growing at an unbelievable rate, decreasing processing time and implementing automation become key to getting the job done. And from that automation process comes incredible value such as all the associated concepts and themes with a particular topic. Not just the ones with the most hashtags associated with them. And who is talking about those topics? And who else is mentioned with those topics? The value is not always in the number of mentions, while in some aspects that is helpful, but with the context surrounding the tweets and how businesses can use them.
In the coming days we will be attending the Inbound Marketing Summit in Boston where we'll have a demo of our Twitter topic tracking system available. We aren't formally releasing a site, or promoting a new product, but we welcome conversations about the topic of Twitter Topics and what is useful and what is just fluff. Text analytics doesn't just have to be about processing Word documents and research reports - it is just as helpful in processing tweets, customer comments and smaller documents as well.
Posted by Christine Sierra on Fri, Oct 02, 2009
It seems lately that there are more and more companies offering sentiment solutions to a variety of markets. Everything from health care to customer service to financial services and reputation management. But in spite of this, very few prospects seem to really understand what the technology will and won't do for them. Let's start with some basic questions to help you understand more about sentiment:
What does it mean to measure sentiment? How do I know if I really need to use it? That depends entirely on the intentions of the user and the content being measured. If you're looking at customer review data (let's say hotel reviews in this case), then you may be interested in the sentiment of each review for the hotel. Were people happy with their stay at this hotel? This would be an example of document sentiment. It would tell you if the overall review was good or bad, and offer little insight to the details of each review. In this case, processing large amounts of data about the same topic works well.
If, however, you're reading a publication like Consumer Reports, then you're probably thinking more about how the different hotels stack up against one another. You'd like to do some comparison. In this case, the overall document sentiment wouldn't be of much help because the document will have some good and some bad content mixed within it. In fact, what the reader really cares about in this kind of content is the tone for each specific hotel that's being described in the document and the reasons why. Were the beds comfy? How was the shower pressure? Is the staff friendly? In some cases the beds may have been comfortable but the staff rude, which can sway the sentiment of a review. Depending on what is important to you, you'd want to extract the sentiment of each entity. This is known as entity-level sentiment.
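The difference between the two is easier to see with a small example. The sketch below is in Python and is purely illustrative: the tiny lexicon, the hotel names and the rule of crediting a sentence's score to every entity named in it are all made-up assumptions, not a description of how our engine works.

```python
import re

# Hypothetical toy lexicon; a real system relies on far richer language models.
LEXICON = {"comfy": 1, "friendly": 1, "great": 1, "rude": -1, "weak": -1, "dirty": -1}


def score(text):
    """Sum the lexicon weights of the words found in the text."""
    return sum(LEXICON.get(word, 0) for word in re.findall(r"[a-z']+", text.lower()))


def document_sentiment(text):
    """One score for the whole document."""
    return score(text)


def entity_sentiment(text, entities):
    """Credit each sentence's score to every entity named in that sentence."""
    totals = {entity: 0 for entity in entities}
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for entity in entities:
            if entity.lower() in sentence.lower():
                totals[entity] += score(sentence)
    return totals


review = ("The beds at Hotel Alpha were comfy and the staff friendly. "
          "Hotel Beta had rude staff and weak shower pressure.")

doc_score = document_sentiment(review)
per_hotel = entity_sentiment(review, ["Hotel Alpha", "Hotel Beta"])
print("document:", doc_score)
print("per entity:", per_hotel)
```

In this made-up review the good and bad remarks cancel out at the document level, while the per-entity view shows one hotel coming out ahead of the other, which is exactly the comparison a reader of this kind of content is after.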
What really matters in sentiment analysis? Is it the accuracy or the automation? Again, it depends on your needs and goals for using sentiment analysis.
An example we often use where a technology-based automated solution really shines is in financial services where the trends across a collection of stories are what users are most interested in. They care less about the accuracy of every document detail, and more about the sentiment across a corpus of data that needs to be processed quickly. Financial Services is definitely one of the up and coming industrial uses of sentiment because the technology tends to perform better than humans in processing large collections of content.
Reputation Management is another industry where automated sentiment analysis shines bright, but where accuracy comes under more scrutiny. It could be said that automated sentiment analysis was born in this space, and was invented because of the amount of time people spent hand measuring the tone around products and brands. While Reputation Management is currently the biggest market for the technology, it's probably not the best example of accuracy. It's hard enough to get humans to agree with humans on the tone for a specific story, but to get people to agree with a computer is even harder. I bring up these two contrasting uses because it's important for people to think about their specific needs and requirements before they jump into using any vendor's solution. Make sure the solution you're looking at is well-suited for the problem you're trying to solve.
So while there are more claims of sentiment analysis hitting the market, and after 6 years as a company processing unstructured text and watching online content take hold, it's interesting to see how sentiment appears to be somewhat of a commodity. It challenges all the providers to do a better job in all aspects of the technology. However, it's a fact that analysis of good, bad and neutral isn't as easy as 1,2,3. Ask for a proof of concept before making a decision and make sure the solution is right for you and your business.
Posted by Jeff Catlin on Tue, Sep 08, 2009
Ever since the New York Times article about sentiment scoring was published a couple of weeks ago, there has been a pretty constant stream of people jumping in and either demonizing automated sentiment or trying to peddle its eventual takeover of the free world. It felt a lot like political media coverage to me: lots of opinions, but very few of them taking an honest look at the real problems and real solutions. Kudos to Nathan Gilliatt for putting out the list of many of these posts (see the list here).
It's obvious that we all have axes to grind and software or services to sell, but focusing on the accuracy of automated sentiment is the wrong place to go. To remove any doubts, let me state for the record that a machine-based system will never score any random piece of social content as well as a human will. People simply have too much context in their brains, and there is no way a machine is going to match that. Given the preceding, one may ask whether I believe that automated sentiment is doomed to failure. The answer is NO: we need to use automated sentiment in ways that humans can't provide. I'll illustrate my point by focusing on one of the recent posts about sentiment, Sentiment analysis for online content: Honest? from CyTRAP Labs. The post wasn't particularly favorable to automated sentiment, but was one of the first posts I've read that asked the right question... Is this story relevant?
If you're monitoring social media sources, then what you really want to know is: What's happening that I need to worry about? Text Analytics and automated sentiment are very good at answering these "trend spotting" questions. In fact, machines are an essential piece in providing trend spotting. Sentiment Analysis has made rapid inroads in the financial services industry because users don't care about the tone of each story; they care about the effect of a bunch of stories on the market as a whole. A number of our financial services customers are making lots of money trading equities based in part on sentiment trends, and that should say something about its validity. If your job is to monitor a brand in social media then the trends and patterns are what you should be worried about, and automated analysis is great for this.
If an automated system is only 70% accurate, it's still going to get the overall trend (up or down) correct for a given brand, and then the humans should always step in and provide the detailed analysis of that trend, including the identification and correction of the posts where the machine got it wrong. Let automated sentiment point the way, but trust humans to provide the detailed analysis that requires a few neurons. Automated systems will never beat humans on a story by story basis, so let's stop worrying about that and use them to provide services that humans can't afford to do.
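A quick back-of-the-envelope sketch in Python shows why the trend can survive a 70% accurate scorer. It rests on a loudly stated assumption that will not hold exactly in practice: the machine's mistakes are symmetric, meaning it is equally likely to flip a positive post to negative as the reverse. The function name and the weekly figures are invented for illustration.

```python
def observed_positive_share(true_share, accuracy):
    """Share of posts the machine labels positive, assuming symmetric errors.

    Correctly scored positives plus negatives that were wrongly flipped to positive.
    """
    return accuracy * true_share + (1 - accuracy) * (1 - true_share)


ACCURACY = 0.70  # the figure used in the post

# Hypothetical brand whose true share of positive posts drops from 60% to 40%.
last_week = observed_positive_share(0.60, ACCURACY)
this_week = observed_positive_share(0.40, ACCURACY)

print(f"machine sees {last_week:.0%} positive last week, {this_week:.0%} this week")
print("trend direction preserved:", this_week < last_week)
```

Under this assumption the observed share moves in the same direction as the true share whenever accuracy is above 50%, because the measured change is simply the true change scaled by (2 x accuracy - 1). The size of the swing is understated, which is one more reason to have humans dig into the detail once the machine has pointed the way.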
Posted by Christine Sierra on Wed, Aug 26, 2009
Learn how Search and Text Analytics fit together in the enterprise.
Courtesy of Network World Video Library - NetworkWorld.tv
Posted by Christine Sierra on Tue, Aug 25, 2009
I'm very excited. Yesterday, the New York Times published a piece: Mining the Web for Feelings, Not Facts. Alex Wright did a nice job highlighting the uses and benefits of sentiment analysis, particularly how it fits into the search world through sites like Newssift.com, which uses Endeca and Lexalytics as part of its platform, and ScoutLabs for reputation management.
The article also highlighted some of the limitations, which prompted Andy Beal at The Marketing Pilgrim to question the accuracy with, "Why sentiment analysis is about as reliable as a canary in a coal mine", which generated additional comments around automated sentiment analysis software.
Whether your position is that sentiment analysis can or can't help your brand positioning, customer service or reputation management efforts, it was still great to see so many people talking about the capabilities and uses in various industries. There were several discussions and lively debate born out of the article.
One fallout that I did notice, particularly online, was the assumption that online sentiment analysis companies are growing in number, and that it is an increasingly crowded market.
At Lexalytics, we don't necessarily see it the same way. We are absolutely seeing an increase in solutions and platforms interested in integrating our sentiment analysis, but there are still only a handful of providers (us being one of them, obviously) who can offer the technology. Many of the solutions profiled on the market do a fantastic job at gathering and presenting the results, which is critical when you think about how much information floating around online is written by YOUR customers. But that doesn't necessarily mean that there are more sentiment analysis providers out there - the core analysis engines traditionally sit behind the scenes and do their thing and there are still only a few of us on the market.
After 6 years of honing and refining the software used to provide entity-level sentiment, we're excited that the markets are trying to find new and innovative ways to use sentiment analysis, including solutions in the reputation management, social media, financial services, enterprise search and customer satisfaction industries.