LexaBlog

Our Sentiment about Text Analytics and Social Media

Submitted by Mike Marshall on Wed, 2009-02-18 05:00

For a service that I don’t really like (I hesitate to call it a business; most businesses at least have a vague idea of how to make money, though more on that later), I seem to spend a lot of time writing about Twitter. But with the news that they have managed to convince yet more VCs to invest (another $35 million, apparently), I am drawn back like a moth to a flame. Of course, most of the discussion about this latest funding round has revolved around how the investors are ever going to see a return on this investment, given the semi-mythical nature of the business model. According to Biz Stone on the official Twitter blog:

We are now positioned extremely well to support the accelerating growth of our service, further enable the robust ecosystem sprouting up around Twitter, and yes, to begin building revenue-generating products. Throughout this year and beyond, our small team will grow much bigger to meet the challenges and opportunities ahead.

and to this end they recently hired Kevin Thau to head up the business side of the operation. His plan: use the flow of information as the data feed for an analytics platform. Now from my point of view that sounds like an excellent plan (we had customers ask about mining Twitter recently), and it’s certainly a better one than relying on shaky web advertising models, but will the users of Twitter let the data that they are creating be used that way? You just need to look at the explosion of complaints this week, when Facebook had the audacity to change their TOS to allow them to keep your content if you leave (they already own all the content when you are on the site, a fact most commentators seem to have missed), to see how the users of social media think you owe them for creating something that they use. How do you think Twitter users are going to react if the same sort of thing happens to them? It could be very messy, and it is definitely something to watch with interest.

Submitted by Mike Marshall on Tue, 2009-02-17 05:00

With customers wanting to process more and more content (and more and more content being available), and wanting to do that in as near real time a manner as possible, the throughput speed of our various components becomes something that we are very interested in. Of course, customers always want more features added as well (Pronominal CoReference, Sentiment Concepts etc.), which in turn require more complex engineering solutions (grammatical engines), which in turn require more processing time. Trading off this increased complexity against the need to keep the document throughput up can be a challenging task. My internal metric has always been the ability to process 2 normal sized (1,000-1,500 words) documents a second in a single thread. If we can keep to that, then scaling becomes a hardware problem: add more cores and you get more throughput. But with all the changes we’ve made over the last couple of releases, it was time to see if we were still keeping up with that. To this end, last week I had one of our engineers profile the various bits and pieces that make up Salience internally, and we came up with some interesting conclusions:

    • Machines based around a Core 2 architecture completely trounce the Pentium 4 architecture with Salience. A 3.4GHz Pentium D was only 20% faster than a 1.66GHz Core 2 machine and was slower than a 2.5GHz Xeon based machine.
    • Perhaps not surprisingly, the 64-bit versions of the software were substantially faster than the 32-bit versions, with a document taking 338ms on 64-bit compared with 458ms on a 32-bit platform.
    • Linux and Windows performance was pretty much the same, so no adding to platform debates there :)

The testing also identified (as we hoped it would) a couple of areas where we could make some more speed improvements in the upcoming release, so we should be able to get the numbers down further by the time 4.1 goes out the door. And as for the internal metric? Well, as you can see from the numbers above, even before we do some more performance enhancements we are still at the 2 docs a second mark, and on the 64-bit platform we are nearly at 3, which all in all makes me happy, and a happy Mike is a good thing.
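For anyone who wants to check the arithmetic, here is a minimal sketch that turns the per-document timings quoted above into single-thread throughput. The function and constant names are made up for illustration and are not part of Salience, and the multi-core figure assumes perfectly linear scaling, which real hardware will not quite deliver.

```python
# Per-document timings quoted in the post (milliseconds, single thread).
TIMINGS_MS = {"64-bit": 338, "32-bit": 458}

# The internal target from the post: 2 documents a second in one thread.
TARGET_DOCS_PER_SECOND = 2.0


def docs_per_second(ms_per_doc):
    """Convert a per-document latency in milliseconds to documents per second."""
    return 1000.0 / ms_per_doc


def estimated_throughput(ms_per_doc, cores):
    """Rough multi-core estimate. ASSUMPTION: throughput scales linearly with cores."""
    return docs_per_second(ms_per_doc) * cores


for platform, ms in TIMINGS_MS.items():
    rate = docs_per_second(ms)
    verdict = "meets" if rate >= TARGET_DOCS_PER_SECOND else "misses"
    print(f"{platform}: {rate:.2f} docs/sec ({verdict} the 2 docs/sec target)")
```

That works out to a little under 3 docs a second on 64-bit and a little over 2 on 32-bit, which matches the conclusion above.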

Submitted by Mike Marshall on Fri, 2009-02-13 05:00

If, like me, you don’t take the whole Social Media thing too seriously, then Being Five is the cartoon series for you. From the talented guy who does Prune Juice, you can now get a five-year-old’s take on the whole phenomenon. Check out the site and make sure you grab the feed. A couple of my personal favourites are below - click for the full-size versions.

Submitted by Mike Marshall on Thu, 2009-02-12 05:00

Now I’m a big fan of the Drama 2.0 blog, as I think he generally calls it right about most of the Web 2.0 and Social Media news around, but this time I think he’s got it dead wrong. As I’m sure a lot of you are aware, the World Economic Forum took place in Davos and Robert Scoble scored a chat with Facebook honcho Mark Zuckerberg. Amongst the usual fluff, this interesting titbit came out:

Facebook is, he told me, studying “sentiment” behavior. It hasn’t yet used that research in its public service yet, but is looking to figure out if people are having a good day or bad day. He said that already his teams are able to sense when nasty news, like stock prices are headed down, is underway. He also told me that the sentiment engine notices a lot of “going out” kinds of messages on Friday afternoon and then notices a lot of “hungover” messages on Saturday morning. He’s not sure where that research will lead. We talked about how sentiment analysis might lead to a new kind of news display in Facebook. Knowing whether a story is positive or negative would let Facebook pick a good selection of both kinds of news, or maybe even let you choose whether you want to see only “happy” news.

Now Drama sees this news as an excuse to poke fun at Zuckerberg, but I actually see this as a really big thing, and it could just be the final piece of the puzzle that makes automated sentiment analysis an accepted technology in the same way that search is. Of course, at the moment Sentiment Analysis is used in several ‘Voice of the Customer’ type applications and also in various trading systems around the world, but most of that analysis occurs on news data, which is generally well structured and well written, and it hasn’t quite made the breakthrough to being a standard technology. As Jeff alluded to in a previous entry, there is still a lot of smoke and mirrors in the industry, and whilst corporate customers get the advantages automated sentiment analysis can often bring to their business, they still have a mental hurdle to get over as to its effectiveness and ROI - of course 10 or 12 years ago Search was in much the same place! If Facebook can get it right and actually show on the site that determining sentiment enables them to do x and y better, then it makes it a win for all of us. The challenge they face is that their data tends to be a small number of words and contains both text speak and spelling errors, which tend to make a mess of automated tagging systems etc. Of course it would be even better if they opened up their (the users’?) content and let the rest of us play with it as well, but I realise that’s just a pipedream - ah well, maybe one day.
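To make the text speak problem concrete, here is a toy sketch of why short, informal status updates trip up a simple word-list scorer, and how a normalisation step can help. The lexicon, the text speak map and the function names are all invented for illustration; this is not how Facebook's engine or ours works, just a minimal picture of the failure mode.

```python
import re

# Hypothetical toy sentiment lexicon: word -> polarity.
LEXICON = {"great": 1, "love": 1, "good": 1, "bad": -1, "terrible": -1, "hate": -1}

# Hypothetical text speak and misspelling map: informal form -> standard word.
TEXT_SPEAK = {"gr8": "great", "luv": "love", "h8": "hate", "terible": "terrible"}


def tokenize(text):
    """Lowercase the text and pull out runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score(text, normalize=False):
    """Sum lexicon polarities, optionally mapping text speak to standard words first."""
    total = 0
    for token in tokenize(text):
        if normalize:
            token = TEXT_SPEAK.get(token, token)
        total += LEXICON.get(token, 0)
    return total


status = "luv the new phone, its gr8"
print(score(status))                  # the raw tokens are not in the lexicon
print(score(status, normalize=True))  # normalised tokens are found
```

Without normalisation the status update looks neutral because none of its words are recognised; with it, both positive words are picked up. A hand-built map like this obviously won't keep pace with how people actually type, which is exactly why this kind of content is hard.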

Submitted by Jeff Catlin on Wed, 2009-02-11 05:00

As I discussed in my last post, Reputation Management is one of the bright spots in a pretty gloomy economy. The reason is obvious: getting your brand trashed in this environment could kill it. So, if you’re considering a Reputation Management or Social Media Monitoring solution, here are some things to consider:

    1. Where does the content come from? As we know, it all starts with the content. Please, please, please don’t get wowed by the pretty pictures. Real insight comes from looking for patterns and trends in large and varied content sets. Make sure your vendor can tell you how they acquire their mainstream, blog, and social media content. Ask the hard questions: where does their data come from, are there any potential copyright issues that could alter access to this information in the future, and do they have any agreements in place to go after content from the likes of Facebook or MySpace?
    2. Can they customize for your industry? It’s not enough to just monitor mentions of you or your competitors. Insight comes from figuring out what people are worried about, not what your marketing folks think they *might* be worried about. Whether their solution is based on a search engine or a Text Analytics engine, make sure that they can discover what’s driving the discussion in your industry. You need to go beyond measuring the penetration of your marketing message because that may not be what people are talking about.
    3. What’s the sentiment of my brand? Sentiment, it’s the new beige (yes, this is good for us, since we have a sentiment engine). We’ve noticed in the last 12 months that sentiment has become one of those checklist items in brand and social media monitoring. I suspect this is due to the weakening economy and the ever-increasing reach of consumer-generated media. Companies have to know if they’re getting trashed out in cyberspace, and because of the volumes, the only way to do this is with an automated sentiment engine. Your vendor may not use our engine, but whatever they use, please make sure that they can measure sentiment at the item (company, brand, product) level. Measuring sentiment at the document level is fine, and may provide valuable insight, but if the content is comparing two brands, then you want to know how each is perceived, and document level simply won’t give you that insight.
    4. Can I touch it? The first generation of reputation management systems tended to have large account management teams behind them to build out and manage a customer’s reports. The customer couldn’t go in and adjust the reports themselves because the systems weren’t quite user-friendly. This is fine if you have deep pockets, but in today’s world that’s something none of us have. Many of the newer solutions that are available, or are being built, will allow the customer to build and manage their own reports. This is no small undertaking, but it does provide users with a cost-effective way to gain the insight that something like Google Reader can’t provide.

Of course it’s not as easy as asking these four simple questions, but if you can answer them to your satisfaction, then chances are the folks you’re considering can provide a solution that will help get you started.
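To illustrate the third question, here is a rough sketch of why document-level sentiment can hide what is said about each brand. Everything in it is an assumption for the sake of the example: the brand names, the tiny lexicon, and above all the method, which simply credits each sentence's score to any brand named in that sentence. A real engine would use grammatical analysis rather than sentence co-occurrence, and this toy version only handles single-word brand names.

```python
import re

# Hypothetical toy lexicon: word -> polarity.
LEXICON = {"great": 1, "love": 1, "reliable": 1, "terrible": -1, "slow": -1, "hate": -1}


def tokens(text):
    """Lowercase the text and pull out runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def text_score(text):
    """Sum the lexicon polarity of every token in the text."""
    return sum(LEXICON.get(t, 0) for t in tokens(text))


def document_sentiment(doc):
    """One score for the whole document."""
    return text_score(doc)


def entity_sentiment(doc, entities):
    """Credit each sentence's score to every (single-word) entity it mentions."""
    scores = {entity: 0 for entity in entities}
    for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
        sentence_tokens = set(tokens(sentence))
        sentence_total = text_score(sentence)
        for entity in entities:
            if entity.lower() in sentence_tokens:
                scores[entity] += sentence_total
    return scores


review = "Acme is great and reliable. Globex is slow and terrible."
print(document_sentiment(review))
print(entity_sentiment(review, ["Acme", "Globex"]))
```

In this made-up review the document as a whole comes out neutral, because the praise for one brand cancels the criticism of the other, while the per-brand scores show one clearly positive and one clearly negative.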

Submitted by Jeff Catlin on Fri, 2009-02-06 05:00

Given the continuing stream of bad news that assaults us each and every day… 5000 laid off here, 8000 there, and the political parties battling over what shade of lipstick to apply to the pig, you wouldn’t be alone in feeling down and uncertain. There are however some bright spots out there, and thankfully we seem to be dead in the middle of one of them. There was a very interesting article in PRWeek that we came across today discussing the success that a number of the vendors in Social Media Monitoring are having in this economy, and thankfully some of them are our customers. While this may come as a surprise to some, it’s something I expected to see, because one thing companies can’t afford to do in this economy is ignore the bad news. They need to know what’s being said about their companies so they can minimize panic and damage around their brand. Of course this means they have to look into more places (blogs, Twitter, Facebook, etc.) and have fewer people to do the looking (remember the 5000 layoffs at the top of this post), so they are having to look to companies like dna13, Evolve24 and Cymfony to do the digging for them. The PRWeek article spells out how well many of these guys are doing in this tough economy. My general opinion is that the companies that help other companies handle the ever-increasing glut of information that’s out there are going to thrive in this economy, and thankfully we’re one of those.

Submitted by Christine Sierra on Fri, 2009-02-06 05:00

Got your attention, didn’t it? It’s not true. In fact, I love my job. Smart colleagues. Fantastic software and technology. Fun industry. Great customers. And if you ask me, I’ll tell you that. But not all employees are as happy as I am with their leaders. I’d never even heard of JobVent until this morning. Isn’t that what husbands are for? I found it because of this: Executives Increasingly Aware of Online Reputation Management and thought “Finally!” We’ve had Execdex floating around as our demo site for a while now, simply because we knew that C-level executives would have to answer some tough questions. Can you say salary cap? It just goes to prove that reputation management isn’t always about whether someone is happy with your latest gadget or customer service techniques; it goes straight to the top and whether your employees and customers are happy with who is running your company. These days, many are asking questions like: Do I have faith that this CEO can navigate this unruly economic crisis? Are they respected by their peers? Will they be running the company into bankruptcy in 6 months? I expect more of this “awareness” in 2009, if not panic, among executives. Be aware, be very aware…

Submitted by Carl Lambrecht on Thu, 2009-02-05 05:00

I’m a techie at heart. And there are few things that interest a techie more than the shiny new things. This week’s shiny new thing of course is Google Latitude. The basic idea is that your location can now be shown on Google Maps, by geolocating your IP address or smartphone, and shared with your friends. One initial response to this release has been concern about privacy. On Wednesday evening’s ABC News, John Berman reported on the “privacy paradox going mobile”, raising the possibility that “there are times when you don’t want parents or bosses knowing where you are, in the middle of the day, when you’re supposed to be writing a story” as he walks into a movie theater. Mr. Berman does go on to describe the controls that Google has put in to restrict who can see your location and to what accuracy. To me, it all goes back to the core struggle in social networking, which is how much information you want to give to your network of friends, and how close-knit you keep that network. Some things that Latitude is not going to let someone do, if Google keeps to their statement about the sensitivity of location data:

  1. Display where you have been, only where you are right now.
  2. Display where you are to anyone you have not expressly chosen to share that information with.

How this is going to benefit the social networking world is yet to be seen. Currently I have my location shared with one friend, who is 200 miles away in New York City. And if my boss wanted to know where I was, he probably wouldn’t look on Latitude to see where my smartphone was located; he would just call me on it. I hope it is something more useful than helping me find friends close to me that want to grab lunch or a latte, which seems to be the main benefit of a similar application for the iPhone called Loopt. Jordan McCollum’s “Stalk Your Friends with Google’s Latitude” asks the question, “Will you be using Latitude?” But I think the more interesting question is: what will you be using it for?

Submitted by Christine Sierra on Tue, 2009-02-03 05:00

John Harney at KMWorld put together a great article recently with Seth Grimes, Fern Halper and Sue Feldman to provide readers with an overview of why and how they would use text analytics with unstructured content. They pointed out these various applications for text analytics:

    • Business intelligence
    • Voice of the customer
    • Sentiment analysis
    • Intelligence (Counter-terrorism)
    • Life sciences
    • Executive search (hiring)
    • E-discovery

The last application that I would add to this list is Enterprise Search. Text analytics can improve the search process by extracting key metadata from documents and other unstructured information. We know search is great if you know the questions you want to ask, but if you are trying to uncover valuable information because you don’t know the question, text analytics plays a key role in that discovery process. In addition, coupled with Voice of the customer and Sentiment analysis is the often over-used term Reputation Management. This takes into account any and all means by which a company wants to monitor and manage the brand image of their company (or person). This can come by the way of customers, influencers, bloggers, twitterers (did I just make up that word?) and other media outlets and relies heavily on sentiment analysis, product (entity) extraction and human interaction.

Submitted by Carl Lambrecht on Mon, 2009-02-02 05:00

The quick answer is “No, you don’t.” But as you would expect, it’s more complicated than that. You do own your online reputation, to the degree that you are actively tracking and managing it. There are solutions out there, such as Evolve24 or Cymfony, that help you manage what is being said about you and your brand, and there are companies that are reacting instantly to their unsatisfied customers. I noticed a guest post this morning on ChrisBrogan.com on this topic, and it contrasted and meshed interestingly with another article written about Cash4Gold from Insider Marketing.

First, let’s look at a case of active reputation management by SanDisk in the guest posting. A single grumpy tweet on Twitter was picked up by SanDisk and immediately responded to. What did this result in? A customer issue, resulting in bad sentiment, was responded to quickly and satisfactorily by the company, resulting in a larger amount of positive sentiment. “My original gritching about my old player reached my 600+ followers. When I tweeted about the replacement and SanDisk’s customer service, the news was retweeted to thousands.” Not only that, but SanDisk is now also attached to a guest posting on a very visible blog praising their attentiveness to their customers and their active reputation management. The quick, customer-service driven response allowed SanDisk to manage their online reputation and avoid any additional negative reactions.

Cash4Gold appeared to go a different route in managing their online reputation. They noticed their “product” was cast in a negative light by a blogger and they tried this tactic: ask the blogger to lessen the blow and at the same time offer a monetary donation (to him or a charity of his choice) in return for removing the negative phrases. Ironically, that approach brought more attention to the apparent negative sentiment about the company, and they may need to think hard about their approach.

In any case, it is clear that consumers, influencers, tweeters, and bloggers all hold some control in the reputation game. The tools exist for tracking their input; it is up to you to find the way that works best for you in responding to that sentiment before it does more harm than good.