There’s a lot of buzz about measuring consumer-generated media (CGM). A lot of it is fear-based, predicting that those unaware of the buzz around their product will be sunk by bloggers. Some of it is the inevitable hype around any new thing and will just as inevitably die down as CGM becomes a normal part of the world. What most of it seems to lack is any kind of theoretical basis for doing the actual measuring.
A key statistic for most of the measurement plans is how much of the “blogosphere” is covered by the particular solution under review. No company wishes to be outdone in this arena, and as new forms of content crop up, there is a chase to cover not only more content but a wider variety of it - three years ago “covering” blogs would have been enough to boast about, but now any serious solution provider should have a plan to cover YouTube, Facebook (despite the terms of use making it illegal), discussion forums, consumer review sites, Twitter, and probably some I haven’t yet thought of.
The problem with aggregating all of this data is that, fundamentally, most of it is of poor quality. The quality problem starts at the bottom, with content scraped from websites and recirculated through ad-filled RSS feeds and other sources. Next up the stack is the proliferation of spam blogs, link farms, and other forms of deliberate gaming of the system. Then comes the question of whether anyone cares about a particular piece of content, even if it is clean and valid. Finally, how do we measure the content?
There are certain time-tested ways of measuring things when the individual data points are too numerous or too suspect to assess all of them. I believe these measures can be applied to CGM, but doing so requires throwing away the assumptions that all CGM is equal and that anyone should pay attention to the entirety of it. It also means we have to lose the fear that somewhere, someone is saying something nasty or negative. On the internet, it is not a fear that someone is flaming you, it is a certainty - but it doesn’t necessarily mean anything.
The first step is to get back to the notion of sampling the data as opposed to aggregating all of it. A sample has various characteristics that separate it from just a big glob of stuff - it must be representative, it must be clean, it must remain constant over a period of time, and it must have a metric of quantity, such as how strongly someone holds an opinion or how much disposable income a household has. Market researchers, political pollsters, and traditional PR agencies have known and practiced this for decades. Why should it be abandoned in the face of CGM?
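To make the sampling idea concrete, here is a minimal sketch in Python. Everything in it is invented for illustration - the corpus, the source names, and the `stratified_sample` helper - but it shows the basic discipline: draw proportionally from each content type so the panel stays representative of the mix of sources, rather than aggregating the whole glob.

```python
import random

# Hypothetical corpus: each item is (source_type, source_id). In practice
# these would come from a vetted frame of content sources, not a raw crawl.
corpus = [
    ("blog", "blog-a"), ("blog", "blog-b"), ("blog", "blog-c"),
    ("forum", "forum-a"), ("forum", "forum-b"),
    ("review", "review-a"),
]

def stratified_sample(items, fraction, seed=42):
    """Sample each source type proportionally (without replacement), so the
    sample reflects the mix of content types in the frame. Fixing the seed
    keeps the panel constant over time, one of the sampling requirements."""
    random.seed(seed)
    by_type = {}
    for kind, source in items:
        by_type.setdefault(kind, []).append(source)
    sample = []
    for kind, sources in by_type.items():
        k = max(1, round(len(sources) * fraction))  # at least one per type
        sample.extend((kind, s) for s in random.sample(sources, k))
    return sample

panel = stratified_sample(corpus, fraction=0.5)
```

The fixed seed matters: a panel that is re-drawn on every measurement run cannot satisfy the requirement that the sample remain constant over a period of time.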
The second step is to return to an old traditional-media concept - that of trusted sources of information, or influence. Certain sources of information carry more weight with their readers than others, and these should be listened to; those without trust or influence should be excluded from the sample. This is not a radical claim, but for lack of any reliable model of influence, the CGM world has been flattened. Ironically, ignoring influence ignores one of the basic tenets of CGM itself, as the most successful CGM sites have ways to rate content and its providers: Amazon marks “helpful” reviews and ranks “Top Reviewers”, and Digg is a simple popularity poll on stories. Indeed, the problem with CGM metrics is not that there aren’t any, but that there are too many, and none are reliable. As unreliable as circulation figures are for print media, those numbers are considerably more grounded in reality than anything that exists for CGM today.
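The effect of weighting by influence can be sketched in a few lines. The trust scores and source names below are invented for illustration - in practice they would come from the kinds of ratings mentioned above, such as Amazon’s “helpful” votes, or from a hand-built list of trusted outlets - but the sketch shows why it matters: a raw mention count lets a spam blog dominate, while a trust-weighted count discards it entirely.

```python
# Hypothetical sources with invented trust scores and mention counts.
sources = {
    "trade-blog":  {"trust": 0.9,  "mentions": 12},
    "spam-blog":   {"trust": 0.05, "mentions": 400},
    "review-site": {"trust": 0.7,  "mentions": 30},
}

TRUST_FLOOR = 0.5  # sources below this have no influence worth counting

def weighted_buzz(sources, floor=TRUST_FLOOR):
    """Sum mentions weighted by trust, excluding untrusted sources.
    By raw count the spam blog wins 400 to 42; weighted, it contributes
    nothing, and the trusted trade press drives the measurement."""
    return sum(s["trust"] * s["mentions"]
               for s in sources.values()
               if s["trust"] >= floor)

print(weighted_buzz(sources))
```

The hard part, of course, is not the arithmetic but producing trust scores that mean anything - which is exactly the reliability problem described above.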
But what does all of this actually mean in practice? I believe it means that people need to get their hands dirty for each and every situation. Different clients have different needs from CGM, and the solution provider needs to understand the space the client is in, help the client assemble a valid sample of content sources, and construct a list of trusted information sources. No automated system crawling the blogosphere will accomplish this; it produces no more useful information than the client would get from Google blogsearch and a few hundred random articles.