
LexaBlog: Our Sentiment about Text Analytics and Social Media

Text Mining in Hotel Reviews: Bally's vs. Bellagio

Hotel reviews represent one of my favorite uses of text analytics. About five years ago we built a site with FAST that measured hotel reviews to build a “consensus opinion” of hotels in a narrow geographic area. The idea was to give users of the site (shown below) a sense of what people thought of various hotels in a given area (Manhattan, for example). It’s a nice application because it plays to the strengths of sentiment scoring, where a group of reviews is rolled together to form a consensus opinion. Automated engines are very accurate in such a use case (possibly more accurate than people), and they can handle a large volume of content.

[Image: the FAST-based hotel review site]

Recently we revisited the scoring of hotel reviews and dug a bit deeper this time. Rather than simply generating a score for each property, we scored the reviews for various features of the hotel, such as location, staff and dining. For this test we used reviews for two hotels in Las Vegas, the Bellagio and Bally’s, and measured the following features for each:
- Rooms
- Price
- Facilities
- Location
- Cleanliness
- Service
- Overall
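To make the rollup concrete, here is a minimal Python sketch of per-feature consensus scoring. It assumes each review has already been scored per feature by a sentiment engine on a hypothetical -1.0 to +1.0 scale and simply averages those scores per hotel; the data shape, the sample reviews, and the `consensus_scores` function are illustrative only, not how our engine or the FAST-based site actually did it.

```python
from collections import defaultdict
from statistics import mean

# The seven features measured in this test, in display order.
FEATURES = ["Rooms", "Price", "Facilities", "Location", "Cleanliness", "Service", "Overall"]


def consensus_scores(reviews):
    """Average per-review feature scores into one consensus score per hotel and feature.

    Each review is a dict like {"hotel": "Bellagio", "scores": {"Location": 0.8}}.
    Scores are assumed to be on a -1.0 (negative) to +1.0 (positive) scale, and
    features a review never mentions are simply absent from its "scores" dict.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for review in reviews:
        for feature, score in review["scores"].items():
            buckets[review["hotel"]][feature].append(score)

    # Keep only features that at least one review mentioned, in FEATURES order.
    return {
        hotel: {f: round(mean(by_feature[f]), 2) for f in FEATURES if f in by_feature}
        for hotel, by_feature in buckets.items()
    }


# Made-up, pre-scored reviews for illustration.
reviews = [
    {"hotel": "Bellagio", "scores": {"Location": 0.8, "Price": -0.6, "Overall": 0.1}},
    {"hotel": "Bellagio", "scores": {"Location": 0.6, "Service": 0.4, "Overall": 0.3}},
    {"hotel": "Bally's", "scores": {"Location": 0.7, "Price": 0.5, "Overall": 0.4}},
]
print(consensus_scores(reviews))
```

Averaging per feature rather than per review means a hotel can score well on Location and poorly on Price at the same time, which a single overall score would hide.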

An important aspect of this analysis is that the hotels are in basically the same location (right across the street from each other). When you examine the results (below), you’ll see that the hotels scored nearly the same on location. This is a good sign that the results reflect reality.

[Chart: feature scores for Bally’s and Bellagio]

Digging into the results, I was surprised to see that Bally’s scored higher than the Bellagio, since the Bellagio is one of the 5-star properties in Vegas. So we dug a bit deeper to make sure we weren’t scoring the reviews incorrectly. We focused on the most positive and most negative reviews and tried to figure out why the Bellagio wasn’t scoring higher. The chart below shows that the “happy campers” were equally happy with the Bellagio and Bally’s, but the unhappy visitors were really unhappy with the Bellagio.

[Chart: most positive and most negative reviews, Bally’s vs. Bellagio]
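The check we ran can be sketched as comparing the two ends of each hotel's score distribution. The Python below is a minimal illustration, assuming per-review overall scores are already available on a -1.0 to +1.0 scale; `extremes`, `compare_extremes`, the `k` cutoff, and the sample scores are all hypothetical and are not the data behind the chart.

```python
from statistics import mean


def extremes(scores, k):
    """Return (mean of the k highest scores, mean of the k lowest scores)."""
    if k < 1 or len(scores) < k:
        raise ValueError("need 1 <= k <= len(scores)")
    ranked = sorted(scores)
    return mean(ranked[-k:]), mean(ranked[:k])


def compare_extremes(scores_by_hotel, k=2):
    """Map each hotel to the average of its k most positive and k most negative reviews."""
    return {hotel: extremes(scores, k) for hotel, scores in scores_by_hotel.items()}


# Made-up review scores on an assumed -1.0 to +1.0 scale.
sample = {
    "Bellagio": [0.9, 0.8, 0.2, -0.7, -0.9],
    "Bally's": [0.9, 0.8, 0.3, -0.2, -0.3],
}
for hotel, (top, bottom) in compare_extremes(sample).items():
    print(f"{hotel}: happiest {top:+.2f}, unhappiest {bottom:+.2f}")
```

In this toy data both hotels have identical "happy camper" averages, but one has a much lower floor, which is enough to pull its overall average down even though its best reviews are just as good.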

When we dug into the reviews we discovered that people expected more for their money than they were getting at Bellagio. This helped us confirm that the numbers for the properties were right and showed that the use of text analytics in comparing feature reviews for hotels was interesting and useful.

Salience 4.3: Opinion Mining

One of the two major new features in Salience 4.3 (releasing around June 30th) is "opinion mining". Opinion mining extends our core technology to handle indirect quotes. We've been able to extract quotation-mark-delimited quotes for a while now, and you could perform further analysis on those quotes (which were attached to the speaker).

Opinion mining means that Salience 4.3 can now handle sentences like:
1) Seth then asserted that this was a truly awesome feature.
2) Tim agreed that Bill was unduly angry.
3) Paul explained that the code was broken.

In each case there is a speaker, a topic, and sentiment expressed. The "speaker" is always an entity - and it could be a place, person, or company. The topic can be either a theme or an entity. Sentiment is assigned to the topic.

Thus, in sentence 1:
Speaker: Seth
Topic: awesome feature
Sentiment: positive

Sentence 2:
Speaker: Tim
Topic: Bill
Sentiment: negative

Sentence 3:
Speaker: Paul
Topic: code
Sentiment: negative

How does this work? I'm glad you asked. We have a data directory full of patterns for opinions. These basically come down to the following 3 classes:
1) "attributed" opinions (e.g. Paul said "This is great")
2) "cross sentence" opinions ("This is great," said Paul)
3) "unattributed" opinions (the examples above)

Unattributed uses a list of verbs that are expected to express an opinion, and looks for certain patterns using those. There are roughly 200 verbs that clearly express opinion (acknowledge, accuse, add, admit, advise, affirm, allege, answer...) and roughly 200 that have additional requirements because they indicate opinion only in certain contexts (accept, account for, address, agree, allow, analyze...). To give an example of how this works, consider the following:

"Paul charged at George." vs. "Paul charged that George was incompetent." The "that" in the second sentence changes "charged" from being an action to indicating an opinion.
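The verb-plus-"that" rule can be sketched in a few lines of Python. This is a toy that assumes simple "Speaker [then] verb [that] clause." sentences; the verb sets and the tiny sentiment word lists are hypothetical stand-ins for the roughly 400 real verbs, and it reproduces neither the pattern file format nor Salience's topic extraction. It returns the speaker, the reported clause, and a crude sentiment.

```python
import re

# Hypothetical, heavily trimmed verb lists (the real lists have roughly 200 entries each).
OPINION_VERBS = {"asserted", "admitted", "alleged", "explained"}  # always signal opinion
CONTEXT_VERBS = {"agreed", "charged", "accepted"}                 # opinion only with "that"

# Toy sentiment lexicon, for illustration only.
POSITIVE = {"awesome", "great"}
NEGATIVE = {"angry", "broken", "incompetent"}

# Speaker, optional "then", verb, optional "that", then the rest of the sentence.
PATTERN = re.compile(
    r"^(?P<speaker>[A-Z]\w+)\s+(?:then\s+)?(?P<verb>\w+)\s+"
    r"(?P<that>that\s+)?(?P<clause>.+?)\.?$"
)


def find_opinion(sentence):
    """Return {"speaker", "clause", "sentiment"} for an unattributed opinion, else None."""
    m = PATTERN.match(sentence.strip())
    if not m:
        return None
    verb = m.group("verb").lower()
    has_that = m.group("that") is not None
    # Context verbs such as "charged" only count as opinion when followed by "that".
    if verb not in OPINION_VERBS and not (verb in CONTEXT_VERBS and has_that):
        return None
    words = set(re.findall(r"[a-z]+", m.group("clause").lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"speaker": m.group("speaker"), "clause": m.group("clause"), "sentiment": sentiment}


print(find_opinion("Paul charged at George."))                    # None: an action
print(find_opinion("Paul charged that George was incompetent."))  # an opinion
```

The same function also handles the three example sentences above: "asserted" and "explained" qualify on their own, while "agreed" qualifies only because "that" follows it.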

For those who update to 4.3, check /data/opinions/*.ptn in the data directory for all the patterns.

Try it out... we think that your opinion on this will be positive.

Using queries to recognize entities

Entity extraction in text analytics is the basis of the entire process - identify people, places, companies and themes and use them to better understand the content.

There are two types of entity recognizers that we have used in Salience 4 with much success:

Model-based entities: this process looks at language, determines factors like parts of speech, and extracts the relevant entity.

Customer-driven lists: this process matches against a list provided by a customer and extracts the relevant entity.

Our soon-to-be-released Salience 4.3 introduces query-based entities. This process looks at combinations of words to make a match and extracts the entity based on that query.

For example, there is more than one Senator Udall in the United States. Mark Udall is a Senator from Colorado. Tom Udall is a Senator from New Mexico. If you had a query-based entity recognizer for "Tom Udall" you would create a query that includes the terms "Senator Udall" and "New Mexico" to determine that it must be Senator Tom Udall and not Senator Mark Udall.

While this is often compared with confidence-based entities, it isn't based on confidence about the language; a query either matches or it doesn't.
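A query-based recognizer of this kind can be sketched as a set of required terms per entity. The Python below is a minimal illustration, assuming plain case-insensitive substring matching; `EntityQuery` and `extract_entities` are hypothetical names, not Salience's query syntax. A match is a yes-or-no result with no confidence score attached.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EntityQuery:
    """A hypothetical query: emit `name` when every term in `all_of` appears in the text."""
    name: str
    all_of: List[str]

    def matches(self, text: str) -> bool:
        lowered = text.lower()
        return all(term.lower() in lowered for term in self.all_of)


QUERIES = [
    EntityQuery("Tom Udall", ["Senator Udall", "New Mexico"]),
    EntityQuery("Mark Udall", ["Senator Udall", "Colorado"]),
]


def extract_entities(text: str, queries: List[EntityQuery] = QUERIES) -> List[str]:
    """Return the names of all queries that match; each match is all-or-nothing."""
    return [q.name for q in queries if q.matches(text)]


print(extract_entities("Senator Udall of New Mexico spoke on the floor today."))  # ['Tom Udall']
```

A document that mentions only "Senator Udall" matches neither query: without the disambiguating term, the recognizer makes no guess.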

Lexalytics Sentiment Spectrum

Sentiment Analysis solutions are popping up everywhere these days - or so it seems. Every day there is a blog post, or Twitter post (or 100), asking how it works or arguing a point about sentiment and what exactly it means.

There has been an increase in articles covering everything from automated solutions vs. human analysis, to accuracy, to processing online content along with traditional content, to analyzing customer conversations. So, as a text analytics provider that has been offering sentiment analysis for years now, we thought it was about time we introduce a guide that organizations could use when they're trying to decide what they need for an analysis solution.

Lexalytics is pleased to share the Lexalytics Sentiment Spectrum. It is a view of all the factors that may come into play when deciding which route is best for your company.

[Image: the Lexalytics Sentiment Spectrum]

Our hope is that by looking at the various factors that go into extracting sentiment, along with the different methods by which it can be implemented, it will be a little less confusing to decide which process may be best for your organization.

The key questions we believe you need to ask include:

  • Is my data public or private? What type of security do I need on it?
  • Will I require any customization of dictionaries or integration to an existing application?
  • What does it cost?
  • What are my accuracy expectations?
  • Are we processing 100 documents a day or 100 documents a minute?
  • How many sources do I have flowing into my system?
  • Do I want to process online content? In house content? Both?
  • Does the solution give me sentiment of a whole document, or all the things contained within the document?

As we always do, we suggest you talk to a variety of vendors to review these key points and to ask for a possible proof of concept. Sentiment is inherently different for each company depending on what it needs to analyze and accomplish - and how much human interaction is going to be involved. Some industries can use automated sentiment with little interaction at all, while others need additional validation or customization to get the perfect results.

Text Analytics can play in very targeted, scientific enterprises

I came across this great article in Information Management by Frank Brown, Ph.D., of Accelrys (a company, I'll be honest, I had never heard of until today). He was describing the world of the scientific enterprise and how smart information management can help to strike a balance between content and context in R&D.

In part, he wrote,

"The term “business intelligence” has risen in the realm of information management for a reason. A collection of letters, numbers, figures or images are meaningless until processed in a way that makes the information understandable and usable. That’s what distinguishes raw data from true intelligence."

I think regardless of industry, people are wondering how to get at silos of information and make them more useful. He continued with,

"But when the available knowledge base includes an enormous breadth of sources, data formats and locations, relying on human processing alone is simply not feasible. This is where emerging technologies such as advanced semantic search and text analytics come in. These types of artificially intelligent categorization tools can help remove the time and cost constraints involved in extracting the context from complex content so that research collaborators can capitalize on all the valuable stores of data available to them – structured and unstructured, proprietary and public."

If text analytics and business intelligence can help with the information management process for companies whose job it is to better the world with drug manufacturing or scientific breakthroughs, what do you think they could do for your company?

Textual Analysis of Financial News Stories

I came across an interesting blog post in Technology Review that showcased the Arizona Financial Text system (AZFin Text).

According to the author, Christopher Mims:

"...it works by ingesting large quantities of financial news stories (in initial tests, from Yahoo Finance) along with minute-by-minute stock price data, and then using the former to figure out how to predict the latter. Then it buys, or shorts, every stock it believes will move more than 1% of its current price in the next 20 minutes - and it never holds a stock for longer."
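The trading rule in that description is simple enough to sketch. The Python below assumes some model has already produced a predicted price 20 minutes out (the text-driven prediction, which is the hard part of AZFin Text, is not shown); `decide`, `Trade`, and the field names are hypothetical and not taken from the system itself.

```python
from dataclasses import dataclass

THRESHOLD = 0.01   # act only on predicted moves larger than 1% of the current price
HOLD_MINUTES = 20  # positions are never held longer than 20 minutes


@dataclass
class Trade:
    ticker: str
    side: str          # "buy" or "short"
    entry_minute: int

    def exit_minute(self) -> int:
        """Latest minute at which the position must be closed."""
        return self.entry_minute + HOLD_MINUTES


def decide(ticker, current_price, predicted_price, minute):
    """Turn a 20-minute price prediction into a trade, or None if the move is too small."""
    move = (predicted_price - current_price) / current_price
    if move > THRESHOLD:
        return Trade(ticker, "buy", minute)
    if move < -THRESHOLD:
        return Trade(ticker, "short", minute)
    return None


print(decide("XYZ", 100.0, 101.5, minute=0))
```

Keeping the decision rule separate from the prediction makes it easy to see that all of the text analysis lives upstream, in whatever produces `predicted_price`.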

What I find interesting is the range of discussions spawned by anything related to trading and automated text analysis - from ethical arguments to accuracy debates to the role of human interaction.

Regardless, text analysis in financial services is an area that has been progressing for years and seems to have been able to find some traction when applied correctly.

The spread of "lytics"

For over 7 years we've been known as Lexalytics. I'll be honest, I didn't pick the name so I can't take credit. However, I often get asked what it means or where it came from.

To us, it's pretty simple. You take Lexical:

Main Entry: lexical
Function: adjective
Date: 1836
1 : of or relating to words or the vocabulary of a language as distinguished from its grammar and construction
2 : of or relating to a lexicon or to lexicography

and add Analytics and you get Lexalytics. Since we analyze text-based content, or words, to provide additional analysis to customers, it was the perfect name.

Lately, I've seen a little spike in other "lytics" popping up at conferences and online. For example, Social Analytics. Referred to as Socialytics. Good one. Today I saw Community Analytics - Communilytics.

I'm certainly not claiming Lexalytics kicked off the world of "lytics" - that'd be silly - but what does encourage me is the fact that businesses and organizations are investigating different ways to analyze information.

In the past 5 years there has been a major transition in content, from scanned, stored and printed to blogged, posted and shared.

As more and more channels are used to create and disseminate content, enterprises will need to explore all the various ways analytics will play into their infrastructure and applications. Some analytics are based on pages and pages of documents published in very specialized industries while some are comments and posts on very public domains.

Whether you are analyzing Pharmaceutical data like our friends at Pharmalytics or simply exploring tweets and social content (Twitterlytics, perhaps?), there is a lot of power and information to be found within content. Try it for yourself. You may find a new form of "lytics" you can share with us all.

Perfect Text Analytics presentation from Text Analytics Summit 2010 in Boston by Seth Redmore

Hi all!

I (Seth) presented this at the 2010 Text Analytics Summit. It's a tour de force (or maybe not, but it's at least a little interesting) presentation about what we think makes for "perfect" text analytics, as well as a discussion on what we're doing over the next 12 months to make our stuff even more perfect.

There are a few notable tidbits in here, like the fact that we're going to be supporting foreign languages. (You have to watch the preso to find out which one we're doing first :) ). Sarcasm! Who hasn't run up against the "Well what do you do about Sarcasm?" question? We have an answer... There's other cool stuff in there too. Check it out!

Text Analytics Summit 2010 recap - A very different experience

Last week I attended the 6th annual Text Analytics Summit in Boston with the Lexalytics team, and it was noticeably different from years past. There were fewer vendors in attendance, and while overall attendance seemed to be down a bit from the year before, there were more end users in attendance than in previous years.

So the real question is, was the show a success for us or not? And on that question I’d have to say it was a huge success. For whatever reason we seemed to have the heaviest traffic at our table, and we generated more leads than any previous year, so as a vendor I have to mark the conference as a success.

As to the content of the show, I would have to call it hit and miss. I found the opening keynote from LinkedIn to be really interesting and very well presented. Like the bit.ly presentation (described below), LinkedIn was able to tell an interesting and, more importantly, profitable story about how they are leveraging Text Analytics to mine profiles and their interconnections to help people figure out who else they should be connecting with.

In my opinion, there were way too many presentations focused on Voice of the Customer and opinion mining and not enough on different and novel uses of Text Analytics.

Still, there were some really different presentations, like the legal eDiscovery presentation from Gerald Britton, and I was very happy to see our case study presenter, bit.ly, give an overview of their service and how they were integrating Text Analytics, though I’m not entirely sure the audience totally understood the value proposition that something like a bit.ly brings to the table.

For anyone that didn’t fully grasp the unique value of bit.ly, here’s my oversimplified attempt: bit.ly shortens URLs for inclusion in short text posts like tweets. The value is that they crawl every shortened URL and keep track of which shortened URLs are being heavily clicked on. Basically, they know what content is important before anyone else on the web. Since they store the content, they can use text analytics to enhance the metadata and therefore know what the pages are about. Sounds like an advertiser’s dream to me.

As I said above, I think there was still too much focus on Voice of the Customer at the show, but I’m hopeful that will change going forward. The attendees this year were quite varied in their interests, so maybe in the future the presentations will widen out to match the audience.

Can sentiment stand on its own?

I attended the first Sentiment Analysis Symposium in NYC last week and thought it all came off reasonably well.

It was interesting how the show really broke down into two very distinct schools of thought. There were the reputation management and social media monitoring folks that were by and large unhappy with the state of the technology. Then there were all of us technologists that were trying to suggest that there are better uses for sentiment than just in reputation management.

It was also quite entertaining to watch folks (myself included) try to give a useful and informative talk in 5 minutes, but the "lightning talks" did move right along and showed how far the industry has come.

So, on my original question, can sentiment stand on its own? Before this show, my take would have been that sentiment can't stand up by itself as a singular topic of discussion, but given what I saw at the show, and the tremendous amount of traction we've gotten with the BBC around political sentiment, I may be proven wrong.

There suddenly seems to be a ton of interest in sentiment in a wide variety of different industries and applications, for example:

  • Financial Services: Sentiment scored news for algorithmic trading
  • Customer satisfaction monitoring of customer feedback data
  • Political campaign monitoring
  • Scoring of hotel reviews

Automated sentiment will never be the equal of human-scored sentiment, but it's closer than most people realize. When pointed at the right sort of problems (for example, trends in high volumes of data), it provides something humans just can't match.

