Discovering relevant concepts in hotel reviews

In an earlier post, Jeff Catlin described an analysis we did of Bally’s vs. Bellagio using publicly available customer reviews. We did that analysis using something called “categories”, which is basically a fancy name for search strings. The key point was finding the sentiment associated with important aspects of the hotel experience, using a known set of categories. In other words, the actual terms for each category were defined up front, and then we determined the sentiment for each category.
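To make the idea of categories as search strings concrete, here is a minimal sketch in Python. It is not the system used for the earlier analysis: the category terms, the tiny sentiment lexicon, and the function name `category_sentiment` are all assumptions made up for illustration. Each sentence gets a crude score from the lexicon, and that score is credited to every category whose search strings appear in the sentence.

```python
import re
from collections import defaultdict

# Hypothetical categories: each one is just a named list of search strings.
CATEGORIES = {
    "location": ["centrally located", "location", "walking distance"],
    "decor": ["decor", "decorated"],
    "price": ["price", "prices", "taxes"],
}

# Toy sentiment lexicon (an assumption; a real system would use far more).
LEXICON = {
    "great": 1, "tastefully": 1, "nice": 1,
    "outrageous": -1, "awful": -1, "stale": -1,
}


def category_sentiment(reviews):
    """Sum a crude per-sentence sentiment score for each matched category."""
    totals = defaultdict(int)
    for review in reviews:
        # Split into rough sentences so a score only affects nearby categories.
        for sentence in re.split(r"[.!?]+", review.lower()):
            words = re.findall(r"[a-z']+", sentence)
            score = sum(LEXICON.get(word, 0) for word in words)
            for name, terms in CATEGORIES.items():
                # Plain substring search: a category is its search strings.
                if any(term in sentence for term in terms):
                    totals[name] += score
    return dict(totals)


if __name__ == "__main__":
    sample = ["The room was tastefully decorated. Outrageous prices at the bar!"]
    print(category_sentiment(sample))
```

On the sample review, the "decor" category picks up a score of 1 and "price" a score of -1, while "location" never matches and is absent from the result. The point is that nothing outside the predefined terms can ever show up, which is exactly why this approach suits monitoring rather than discovery.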

This technique is very useful for ongoing monitoring of known service areas, and it provides reliable recall for those areas. It is less useful for discovery applications, where you aren’t comparing known areas but are instead trying to find out what lies outside your analysis area. Discovery is inherently a different process from monitoring: it means that you don’t know what might or might not be out there. In monitoring applications, it’s generally important to have high accuracy and recall, so that you can track trends correctly. In discovery applications, accuracy and recall are paradoxically less important, largely because they’re not yet meaningful concepts. You can’t measure accuracy or recall for things that you don’t know about. Once you define them, you can examine the statistical rigor of your system. But I digress.

We took sample data from a publicly available review site and extracted the relevant themes. The following table shows the name of the institution, a sentiment score for the theme (shown here as Positive or Negative), and the theme itself. It is important to differentiate between the sentiment score for a theme and the scores for the entities in the text. Consider the following sentence: “President Barack Obama is doing a great job with this awful oil spill.” Whether you agree with that statement or not, you can see how the entity “President Barack Obama” would get a positive score, while “oil spill” would get a negative score.

Hotel         Sentiment  Theme
Hotel BLOOM!  Positive   trendy hotel
Hotel BLOOM!  Positive   been decorated
Hotel BLOOM!  Positive   not mean
Hotel BLOOM!  Positive   main shopping
Hotel BLOOM!  Positive   botanical gardens
Hotel BLOOM!  Positive   centrally located
Hotel BLOOM!  Negative   long flight
Hotel BLOOM!  Negative   next day
Hotel BLOOM!  Negative   not sure
Hotel BLOOM!  Negative   ordinary coffee
Hotel BLOOM!  Negative   stale smoke
Hotel BLOOM!  Negative   not require
Hotel BLOOM!  Negative   been given
Hotel BLOOM!  Negative   never stay

Some of these themes are clearly more useful than others. Note also that the sentiment of a theme is not necessarily contained in the theme itself; it is inferred from the language around it. If I were coming into this cold, with nothing defined, I would be sure to track themes around “decor” and “location” on the positive side, and “smoking” and “coffee” on the negative side. The other themes would potentially mean more when read against the text itself, but this is just to give some quick examples.
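Here is a rough sketch of what "inferred from the language around the theme" can look like, using the oil spill sentence quoted earlier. This is not how our engine actually scores themes or entities; the word lists, the window size of four tokens, and the function name `phrase_sentiment` are assumptions chosen only to illustrate the idea of scoring a phrase from its neighbors.

```python
import re

# Toy polarity word lists (assumptions for illustration only).
POSITIVE = {"great", "nice", "tasteful"}
NEGATIVE = {"awful", "stale", "outrageous"}


def phrase_sentiment(sentence, phrase, window=4):
    """Score a phrase from the words within `window` tokens on either side.

    Returns None when the phrase does not occur in the sentence.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    target = phrase.lower().split()
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            # Only the neighbors count; the phrase's own words are excluded.
            before = tokens[max(0, i - window):i]
            after = tokens[i + n:i + n + window]
            return sum((w in POSITIVE) - (w in NEGATIVE) for w in before + after)
    return None


if __name__ == "__main__":
    text = "President Barack Obama is doing a great job with this awful oil spill."
    print(phrase_sentiment(text, "President Barack Obama"))
    print(phrase_sentiment(text, "oil spill"))
```

With a four-token window, "great" falls next to the president and "awful" falls next to the oil spill, so the two phrases come out at 1 and -1 even though they share a sentence. A wider window would start to blur the two, which is one reason real systems lean on grammar rather than simple distance.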

Here’s another example:

Hotel      Sentiment  Theme
W Seattle  Positive   well decorated
W Seattle  Positive   romantic getaway
W Seattle  Positive   tastefully decorated
W Seattle  Positive   centrally located
W Seattle  Positive   even walked
W Seattle  Positive   top quality
W Seattle  Negative   wireless connection
W Seattle  Negative   outrageous prices
W Seattle  Negative   local taxes
W Seattle  Negative   even get
W Seattle  Negative   poorly lit
W Seattle  Negative   never stay


Again, the location seems to be nice, and the decor is also well taken care of. On the negative side, I would watch for complaints about connectivity (the wireless connection) and pricing. However, this is the W, so what they need to watch (as we demonstrated in the earlier post) is that they’re providing value for money. If you’re not providing a proportionately better experience relative to the cost difference, guests are going to end up dissatisfied. There’s more detail to how themes are determined to be relevant, but this is enough to give you a taste of how to use themes to discover what you should be looking for (or looking out for!).
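Once themes like these are extracted, one simple follow-up is to look for themes that recur across hotels, since those are natural candidates to promote into monitored categories. The sketch below does that for a subset of the rows listed above; the helper name `shared_themes` and the tuple layout are my own choices for illustration, not part of any product.

```python
from collections import defaultdict

# A subset of the (hotel, sentiment, theme) rows from the two lists above.
ROWS = [
    ("Hotel BLOOM!", "Positive", "trendy hotel"),
    ("Hotel BLOOM!", "Positive", "centrally located"),
    ("Hotel BLOOM!", "Negative", "stale smoke"),
    ("Hotel BLOOM!", "Negative", "never stay"),
    ("W Seattle", "Positive", "tastefully decorated"),
    ("W Seattle", "Positive", "centrally located"),
    ("W Seattle", "Negative", "outrageous prices"),
    ("W Seattle", "Negative", "never stay"),
]


def shared_themes(rows):
    """Return themes that appear for more than one hotel, grouped by sentiment."""
    hotels_by_theme = defaultdict(set)
    for hotel, polarity, theme in rows:
        hotels_by_theme[(polarity, theme)].add(hotel)

    shared = defaultdict(list)
    # Sorting keeps the output stable regardless of input order.
    for (polarity, theme), hotels in sorted(hotels_by_theme.items()):
        if len(hotels) > 1:
            shared[polarity].append(theme)
    return dict(shared)


if __name__ == "__main__":
    print(shared_themes(ROWS))
```

For these rows, "centrally located" is the shared positive theme and "never stay" the shared negative one. The hotel-specific themes (stale smoke, outrageous prices) are the ones that tell you what to watch at each property.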
