Facebook Graph Search = Natural Language Search


Facebook’s new Graph Search feature has been a lightning rod for grand predictions since it was announced earlier this year. The new social search function has been described as everything from revolutionary to completely doomed. The truth, as in most cases, lies somewhere in the middle. We’re going to throw our hat in the ring on this one and say that, while the new search is cool, without more work on understanding what’s in the content itself, the new search isn’t going to change much.

Before Graph Search, Facebook search was limited to finding people and pages that match particular keywords. It didn’t leverage any of the oodles of information they have on what people were “liking” or the content of their status updates.   Graph Search is Facebook’s first attempt at actually taking advantage of all the information that we’re happily giving them in the form of interaction with our friends and with various organizations on Facebook.

What is especially exciting about Facebook’s claims for Graph Search is the integration of natural language processing, which allows for searches that understand far more than just keywords and provides better results based on profile and page information.

This has been no small endeavor. Facebook Graph Search uses Facebook’s Unicorn search system to cross-reference different “nodes”. Currently these nodes include people, pages, events, applications, groups, places, check-ins, and objects with location information attached to them. This cross-referencing uses your Facebook friends as its source of data, and can even narrow down search results based on friends’ attributes. For instance, searching “my friends who like The Who and who are under 35 and who live in California” shows you only friends who meet those criteria (a disappointing six friends, in my case).
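
To make that example concrete, here is a minimal sketch of what the query boils down to once it has been understood: three constraints applied to a list of friends. We have no visibility into how Unicorn actually does this; the `Friend` class, the `search_friends` function, and the four friends below are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Friend:
    # Hypothetical record; not a real Facebook data structure.
    name: str
    age: int
    state: str
    likes: set = field(default_factory=set)


# Made-up friends for illustration only.
FRIENDS = [
    Friend("Ana", 29, "California", {"The Who", "Dubstep"}),
    Friend("Ben", 41, "California", {"The Who"}),
    Friend("Cho", 33, "Oregon", {"The Who"}),
    Friend("Dev", 34, "California", {"Running"}),
]


def search_friends(friends, liked_page, max_age, state):
    """Return the names of friends who satisfy all three constraints."""
    return [
        f.name
        for f in friends
        if liked_page in f.likes and f.age < max_age and f.state == state
    ]


# "my friends who like The Who and who are under 35 and who live in California"
print(search_friends(FRIENDS, "The Who", 35, "California"))
```

Only one of our made-up friends passes all three tests, which is even more disappointing than six.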

The promise of natural language processing is in enabling the search system to understand a variety of inputs, ascertain the user’s needs, and output results that meet those needs. The order in which you place the different search parameters doesn’t matter, and the search engine understands several different ways of phrasing the same request. “My friends under 35 who live in California and who like The Who” will generate the same results as the first example, although the query in the search bar will be rewritten as “My friends who are younger than 35 and like The Who and live in California”, in order to help the user acclimate to the most efficient language to use when searching.
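
One way to picture this order independence is to treat each phrasing as a bag of constraints and then render those constraints back out in a single fixed form. The sketch below does that with a few regular expressions. It is a toy under our own assumptions, not Facebook’s parser: the `PATTERNS` table, `parse`, and `canonical` are hypothetical names, and the patterns only cover the two example queries from this post.

```python
import re

# Hypothetical patterns that cover only the example phrasings in this post.
PATTERNS = {
    "max_age": r"(?:under|younger than) (\d+)",
    "likes": r"\blike ([A-Z][a-z]+(?: [A-Z][a-z]+)*)",
    "lives_in": r"\blive in ([A-Z][a-z]+(?: [A-Z][a-z]+)*)",
}


def parse(query):
    """Pull out whichever constraints appear, in whatever order they appear."""
    found = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, query)
        if match:
            found[key] = match.group(1)
    return found


def canonical(constraints):
    """Render the constraints in one fixed order, like the rewritten search bar."""
    return (
        "My friends who are younger than {max_age} "
        "and like {likes} and live in {lives_in}"
    ).format(**constraints)


first = "my friends who like The Who and who are under 35 and who live in California"
second = "My friends under 35 who live in California and who like The Who"

print(parse(first) == parse(second))
print(canonical(parse(second)))
```

Both phrasings reduce to the same three constraints, so they produce the same rewritten query. Anything outside these three patterns simply isn’t recognized, which is exactly the kind of brittleness we get into next.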

There are drawbacks, however. Facebook Graph Search might, for many, be less immediately intuitive than it sounds. The proximity to the way that we normally speak might even be frustrating when the specific way we phrase a search is not understood by Graph Search. In other words, while Facebook is claiming that this is a “natural language system”, it’s really not. It recognizes certain additional operators, but it’s not terribly flexible.

They’ve also taken the wise step of backing up their search system with Microsoft’s Bing, so that queries that the system doesn’t know how to handle return search results from the Bing search system instead.

Take the following examples of truly “natural language” queries:

  • “I want to go to a restaurant near me.”   This query should take your desire into account, and optimally, should take into account restaurants that are open right now, as well as the restaurants that your friends have “liked”.  This search fails over to Bing web search, and gives a list of sites where you can actually find a restaurant near you.
  • “I need a haircut” or “haircut” also goes straight to Bing.
  • “What bars are the best in San Francisco?”  You guessed it.  Straight to Bing.
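
The fallback behavior in these examples can be sketched as a simple router: if the query starts with a phrase the structured search recognizes, handle it there, otherwise hand it to web search. We don’t know how Facebook actually makes this decision, so the `STRUCTURED` pattern and the `route` function below are purely our guess at the shape of it.

```python
import re

# Hypothetical rule: only queries that open with a known subject are "structured".
STRUCTURED = re.compile(r"^(my friends|people)\b", re.IGNORECASE)


def route(query):
    """Return 'graph' for queries the toy grammar recognizes, else 'web'."""
    if STRUCTURED.match(query.strip()):
        return "graph"
    return "web"


for q in [
    "people who like Dubstep",
    "I want to go to a restaurant near me.",
    "I need a haircut",
    "What bars are the best in San Francisco?",
]:
    print(route(q), "<-", q)
```

In this sketch, as in our testing, all three of the “natural” queries above end up on the web search side.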

Let’s talk more about what does work, and how we at Lexalytics would classify the Graph Search system.

Here are the results of a search for “people who like Dubstep”.  If you’re not familiar, it’s a genre of electronic music that’s characterized by a “wobble” bass line (filters with an LFO) and typically has some sort of “drop” where people freak out on the dance floor.


The first clue to what Graph Search actually “is” is on the right side, where the filtering pane is located.  These are the various filters that you can further apply to narrow down your search.  This looks much more like a “faceted” search than a true natural language search.  I would classify it as keyword search with additional operators, some synonym processing, and search facets for filtering. That is a pretty long way from real “natural language search”.
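
For anyone who hasn’t run into the term, faceted search boils down to two operations on a result set: count how many results fall under each value of an attribute, and filter by the value the user clicks. Here is a minimal sketch of both. The result records and the facet names (`gender`, `city`) are invented for illustration and are not the actual filters in Facebook’s pane.

```python
from collections import Counter

# Made-up search results for "people who like Dubstep".
RESULTS = [
    {"name": "Ana", "gender": "female", "city": "Boston"},
    {"name": "Ben", "gender": "male", "city": "Boston"},
    {"name": "Cho", "gender": "female", "city": "Amherst"},
]


def facet_counts(results, facet):
    """Count results per value of one facet, as shown next to each filter."""
    return Counter(r[facet] for r in results)


def apply_facet(results, facet, value):
    """Keep only the results whose facet matches the chosen value."""
    return [r for r in results if r[facet] == value]


print(facet_counts(RESULTS, "city"))
print([r["name"] for r in apply_facet(RESULTS, "city", "Boston")])
```

Nothing in there requires understanding language at all, which is the point.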

I’m not hating on Facebook, I love it and use it, but as a marketing professional in the natural language space, I take great pride in correctly characterizing our products.   However, I’m probably a bit too close to this, and “keyword search with additional operators” doesn’t have quite the same ring to it as “natural language search”, does it?

Enough about the front end, let’s talk about the data. Search systems are deeply dependent on the data that they have access to.  Facebook has been struggling with issues around privacy and sharing almost since their birth.   Graph searches rely completely on access to shared information.  If users don’t share their info and likes, Facebook has nothing to work with.  As the short-lived Tumblr Actual Facebook Graph Searches so succinctly points out, those who are not careful or aware of Facebook privacy settings leave themselves vulnerable to searches that can range from humorous to politically dangerous to, in some cases, just plain creepy. But if privacy settings among Facebook users tighten, the amount of data available will shrink and limit Graph Search’s usefulness tremendously.

Facebook is relying heavily on the “like” button to provide information for Graph Search. Up until now, “liking” pages and checking in to places served no real, practical purpose, rendering them only a form of self-expression. The new utility of these categories will have to create a different understanding and use of those tools, or be rendered ineffective. The problem is that “likes” do not accurately represent sentiment about people, groups, and pages. For instance, the Facebook page for Amy’s Baking Company, the restaurant made infamous both by their appearance on Kitchen Nightmares and by their widely publicized social media meltdown, gained tens of thousands of “likes” in the days after they gained internet popularity, while at the same time their Yelp rating tanked to 1.4 stars. “Likes” can’t really be used as a measure of how a user feels about a certain topic, and when corporations are actually in the business of purchasing “likes”, as Nicholas Carlson explains in Facebook’s Search Is Based On A ‘Con’, it’s possible that Graph Search is indeed destined to be useless and inaccurate.

Different pages also present obstacles to the new search feature. Because nodes are based on pages that are user created, one basic idea might have upwards of five or six slightly different pages available to “like”. This specificity was ideal for those looking to define themselves in some way through their “like” activity, but provides a very real problem for Graph Search. If I want to create a running group by searching nearby friends who like “Running”, I may miss all of my friends who liked the “Marathons” page instead.  It is unclear how much semantic understanding (even at the level of synonym relationships) is present in Graph Search.
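
Here is a small sketch of the running group problem, and of what even a basic notion of related pages would buy you. The `RELATED_PAGES` table is hand-written and hypothetical (we are not claiming Graph Search has or lacks anything like it), and the friends and their likes are made up.

```python
# Hypothetical, hand-built grouping of pages that express the same interest.
RELATED_PAGES = {"Running": {"Running", "Marathons", "Jogging"}}

# Made-up friends and the pages they have "liked".
LIKES = {
    "Ana": {"Running"},
    "Ben": {"Marathons"},
    "Cho": {"Dubstep"},
}


def friends_who_like(topic, likes, related=None):
    """Match on the exact page, or on any related page if a table is given."""
    pages = (related or {}).get(topic, {topic})
    return sorted(name for name, liked in likes.items() if liked & pages)


print(friends_who_like("Running", LIKES))                 # exact page only
print(friends_who_like("Running", LIKES, RELATED_PAGES))  # with related pages
```

With exact matching, Ben the marathoner never gets his invitation to the running group.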

We also believe that there is a lot of opportunity inherent in the status updates that people are making and in the text on the pages that are being liked or content that’s being shared that currently isn’t being utilized.  Our customer Bitly is an excellent example of what can be done with text analysis in the context of content sharing.  They collect terabytes of data from URLs that users shorten and share on Twitter.  They are able to tell their customers about the content of those web pages, from “who” is being discussed, “what” is the context, and any sentiment towards the context or entities.  A similar system would be highly useful for Graph Search – one that looked at the meaning of status updates to try and ascertain whether someone likes something without the “like” tag, or maybe even if someone has “liked” something not because they really like it, but instead because they want to follow the train wreck (we’re looking at you, Amy’s Baking Company).

Expanding a bit on the dubstep example, let’s see what the Facebook page “Dubstep” tells us:

Facebook's response to a search for

This is actually just from Wikipedia, but let’s treat it as regular text for right now, and throw it into Salience.

Our newly released automatic document categorization does a 2-level classification, with about 125 top level categories, and about 4,000 second level categories.

Dubstep classifies like this (in the parens are scores)

  • Music (1)
    • Electronica (.57)
    • Indie_rock (.55)
    • Hardcore_techno (.52)
  • Dance (.57)

I’m dubious about the “indie_rock” sub category, but it’s certainly close enough with “dance” and “music” as top level categories.

Taking a different example, let’s look at classical music:

About Classical Music

Classical music classifies as:

  • Music (1)
    • Music (.42)
    • Music_cognition (.36)
    • Ornamentation (.35)
    • Organology (.33)
    • Baroque_Instruments (.30)


Note the “baroque_instruments” bit, and the lack of a classification for “dance”.   Classical is generally not considered “dance” music.

This is really useful information.  Imagine a Graph Search where you can ask for “friends who like dance music”.   And, yes, I know that there is a “dance music” page, but this is where some of the crafty text analysis comes in.  People who have “liked” the dance music page are not necessarily a super-set of people who have liked dubstep or trance or psytrance or house or gabber or happy hardcore.   These are all dance music genres, and I want to find out who likes dance music.   A good classification system (like what we have in Salience) can give you this sort of association between members of a set, and this is where just relying on what’s been “liked” falls short.
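
Here is a rough sketch of that idea: once each page carries category scores, “friends who like dance music” becomes a lookup over categories instead of over a single page. This is not the Salience API. The scores are simply stored in a dictionary, with the Dubstep and classical values taken from the results above and the rest invented, and the 0.5 threshold is an arbitrary choice for the example.

```python
# Category scores per page. Dubstep and classical come from the results above;
# the Trance and "Dance music" scores are made up for illustration.
PAGE_CATEGORIES = {
    "Dubstep": {"Music": 1.0, "Dance": 0.57},
    "Classical music": {"Music": 1.0},
    "Trance": {"Music": 1.0, "Dance": 0.6},
    "Dance music": {"Music": 1.0, "Dance": 1.0},
}

# Made-up friends and the pages they have "liked".
LIKES = {
    "Ana": {"Dubstep"},
    "Ben": {"Classical music"},
    "Cho": {"Trance"},
    "Dev": {"Dance music"},
}


def friends_who_like_category(category, likes, page_categories, threshold=0.5):
    """Friends with at least one liked page scoring >= threshold in category."""
    matches = []
    for name, pages in likes.items():
        scores = (page_categories.get(p, {}).get(category, 0.0) for p in pages)
        if any(score >= threshold for score in scores):
            matches.append(name)
    return sorted(matches)


print(friends_who_like_category("Dance", LIKES, PAGE_CATEGORIES))
```

Only Dev liked the actual “dance music” page, but the category lookup also finds Ana and Cho, and correctly leaves out Ben and his classical music.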

Also, where’s the dislike?   I understand why Facebook chose not to have this, but I certainly have “friends” who say things like “I hate electronic music!”  (Well, ok, I don’t really, but work with me here.)    Well, now, we can see negative sentiment in the status update, along with classifications of “music, electronica” – that’s good information as to who not to invite over to your house music marathon.  Or “I hated Amy’s Coffee House”, or “Isn’t this article stupid?” – there’s explicit sentiment and context information there that would be useful for Graph Search.   Even if you wanted to stick to the positive, a simple “I love this place!” is useful.  I personally don’t “like” things very often, but I will comment on them, and that’s information that Graph Search can use to make better recommendations.
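
To show the bare minimum of what “seeing sentiment in a status update” means, here is a deliberately crude word-list scorer. Real sentiment analysis (ours included) does far more than count words, so treat this as a toy: the `POSITIVE` and `NEGATIVE` lists and the `polarity` function are made up for this post and will get plenty of real sentences wrong.

```python
import re

# Tiny, hand-picked word lists; a real system would not work this way.
POSITIVE = {"love", "loved", "great"}
NEGATIVE = {"hate", "hated", "stupid"}


def polarity(text):
    """Label text by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for status in [
    "I hate electronic music!",
    "I love this place!",
    "Going to the store.",
]:
    print(polarity(status), "<-", status)
```

Pair even a signal this rough with the “music, electronica” classification and you already know more than the “like” button will ever tell you.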

Graph Search may prove a boon to marketers, who can use the new features to create far more specific target demographics, and so tailor advertisements even further. The ability to search for places “nearby” will undoubtedly boost local advertising, and make Facebook a bigger marketing priority for local business owners. However, it’s not really “natural language search”, and there’s a lot of value that can be added to it with something like Lexalytics Salience Engine to pull out more details and similarities of pages and status updates.
**Coauthored by Mekkin Bjarnadottir
