Parsey McParseface


When people ask me what I do for work, I like to use the explanation “I teach computers how to read.” I could say that I develop NLP and text analytics technology, but that sometimes gets a blank stare and an awkward sip of the drink before the conversation moves on to sports. But there’s starting to be more awareness and discussion of NLP and text analytics. There’s been a steady buzz about chatbots, AI, and commercial applications in general in which computers are trained to understand human-generated communication.

This recent flurry of activity makes life very exciting for those of us in this business, because these are the challenges we’ve been working on for years. At the same time, there’s a need to find the music in the noise and for us to understand what other folks in the industry are doing and how our approaches align with theirs. One of the recent advances that caught my attention was Google’s “Parsey McParseface…” What the heck is that?

Google has been making advances in machine-learning techniques with their TensorFlow framework, and earlier this month announced the availability of an open-source natural language understanding framework implemented in TensorFlow called SyntaxNet. As a demonstration of SyntaxNet’s capabilities, Google developed an English parser called Parsey McParseface. The announcement on the Google Research blog is an interesting read because it does a good job of explaining why this work is difficult, and difficult to get right.

At this point, we’ve got a complex machine-learning framework, an open-source language understanding framework, and an English language parser with a funny name. What can we do with all this? Let’s get our hands dirty and take Parsey for a spin. I chose the “easy” route and downloaded an image of SyntaxNet to build and run in Docker. This gives me a container set up with all the bits needed to try out Parsey. My first step was to run the example given in the documentation:

echo 'Bob brought the pizza to Alice.' | syntaxnet/

Sending individual bits of text into a demo shell script is a bit clunky, but that will likely improve over time. This command generates a boatload of output, and at the bottom I get this:

Input: Bob brought the pizza to Alice .
brought VBD ROOT
+-- Bob NNP nsubj
+-- pizza NN dobj
|   +-- the DT det
+-- to IN prep
|   +-- Alice NNP pobj
+-- . . punct

SyntaxNet is trained on the Penn Treebank, so the part-of-speech tags look familiar, and the tree shows us which parts of the sentence are related. Let’s compare that to Salience; seeing how this new approach stacks up against an existing one is an important part of this whole exercise.
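To make the tree output above concrete, here is a minimal Python sketch (an illustration only, not SyntaxNet’s API) that represents the same dependency parse as nested (word, tag, relation, children) tuples and prints it in a similar ASCII layout:

```python
# Illustration only (not SyntaxNet code): the parse above as a nested
# (word, POS tag, dependency relation, children) structure.
parse = ("brought", "VBD", "ROOT", [
    ("Bob", "NNP", "nsubj", []),
    ("pizza", "NN", "dobj", [("the", "DT", "det", [])]),
    ("to", "IN", "prep", [("Alice", "NNP", "pobj", [])]),
    (".", ".", "punct", []),
])

def tree_lines(node, depth=0):
    """Render a dependency tree as ASCII lines, one token per line."""
    word, tag, rel, children = node
    prefix = "" if depth == 0 else "|   " * (depth - 1) + "+-- "
    lines = [f"{prefix}{word} {tag} {rel}"]
    for child in children:
        lines.extend(tree_lines(child, depth + 1))
    return lines

print("\n".join(tree_lines(parse)))
```

The point of the structure is that each head word carries its dependents, so higher-level functions can walk from “brought” to its subject and object directly.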

In the Salience Demo application, I enter the same content and select the “Chunk Tagged Text” view. This provides me with Salience’s part-of-speech tags as well as a similar grouping of related words called “chunks.” Salience produces the following output:

[Bob_NNP brought_VBN] [the_DT pizza_NN] [to_TO Alice_NNP] [._.]

The majority of the part-of-speech tags match, and the groupings look roughly similar, though the notable piece missing from the Salience output is the dependency tree. Salience actually implements a lightweight dependency parse internally, and uses it for higher-level functions such as sentiment analysis and named entity extraction. But for performance reasons, it’s a much simpler approach. And performance is another perspective that needs to be considered. After initialization, the output from Parsey shows the following:

INFO:tensorflow:Processed 1 documents
INFO:tensorflow:Total processed documents: 1
INFO:tensorflow:num correct tokens: 0
INFO:tensorflow:total tokens: 7
INFO:tensorflow:Seconds elapsed in evaluation: 0.12, eval metric: 0.00%
INFO:tensorflow:Processed 1 documents
INFO:tensorflow:Total processed documents: 1
INFO:tensorflow:num correct tokens: 1
INFO:tensorflow:total tokens: 6
INFO:tensorflow:Seconds elapsed in evaluation: 0.43, eval metric: 16.67%
INFO:tensorflow:Read 1 documents

This tells us that overall, evaluation of this sample sentence took 0.55 seconds. By comparison, Salience performs “text preparation” (tokenization, part-of-speech tagging, and chunking) and generates POS-tagged output on the same sentence in 0.059 seconds.
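For readers who want to reproduce this kind of side-by-side timing, here is a minimal sketch of the measurement pattern using Python’s `time.perf_counter`. The `parse` function below is a hypothetical stand-in for whichever parser you are benchmarking; the timing harness is the point:

```python
import time

def time_call(fn, *args, repeats=5):
    """Return the best-of-N wall-clock time for fn(*args), in seconds.

    Taking the minimum over several runs reduces noise from caching
    and scheduler jitter.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical stand-in for a real parser call.
def parse(text):
    return text.split()

elapsed = time_call(parse, "Bob brought the pizza to Alice .")
print(f"{elapsed:.6f} seconds")
```

Note that for a fair comparison, one-time initialization (model loading, etc.) should be excluded from the measured call, which is what the timings above attempt to do.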

Let’s look at another example, something with a bit more meat on the bones. What happens if we process the text of the Gettysburg Address? Parsey splits the content into three “documents,” and parses the 271 words in 1.86 seconds. Salience processes the same content in 0.093 seconds.
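Put in throughput terms, those timings work out as follows (simple arithmetic on the numbers above):

```python
# Throughput comparison from the Gettysburg Address timings above.
words = 271
parsey_secs, salience_secs = 1.86, 0.093

parsey_wps = words / parsey_secs      # roughly 146 words/second
salience_wps = words / salience_secs  # roughly 2914 words/second

print(f"Parsey:   {parsey_wps:.0f} words/sec")
print(f"Salience: {salience_wps:.0f} words/sec")
print(f"Speedup:  {parsey_secs / salience_secs:.0f}x")
```

On this one sample, that is roughly a 20x difference, though a single short document is hardly a rigorous benchmark.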

At the end of the day, however, parsing sample sentences and the Gettysburg Address is all nice and academic. What we really want is to use this technology for some practical purpose. Where text analytics has taken hold over the last several years is in analyzing the “voice of the customer” and social media monitoring. If you thought plain English was hard to understand, try Twitter!

#MyOnePhoneCallGoesTo JAKE from @StateFarm because he’s #HereToHelp 🐈

Input: # MyOnePhoneCallGoesTo JAKE from @ StateFarm because he 's # HereToHelp
JAKE NNP ROOT
+-- MyOnePhoneCallGoesTo NNP nn
|   +-- # NN nn
+-- from IN prep
|   +-- StateFarm NNP pobj
|       +-- @ NNP nn
+-- HereToHelp NN dep
    +-- because IN mark
    +-- he PRP nsubj
    +-- 's VBZ cop
    +-- # NN nn

Performance-wise, Parsey generates this parse of the tweet in 0.32 seconds, but has misinterpreted some of the elements that are unique to Twitter such as hashtags and mentions. Salience generates this parse of the same content in 0.07 seconds:

[ #MyOnePhoneCallGoesTo_HASHTAG] [JAKE_NNP] [from_IN] [ @StateFarm_MENTION] [because_IN he_PRP] [‘s_VBZ] [ #HereToHelp_HASHTAG]
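The difference comes down to tokenization: a Twitter-aware tokenizer keeps hashtags and mentions together instead of splitting off the # and @. Here is a minimal regex sketch of that idea (an illustration only, not Salience’s actual implementation):

```python
import re

# Match hashtags and mentions first, then words (allowing an internal
# apostrophe as in "he's"), then any other punctuation character.
# Alternation order matters: "#HereToHelp" must match before "\w+" can
# claim "HereToHelp" on its own.
TOKEN = re.compile(r"#\w+|@\w+|\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Split text into tokens, keeping hashtags and mentions intact."""
    return TOKEN.findall(text)

print(tokenize("#MyOnePhoneCallGoesTo JAKE from @StateFarm because he's #HereToHelp"))
# Tokens: #MyOnePhoneCallGoesTo, JAKE, from, @StateFarm, because, he's, #HereToHelp
```

Treating #HereToHelp as one token is what lets downstream components tag it as a HASHTAG rather than mis-parsing a stray # as a noun.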

This is where the rubber really meets the road. A syntactic English parser is a core component in any text analytics system. However, there is a lot that needs to be built on top of it to take advantage of its capabilities for higher-level functions such as entity extraction (including Twitter mentions and hashtags) and sentiment analysis. I came across another article online that draws the same conclusion. In it, Matthew Honnibal offers a great analogy: a syntactic parser is a drill bit. A better parser is a better drill bit, but by itself it doesn’t give you any oil. A better drill bit can improve the overall system, but it is one piece that plays a specific role, with other pieces depending on it and performing their own functions.
