A Portrait of the Election in Data

The results of the Iowa Caucus and New Hampshire Primary traditionally mark the beginning of the Presidential race, and so the most important contest of 2016 begins. However, as polling response rates drop ever lower, more questions about the utility of phone polling continue to arise. Because of this, social media, especially Twitter, is taking center stage in gauging the public’s perception of the election. Whether it will prove to be accurate remains to be seen, but the results are interesting nonetheless. So, in good patriotic spirit, we at Lexalytics decided to do our part and mine the sentiment from thousands of Tweets created by residents of Iowa and New Hampshire. We hope doing so will enrich the conversation about these historic events and help paint a portrait of the nation’s evolving dialogue.

To do this, we collected only posts from accounts using Twitter’s geotag feature, ensuring location authenticity to the best of our ability. This restriction led to a smaller data set than we typically like to work with, which led to some small and interesting problems. Still, the output revealed volumes about how people online were talking about the candidates. In total, we analyzed 4,500 Tweets, or about one in every ten sent from Iowa and New Hampshire during the collection period. We consider this large enough to be a representative sample.
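To make the geotag restriction concrete, here is a minimal sketch of that kind of filter. It is not our actual collection pipeline: the record shape (a dict with a hypothetical `place_state` key holding a two-letter state code) and the function name are assumptions made purely for illustration.

```python
# Hypothetical record shape: each tweet is a dict with "text" and an optional
# "place_state" key holding the two-letter state code from its geotag.
TARGET_STATES = {"IA", "NH"}


def geotagged_in_states(tweets, states=TARGET_STATES):
    """Keep only tweets whose geotag places them in one of the target states."""
    # Tweets without a geotag return None from .get() and are dropped.
    return [t for t in tweets if t.get("place_state") in states]


sample = [
    {"text": "Caucus night!", "place_state": "IA"},
    {"text": "Watching from afar", "place_state": "TX"},
    {"text": "No geotag here"},
    {"text": "Primary day", "place_state": "NH"},
]
kept = geotagged_in_states(sample)
```

Dropping every tweet without a geotag is what shrinks the data set: accuracy about location is bought at the cost of volume.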

Iowa Mentions 

The number of Tweets sent from Iowan Twitter users mentioning each candidate

In Iowa, Republican caucus winner Texas Senator Ted Cruz got the most mentions of any candidate, with 867 Tweets mentioning his name, while Donald Trump (the second-place finisher) was a distant third with 781 mentions. On the Democratic side, second-place finisher Vermont Senator Bernie Sanders got the second-most overall mentions from Iowan Tweeters at 863. This was 105 more Tweets than former Secretary of State Hillary Clinton, who won the state by just 0.3%, a virtual tie.

As we dug into the sentiment of these mentions, more interesting things revealed themselves. For example, while he topped the list of Twitter mentions, Sen. Cruz was Tweeted about negatively far more often than positively, as was Bernie Sanders. In fact, the only candidates in the top five (of both parties) who were mentioned more positively than negatively were Sec. Clinton and third-place “winner” Senator Marco Rubio of Florida. Sen. Rubio was mentioned less, though, with 300 tweets separating him from the other top-five candidates. This suggests an inconsistent online presence for the youthful junior Senator.
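The tallying behind numbers like these can be sketched in a few lines. This is an illustrative sketch only: it assumes each tweet has already been given a sentiment label, matches candidates by a simple last-name substring, and uses made-up sample tweets rather than our data. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def tally_mentions(labeled_tweets, candidates):
    """Count mentions per candidate, broken down by sentiment label."""
    tallies = defaultdict(Counter)
    for text, sentiment in labeled_tweets:
        lowered = text.lower()
        for name in candidates:
            # Naive substring match on the candidate's last name.
            if name.lower() in lowered:
                tallies[name][sentiment] += 1
    return tallies


def net_sentiment(counter):
    """Positive mentions minus negative mentions; neutral ones are ignored."""
    return counter["positive"] - counter["negative"]


# Made-up sample tweets, each paired with an already-assigned label.
labeled = [
    ("Cruz wins Iowa", "neutral"),
    ("Not a fan of Cruz", "negative"),
    ("Great night for Rubio", "positive"),
    ("Cruz and Rubio both spoke", "neutral"),
]
tallies = tally_mentions(labeled, ["Cruz", "Rubio"])
```

Keeping volume and net sentiment as separate numbers is what lets a candidate top the mention list while still skewing negative.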

New Hampshire Mentions 

The number of Tweets sent from New Hampshirite Twitter users mentioning each candidate

In the Granite State, the volume of Tweets tipped in Sen. Rubio’s favor, but it only showed that size isn’t everything, much to the Senator’s chagrin. In this data set, Donald Trump and Marco Rubio were the most-mentioned candidates, with a 200-Tweet difference between them. Meanwhile, Ted Cruz and Bernie Sanders rounded out the top five. The same candidates took the top mentions as in Iowa, but in vastly different positions. Also as in Iowa, most mentions were neutral and the negative mentions outweighed the positive, but to a much higher degree.

When you look at the broader conversation, it’s hard not to notice the fervent evangelical emphasis dominating conservative dialogue. However, it rarely comes up in the Twittersphere. Considering that Senator Bernie Sanders is the first non-Christian ever to win a primary, as he did in New Hampshire, this struck us as surprising. It seems as though the users who populate our dataset couldn’t care less about faith. This is a commentary on the Twitter community, on a largely irreligious constituency (which seems unlikely), or on the specific data we collected.

More telling, however, is the number of mentions that “lower-tier” GOP candidates, like second-place finisher and Ohio Governor John Kasich, received. The same was the case for Jeb Bush and New Jersey Governor Chris Christie, who has since suspended his campaign. In the Iowa results, none of these three even broke 100 tweets. Yet they were clearly able to get more people talking about them in New Hampshire. Perhaps this illustrates a shifting narrative in the Republican Party, where voters are beginning to focus on the quieter candidates, like Kasich, as the bluster from the pre-election debates begins to settle. As for the Democrats, Twitter reiterates what we all know: Secretary Clinton still cannot penetrate Bernie’s historic grassroots internet campaign.

Word Clouds

A word cloud representing the volume and sentiment of mentions for candidates from Iowan Twitter users

Word cloud representing the volume and sentiment of tweets mentioning candidates from New Hampshirite Twitter Users

As you can see in the word clouds above (Iowa on the top and New Hampshire on the bottom), the feelings about the candidates are starting to crystallize, at least as far as Salience is concerned. In Iowa, the candidates were mentioned in mostly neutral ways, and the positive and negative mentions canceled each other out. In New Hampshire, however, the victors of both contests had scores more negative mentions than positive. The few candidates who did have majority-positive mentions (Rand Paul, Rick Santorum, and Mike Huckabee) had all ended their bids for the White House, and the Tweets largely congratulated them for running strong campaigns or expressed excitement at their exit.

As you continue to compare the two word clouds, another difference emerges. Bernie Sanders becomes ever more the darling of the internet, a narrative that has come to define his campaign. He tied with Secretary Clinton in Iowa, where both received largely neutral tweets. But the shift in his support as the campaign moved east is clearly reflected in the Twitter chatter coming out of New Hampshire during its primary.


Themes are important noun phrases; they’re effectively the “buzz” and help give context. In looking at the themes, again Iowa on the top and New Hampshire on the bottom, we see a difference between the two states’ Twitter users. Iowans reacted negatively to the political horserace, such as the “voter shaming letters” the Ted Cruz campaign sent out, and positively to issues they saw as important, like climate change or the Second Amendment. In New Hampshire, by contrast, the procedural aspects of the election (canvassing, town halls, and mentions of a campaign’s “ground game”) were all positive. New Hampshirites were loud and divisive on the issues they were concerned with. This weighted the sentiment of many of the phrases and themes coming out of New Hampshire negatively: New Hampshire is angry and wants change. This falls in line with the state’s historically individualistic and polarizing political message on both sides of the aisle.

New Hampshire

This word cloud represents the main themes and their sentiment found in Tweets sent by New Hampshirites during the Primary

This statewide political passion explains the “stealing votes” theme, which takes center stage in the word cloud above as a massive concern in the New Hampshire Republican Party. The theme comes from a series of tweets that claimed Sen. Cruz was “stealing votes” from Trump. The phrase also referenced a theory made popular on Breitbart, which posited that Microsoft (which provides the software used to tally votes and is also a Rubio contributor) intentionally committed voter fraud by reallocating votes to Rubio. This theory got its legs in the viral Twitter campaign #MicrosoftRubioFraud. The premise of the conspiracy was that Microsoft insiders were desperate “2 keep Rubio close.” These rampant theories illustrate the tumultuous ride the Republicans have been on during this nomination season.


This word cloud represents the main themes and their sentiment found in Tweets sent by Iowans during the Caucus

We did have a technical fumble on our end, and it speaks to the challenge of teaching language to a machine. As you can see in the Iowa word cloud above, “Top Secret” was apparently mentioned positively. The day before the caucus, the State Department revealed that 22 of Hillary Clinton’s emails had been retroactively classified. With Clinton embroiled in a scandal surrounding her use of a private email server while at the State Department, there is no way this should have been a “positive” mention.

It wasn’t.

The Mechanics

Part of what we are doing, for lack of a better description, is “training” Salience to understand and parse political jargon. It’s not enough to understand the words themselves; Salience also has to understand the context in which they are being used. And in politics, it’s all about context. This process is called tuning, and it’s how we produce preconfigured “packs” for various industries.

For example, if you ran a business that sold handmade furniture and your customers were tweeting about “a lot of returns,” that would be a bad thing. Yet if you ran a bed-and-breakfast and your customers were using the same language, it would be a good thing. When it comes to politics, “returns” means poll results, so it wouldn’t have a negative or positive connotation; it would be neutral. That is determined from the context, which is the text around the word.
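The “returns” example can be pictured as a lookup that depends on the domain. The sketch below is only a toy model of that idea, not how Salience or its industry packs are implemented; the dictionary layout, the domain names, and the numeric scores are all assumptions chosen for illustration.

```python
# Toy "packs": the same term carries a different score in each domain.
# Scores are illustrative: -1 negative, 0 neutral, +1 positive.
DOMAIN_PACKS = {
    "furniture": {"returns": -1},    # customers sending products back
    "hospitality": {"returns": +1},  # guests coming back to stay again
    "politics": {"returns": 0},      # results coming in
}


def term_sentiment(term, domain):
    """Look up a term's score in a domain pack; unknown terms are neutral."""
    return DOMAIN_PACKS.get(domain, {}).get(term.lower(), 0)


furniture_score = term_sentiment("returns", "furniture")
hospitality_score = term_sentiment("returns", "hospitality")
politics_score = term_sentiment("Returns", "politics")
```

In this toy version the domain is chosen by hand. The harder problem described above is inferring the right reading from the text around the word.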

So when it comes to “Top Secret” in Iowa or “Eminent Domain” in New Hampshire, the positive scores were mistakes. Salience read “top” as positive in the handful of tweets we collected that cited that phrase. There were also just a couple of tweets about eminent domain and Donald Trump, but the user was being sarcastic in saying they “love” it. Teaching a machine sarcasm is notoriously difficult, so Salience mistook the sentiment and read that tweet as positive.
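The “Top Secret” slip is easy to reproduce with a toy word-level scorer. The sketch below is a deliberately naive illustration, not Salience’s actual algorithm: the word scores, the phrase table, and the choice to treat “top secret” as neutral are all assumptions. It shows how scoring words in isolation misreads the phrase, and how a phrase-level entry that takes precedence avoids it.

```python
# Toy lexicons (illustrative values only).
WORD_SCORES = {"top": 1, "love": 1, "scandal": -1}
PHRASE_SCORES = {"top secret": 0}  # assumed neutral for this sketch


def score_words_only(text):
    """Sum per-word scores, ignoring any multi-word phrases."""
    return sum(WORD_SCORES.get(word, 0) for word in text.lower().split())


def score_with_phrases(text):
    """Score known phrases first, then score whatever words remain."""
    lowered = text.lower()
    total = 0
    for phrase, score in PHRASE_SCORES.items():
        total += score * lowered.count(phrase)
        # Remove the phrase so its words are not scored a second time.
        lowered = lowered.replace(phrase, " ")
    return total + score_words_only(lowered)


tweet = "emails marked top secret"
naive = score_words_only(tweet)     # "top" alone pushes this positive
tuned = score_with_phrases(tweet)   # the phrase entry overrides "top"
```

Sarcasm, as in the “love” tweet about eminent domain, is not something a lookup table like this can fix, which is part of why it is so much harder.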

This error would simply wash out if we had more tweets to draw from; it is a common problem with sarcasm or odd jargon. By focusing our search on just the posts confirmed to be coming out of the early-contest states, we limited the data we were able to use for our analysis. As the primary contests move on to larger states and multiple contests per day (e.g. “Super Tuesday”), this should correct itself.
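A bit of arithmetic shows why volume matters here. The sketch below assumes a theme whose tweets are all truly negative (scored -1), with exactly one of them misread as positive (+1), and it assumes such misreads stay rare rather than growing with volume. The counts are made up for illustration and are not taken from our data.

```python
def mean_sentiment(scores):
    """Average of a list of per-tweet sentiment scores."""
    return sum(scores) / len(scores)


def with_one_misread(n_tweets, true_score=-1, misread_score=1):
    """Average sentiment for a theme when exactly one tweet is misread."""
    scores = [true_score] * (n_tweets - 1) + [misread_score]
    return mean_sentiment(scores)


# With only two tweets, a single misread cancels the real signal entirely.
small_sample = with_one_misread(2)
# With two hundred tweets, the same misread barely moves the average.
large_sample = with_one_misread(200)
```

With a couple of tweets on a theme, one sarcastic “love” can flip the color of a word in the cloud. With hundreds, it is a rounding error.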

We know that there were far more tweets from New Hampshire and Iowa than we decided to analyze. The next contests, in South Carolina and Nevada, take place over two days in each state, and both states are more populous than either of the first two. In fact, this speaks to the problem of determining Twitter users’ actual locations for analysis more than anything, but that’s a whole other story.

What do you think? Send us your thoughts, reactions, and insights via Twitter or Facebook.

