We Used AI On 6,000 Game of Thrones Predictions


According to HBO, Game of Thrones premiered its eighth season to 17.4 million viewers. Fans across the internet subsequently descended into an Aerys Targaryen-level of madness; people from all over the world are posting, blogging, tweeting, and streaming prediction after prediction and critique after critique.

So we decided to take 6,000 of these predictions and feed them to an AI called Salience. Then we visualized the results in Semantria.

“Without giving your customers the chance to tell you in their own words, you’ll never have the data you need to make informed, effective decisions.”

— Noah Blier | Lexalytics

HBO and other production companies use benchmarks like Net Promoter Scores (known simply as ‘NPS’) and Nielsen ratings to measure fan reaction to television. While useful, these standards may miss some crucial parts of the conversation. Nonetheless, film and TV executives make production, marketing, and even creative decisions based on these benchmarking standards.

That’s where natural language processing comes in.

How we analyzed 6,000 Game of Thrones predictions

Fan forums are one of the most robust feedback systems available to Game of Thrones executives. The television show is built on shocking twists, audience subversion, and mystery. The resulting buzz takes social media by storm. Long-form social platforms, like Reddit, allow fans to discuss the nuances of this buzz at length.

For this reason, we chose to analyze a high-volume Reddit prediction thread on /r/GameOfThrones. While Lexalytics offers six strategic industry configuration packs, we do not yet offer an out-of-the-box Westerosi configuration (¯\_(ツ)_/¯). Lucky for us, creating and tuning a new Semantria configuration is straightforward and coding-free. So we got to work spinning up a Game of Thrones configuration from scratch.

In this article we’ll analyze the comments and theories of one popular thread on /r/GameOfThrones where Redditors predict the outcome of Season 8. First we’ll analyze the content of the thread up to the premiere of the season; then we’ll return to analyze the thread up to the penultimate episode of the series. The goal will be to discover how fans feel about the show overall and to structure trending theories.

First we’ll pull the curtain back to break down how an analysis like this is managed. We’ll walk you through building a custom configuration for a media property like Game of Thrones. We’ll illustrate how a data scientist or production company might use that configuration to distill the consensus of fan discussion.



Finding and preparing data

For a data analyst, the first step in any analysis is finding the most impactful data. As we discussed earlier, Reddit is a hotbed for TV conversation and a fantastic source for natural language data. To build and tune this configuration, we collected over 6,200 comments from a popular predictions thread on /r/GameOfThrones. Scraping text data might sound like an intimidating task. However, tools like WebHarvy make it easy and affordable.

Before we can think about building a configuration, we need to clean our dataset. To do this, we export the scraped data into Microsoft Excel. For our analysis, we only need two columns: ID and Text. ID signifies a row in the data set; Text is the natural language document (aka a Reddit post).
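If the scraper’s export is saved as CSV rather than trimmed by hand in Excel, the same cleanup can be scripted. The sketch below is a minimal illustration, not part of the Semantria workflow: it assumes a CSV with a header row that includes `ID` and `Text` columns (any other columns are hypothetical scraper output and are simply dropped), and it skips Reddit’s `[deleted]` and `[removed]` placeholders.

```python
import csv
import io

def clean_scraped_rows(raw_csv: str) -> list[dict[str, str]]:
    """Keep only the ID and Text columns; drop empty, deleted, or removed comments."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    cleaned = []
    for row in reader:
        text = (row.get("Text") or "").strip()
        # Reddit shows deleted/removed comments as placeholders; they carry no opinion.
        if not text or text in {"[deleted]", "[removed]"}:
            continue
        # Collapse runs of whitespace and line breaks left over from scraping.
        cleaned.append({"ID": row["ID"].strip(), "Text": " ".join(text.split())})
    return cleaned

# Hypothetical scraper output with extra columns we don't need.
raw = (
    "ID,Author,Score,Text\n"
    "1,u1,10,Arya kills the Night King.\n"
    "2,u2,3,[deleted]\n"
    '3,u3,5,"  Jon   takes the throne "\n'
)
rows = clean_scraped_rows(raw)
```

Each surviving row is one document, ready to be sent for analysis with its ID intact.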

Building a custom configuration to process our data

A configuration is a set of rules for a machine to follow. Natural language processing uses text analytics to mine data sets for meaning; a configuration is how a human defines the parameters for that process. For example, a configuration might define specific concepts that a machine otherwise wouldn’t know to look for.

An illustration of Semantria Configurations

Configuration building is where you create custom entities, queries and taxonomies; you can also fine tune the sentiment to accommodate case-specific context. For our project, we need to tackle the expansive list of entities specific to Game of Thrones. Lucky for us, Game of Thrones is enriched by myriad fan resources, like A Wiki of Ice and Fire.

Feel free to check out the Ultimate NLP Tuning Guide if you want to learn more about configuration building.

Extracting entities

The Game of Thrones custom entity list

A look at a custom entity list in the Lexalytics configuration tab

Game of Thrones is known for its expansive cast. Many Westerosi citizens have nicknames and titles beyond their primary character name. Further, there are many common misspellings of each name throughout the data set. These need to be accounted for. This means that, one by one, each character must be added and cataloged with all of their aliases. Entity management is crucial because, without proper labeling, sentiment may be assigned to the wrong character or entity.

In our case, this process is repeated for all of the characters, cities, and weaponry mentioned in the plot. After we’ve accounted for all entities, we test our accuracy. Running an analysis with the Semantria Excel Plugin is a useful way to identify the entities we might have missed. These entities can then be added and the process repeated.
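The alias bookkeeping described above can be sketched in plain Python. This is a hypothetical stand-in for a Semantria entity list, not its actual format or matching logic: the `ALIASES` table is illustrative only, matching is case-insensitive on word boundaries, and longer aliases are tried first so that “the Hound” is captured as one name.

```python
import re

# Hypothetical alias table: canonical entity -> nicknames, titles, and misspellings.
ALIASES = {
    "Daenerys Targaryen": ["Daenerys", "Dany", "Khaleesi", "Mother of Dragons", "Danerys"],
    "Jaime Lannister": ["Jaime", "Jamie", "Kingslayer"],
    "Sandor Clegane": ["Sandor", "The Hound", "Hound"],
}

def build_alias_pattern(aliases):
    """Return a compiled regex over every alias plus a lowercase alias -> canonical lookup."""
    lookup = {}
    for canonical, names in aliases.items():
        for name in [canonical, *names]:
            lookup[name.lower()] = canonical
    # Longest aliases first so multi-word names win over their shorter parts.
    alternation = "|".join(re.escape(n) for n in sorted(lookup, key=len, reverse=True))
    return re.compile(rf"\b({alternation})\b", re.IGNORECASE), lookup

def extract_entities(text, pattern, lookup):
    """Normalize every alias found in the text to its canonical entity name."""
    return [lookup[m.group(1).lower()] for m in pattern.finditer(text)]

pattern, lookup = build_alias_pattern(ALIASES)
found = extract_entities("Jamie kills the Hound, then Dany flies north.", pattern, lookup)
```

Because every alias resolves to one canonical name, sentiment aimed at “Jamie” or “Kingslayer” lands on the same entity instead of being split across three.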

Building queries

Next we tackle queries. Queries address the relationship between ideas and entities within your dataset. Query building can seem like a daunting task, especially when building a configuration from scratch. One place to start is with Themes, which are automatically detected by the machine.

We added queries tackling the broader strokes of the show. In particular, we define Duty, Betrayal, Power, Death, and Family as queries. This gives us a wide enough net to contextualize fan theories and opinions.
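Semantria queries have their own syntax, which we don’t reproduce here. As a rough, hypothetical approximation of what a thematic query does, the sketch below tags a post with every query whose trigger words it contains. The trigger lists are illustrative assumptions, not the ones used in our configuration, and simple word matching ignores negation and context that a real query engine would handle.

```python
import re

# Hypothetical trigger words standing in for the Duty/Betrayal/Power/Death/Family queries.
QUERIES = {
    "Death": ["die", "dies", "dead", "death", "kill", "kills", "killed"],
    "Betrayal": ["betray", "betrays", "betrayed", "traitor"],
    "Family": ["brother", "sister", "family", "baby"],
    "Power": ["throne", "rule", "rules", "crown"],
    "Duty": ["duty", "oath", "vow"],
}

def tag_queries(text):
    """Return the names of all queries with at least one trigger word in the text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(name for name, triggers in QUERIES.items() if tokens & set(triggers))

tags = tag_queries("I am pretty sure Tyrion is a traitor and will die for it.")
```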

Entities automatically extracted from the data set without a configuration

To illustrate the power of custom configurations (and to test our progress), we analyzed the data set with Lexalytics’ standard “Blank-English” configuration and compared the results with those of our customized Game of Thrones configuration. As it turns out, our blank configuration (which uses machine learning to automatically detect entities) did pretty well!

Entities extracted using the custom Game of Thrones configuration

Nonetheless, we still see improvements in sentiment and entity normalization when we create some custom entity queries. Now our sentiment analysis delivers more impactful output. Semantria automatically scores the sentiment for every character, no customization needed there!

We’ve gathered and cleaned our data and built a custom Game of Thrones configuration. Now we can generate some interactive dashboards to visualize our Reddit thread. Let’s get to work on the analysis!

Who is going to die in Season 8?

Some of the biggest buzz this season is around death predictions. In this chart we’ve taken all the mentions of our top ten characters in the data set; then we’ve segmented all the predictions of those top ten characters dying. In all, the Night King, Jon Snow, and Cersei Lannister are the most frequently discussed deaths in the Reddit thread.
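Segmenting mentions this way comes down to counting. In the hypothetical sketch below, each post is assumed to have already been reduced to a pair of (extracted entities, matched queries). A post that mentions several characters and matches “Death” credits each of them, so this is a coarse approximation of the chart rather than how Semantria attributes sentiment.

```python
from collections import Counter

def death_mentions(posts, top_n=10):
    """For the top_n most-mentioned characters, count posts that also match the Death query."""
    mentions = Counter()
    deaths = Counter()
    for entities, queries in posts:
        unique = set(entities)  # count each character at most once per post
        mentions.update(unique)
        if "Death" in queries:
            deaths.update(unique)
    top = [name for name, _ in mentions.most_common(top_n)]
    return {name: deaths[name] for name in top}

# Toy, made-up posts purely to show the shape of the data.
posts = [
    (["Night King", "Arya Stark"], ["Death"]),
    (["Jon Snow"], ["Power"]),
    (["Night King"], ["Death"]),
    (["Jon Snow", "Cersei Lannister"], ["Death"]),
]
result = death_mentions(posts)
```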

We’ll dive into the fan theories around the Night King soon (spoiler: they’re pretty accurate). But first let’s zoom in on another piece of that pie: Tyrion Lannister.

Tyrion is famous for being a crowd favorite. Says one fan on Reddit: “IF TYRION DIES IM NEVER WATCHING AGAIN.” Lucky for them, it’s the final season; a grim trend emerges when we analyze the phrases most frequently associated with Tyrion.

Notice all the red in the word cloud? That denotes negative sentiment. We see where these ominous phrases come from when we dive into the underlying fan theories associated with this word cloud.
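One simple way to color a sentiment word cloud is to average each phrase’s scores and bucket the result. The sketch below is a hypothetical illustration, not how Semantria renders its clouds: the ±0.05 neutral band, the green/grey colors, and the sample phrases and scores are all assumptions.

```python
from collections import defaultdict
from statistics import mean

def phrase_colors(scored_phrases, neutral_band=0.05):
    """Map each phrase to red/green/grey based on its mean sentiment score."""
    by_phrase = defaultdict(list)
    for phrase, score in scored_phrases:
        by_phrase[phrase].append(score)
    colors = {}
    for phrase, scores in by_phrase.items():
        avg = mean(scores)
        if avg < -neutral_band:
            colors[phrase] = "red"    # net negative
        elif avg > neutral_band:
            colors[phrase] = "green"  # net positive
        else:
            colors[phrase] = "grey"   # roughly neutral
    return colors

# Made-up (phrase, score) pairs for illustration only.
colors = phrase_colors([
    ("betray dany", -0.6),
    ("betray dany", -0.4),
    ("wine", 0.3),
    ("hand of the queen", 0.0),
])
```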

“I am pretty sure Tyrion is a traitor and will die for it. I think he will betray Dany to protect his sisters unborn baby. He made it clear he loves his family and would do anything for them.”

While others see Tyrion as the hero, they still see him meeting the same fate.

“Cersei to kill Jamie fits better to me. After Jamie betrayed her by going to the north I doubt she would ever trust him again. I think Tyrion will sacrifice himself for his family somehow.”

So, it looks like the hivemind is predicting that Tyrion will betray Daenerys to save his family. As a result, Dany will sentence him to death and Tyrion will end up getting on the wrong side of Drogon the Dragon.

Not unlike Bran’s greensight, mining text data gives us the outline necessary to make predictions. But the devil is always in the details, which is why drilling into individual comments helps analysts get the most out of the analysis.

Analyzing Night King predictions

The best part about analyzing internet theories is when the fans get it right on the money. As we examine the rest of our data set, we explore one of the most popular entities, the Night King.

When it comes to overall buzz, the Night King is second only to Jon Snow.

The Night King is unique in that his character is largely removed from the rest of the cast. He tends to exist on a narrative thread apart from the daily politics of Westeros. Nonetheless, he is a part of the playbill. We want to see which characters the Reddit prediction thread associates with our frozen friend from beyond the Wall. Using a process known as entity extraction, we filter for every Reddit post mentioning the Night King. Here are the characters that fans most associate with ol’ blue eyes.
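Once every post has its entities extracted, “who gets mentioned alongside the Night King” is a co-occurrence count. The sketch below assumes each post has already been reduced to a list of canonical entity names; the sample posts are made up.

```python
from collections import Counter

def co_mentions(posts, target, top_n=5):
    """Count how often other entities appear in the same post as the target entity."""
    counts = Counter()
    for entities in posts:
        unique = set(entities)
        if target in unique:
            counts.update(unique - {target})
    return counts.most_common(top_n)

# Toy posts, already reduced to canonical entity names.
posts = [
    ["Night King", "Cersei Lannister"],
    ["Night King", "Arya Stark", "Cersei Lannister"],
    ["Jon Snow", "Arya Stark"],  # no Night King, so ignored
    ["Night King", "Jon Snow"],
]
ranking = co_mentions(posts, "Night King")
```

Counting each entity once per post keeps a single long, name-heavy comment from dominating the ranking.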

If you’re anything like the Lexalytics marketing team, then you may be wondering why Cersei is above Arya on this chart. It turns out there was a non-trivial theory that Cersei would end up marrying the Night King and become the “Night Queen” herself. Watch out Euron!

“The Night King claims a Night Queen, I believe this will be Cersei and they rule Westeros in perpetual Winter” says one pessimistic Redditor. This trend continues:

“Cersei gets killed by the Night King, becomes the Night Queen & Jamie Lannister is the one to have to end her.”

Some of these theories get quite dark, like this Redditor who posits a wild twist:

“I predict that Cersei will offer her child to the Night King in exchange for ruling the South, much like Caster did with his Daughter-wives.”

Nonetheless, Arya still makes the top five! It bears mentioning that a few savvy viewers predicted her role in the plot well before the season premiere.

“All I know is that the NK will be killed with Littlefinger’s dagger that was used against Bran in S1. No way we have tracked that blade all these years, and it ends up with Arya in S7, for it not to be the pivotal weapon in the largest battle.”

It’s not long before the analysis stumbles onto surprisingly accurate theories of what will happen to the Night King.

“What are your thoughts on the catspaw blade being the secret (and only) weapon to kill the night king?”

Finally, calling their shot, one fan puts it as plain as can be.

“Arya kills The Night King with her Valyrian steel dagger.”

The dagger — which might’ve appeared innocuous to the casual viewer — is central to many Redditors’ pre-Season 8 theories. It is, in fact, referenced by various names throughout the data set.

To most of the viewing public, Arya’s Jordan-esque leap-and-dagger move made for edge-of-your-seat television. However, to the Redditors in the prediction thread it was just another “told you so” moment.

Who is going to die in Season 8? Mid-season check-in

For our second dataset, we collected documents from the same /r/GameOfThrones thread. This time, they date from the premiere of the season to the fourth episode. This collection of data will allow us to see the shift in theories and reception for the show midseason.

Let’s compare the midseason data set with the preseason dataset. Examining sentiment scores for our characters using the topic “Death” allows us to track the shift in sentiment surrounding their survival.

Death predictions by sentiment score pre-season versus mid-season

We see by comparison that the sentiment score for Queen Cersei plummets from -0.38 to -0.68. Her twin brother Jaime Lannister follows this trend, with his sentiment score dropping by 0.22. The dramatic shift suggests that fans don’t have high hopes for the brother and sister: characters with lower sentiment scores on this chart are the fans’ favorites to perish before the show ends.
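The shift is simply the mid-season score minus the pre-season score. For Cersei that is -0.68 - (-0.38) = -0.30, a drop of 0.30. The sketch below computes that difference for every character present in both snapshots and sorts the biggest drops first; only Cersei’s two scores come from the analysis above, and the dictionaries are a hypothetical data layout.

```python
def sentiment_shift(pre, mid):
    """Return mid-season minus pre-season score for characters present in both snapshots."""
    return {name: round(mid[name] - pre[name], 2) for name in pre.keys() & mid.keys()}

def biggest_drops(shift):
    """Sort characters from the largest sentiment drop (most negative change) upward."""
    return sorted(shift.items(), key=lambda item: item[1])

pre = {"Cersei Lannister": -0.38}
mid = {"Cersei Lannister": -0.68}
shift = sentiment_shift(pre, mid)
```

Rounding to two decimals matches the precision of the reported scores and hides floating-point noise such as -0.30000000000000004.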

By digging deeper we’re able to find out what’s impacting the Lannister twins’ fate.

“Maggy’s prediction about Cersei being killed by her younger brother will be true and Jaime will kill Cersei,” proclaims a confident Redditor.

Maggy’s prediction refers to Maggy the Frog and her vision of Cersei’s death at the hands of someone known as the valonqar.

This particular theory stems mostly from the books rather than the HBO series. Nonetheless, it’s rampant throughout our dataset. Here a Redditor puts forth the theory that Jaime is the valonqar.

“All I know is Jamie kills Cersei, he’s the younger brother that was prophesied to kill her. And completing his transformation into the Queenslayer.”

Another Redditor even goes a step further with an interesting twist:

“Dunno if people are still reading at this point and I might have missed it, but Cersei will be killed by Arya by pretending to be Jamie/Tyrion. It’s pretty much the whole of Arya’s buildup to kill the people in her list and its close to the valonqar prophecy.”

Even though the television show Game of Thrones is surpassing the content in the books, fans still point to the source material for predictions. No matter where the theories come from, it’s clear Reddit users have Cersei’s death as a top priority.

Betting on Cleganebowl

It’s time to talk about the most anticipated sports event in Westeros: Cleganebowl. For years fans of the books and TV show have been waiting for the final standoff between Sandor “The Hound” and Gregor “The Mountain” Clegane. The brothers have been pitted against each other since the very beginning of Game of Thrones. In the words of a perceptive Redditor:

“Cleganebowl is the only certainty.”

So, who’s the favorite to win?

At first glance the brothers appear to be in a dead heat.

This visualization filters mentions of the Clegane brothers and “death.” The green shading effectively represents the fans’ bets on survival. What this graph tells us is that Sandor may be correct in his prediction that he will not return from the confrontation with his undead brother.

Interestingly, the AI identified a big trend around the phrase “fiery sword” or “flaming sword” when running the phrase analysis.
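A bare-bones version of phrase analysis is counting adjacent word pairs (bigrams). The sketch below is a hypothetical illustration, not Salience’s phrase extraction: the stopword list and the `fiery` to `flaming` synonym mapping are assumptions, added so that “fiery sword” and “flaming sword” are counted as one phrase.

```python
import re
from collections import Counter

# Hypothetical stopwords and synonym map; a real pipeline would tune both.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "in", "with", "his", "her", "will", "be", "it", "is"}
SYNONYMS = {"fiery": "flaming", "flame": "flaming"}

def top_bigrams(texts, n=5):
    """Count adjacent word pairs across texts, skipping pairs that contain a stopword."""
    counts = Counter()
    for text in texts:
        tokens = [SYNONYMS.get(t, t) for t in re.findall(r"[a-z']+", text.lower())]
        for first, second in zip(tokens, tokens[1:]):
            if first in STOPWORDS or second in STOPWORDS:
                continue
            counts[(first, second)] += 1
    return counts.most_common(n)

texts = [
    "The Hound kills the Mountain with a flaming sword.",
    "Beric's fiery sword goes to the Hound.",
    "A flaming sword ends Cleganebowl.",
]
top = top_bigrams(texts, n=1)
```

Pairs are skipped rather than stopwords deleted, so words that were never adjacent in the text are never glued into a false phrase.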

“The Hound kills the Mountain with a flaming sword, overcoming his fear.”

“Sandor will use fire to kill the undead Mountain in Cleganebowl.”

“The Hound will have to get over his fear of fire to be able to use it to kill the Mountain somehow.”

As it turns out, mentions of both Sandor and Gregor are frequently collocated next to predictions of their respective deaths (hence the draw we observe in the chart above).

“I don’t see the Hound coming out of that fight.”

There’s a lot of talk about a mutually assured destruction.

“The Mountain and the Hound kill each other.”

The AI also highlights the arrival of an unexpected entity in the Cleganebowl conversation: Arya Stark.

“Arya will reunite with the Hound and [they’ll] kill the Mountain together. The Hound dies in the process.”

“The only one I can say I think will definitely happen is that the Mountain will win Cleganebowl and then Arya will kill the Mountain.”

Here’s what the analysis tells us about Cleganebowl: The Hound will use Beric Dondarrion’s (RIP) flaming sword to attack his brother. After an epic fight, The Hound will misstep and be struck down (audible gasp!). Then, at the eleventh hour, Arya will put her Air Jordans back on and leap into the fray, killing the Mountain. The entire process should work to complete The Hound’s redemption arc.

Make bets at your own discretion.

So, who wins the Iron Throne?

Let’s take a look at the final top ten tally.

With 26% of the vote, the top Reddit predictions thread is calling it for Daenerys Stormborn. She edges out Jon Snow by a slim two-percentage-point margin. Reddit is bullish on Dany, despite her alliance’s massive losses, Euron’s surface-to-dragon missiles, and her ever-maddening mindset.
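A tally like this is each character’s share of all throne predictions, and the lead is the gap in percentage points between first and second place. The sketch below assumes each prediction has already been reduced to a single predicted winner; the toy list is made up and does not reproduce the thread’s real counts.

```python
from collections import Counter

def throne_shares(predicted_winners):
    """Percentage share of predictions per character, ordered from most to least popular."""
    counts = Counter(predicted_winners)
    total = sum(counts.values())
    return {name: round(100 * c / total, 1) for name, c in counts.most_common()}

def lead_margin(shares):
    """Gap in percentage points between first and second place."""
    first, second = list(shares.values())[:2]
    return round(first - second, 1)

# Toy predictions for illustration only.
shares = throne_shares(["Daenerys Targaryen"] * 5 + ["Jon Snow"] * 4 + ["Hot Pie"])
```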

On the other side of the chart are Gendry, Sansa, and Bran. This seems reasonable; Sansa’s ambitions lie in the North, Gendry has no appetite for politics, and Bran is a three-eyed raven. The only unexpected face on this chart is (that’s right) Hot Pie! Reddit thinks Hot Pie has a better chance at the throne than Bran, Sansa, or Gendry. Says one Hot Pie supporter:

“After the war is over, Hot Pie bakes a pie for whomever is sitting on the Iron Throne, but the pie is poisoned. He then takes the throne from them. All hail king Hot Pie!”

Would that it were.

Voice of Fan Analytics: what did we accomplish?

By investigating the Reddit prediction thread on /r/GameOfThrones using our custom Game of Thrones configuration, we’ve illustrated:

  • How to create an entity-specific configuration
  • How to use high-level visualization to explore fan sentiment
  • How to capture the fan conversation and use detailed data points to highlight theories

These are just some of the ways brands might use AI-powered text mining solutions to uncover trends and predict outcomes within text data sets.

The final two episodes of Game of Thrones figure to be some of the most watched television in recent memory. From the fans’ perspective, they have been watching, writing (and probably yelling) about Game of Thrones for the past ten years. Thanks to straightforward, coding-free technology we’re able to enjoy this spirited debate.

As our favorite characters march towards the series finale, we wish them “good fortune in the wars to come.”

Valar morghulis.
