There’s Been a Lot of Conversation Surrounding Caitlyn Jenner
Caitlyn Jenner made a big splash on Twitter, and we were more than happy to assess exactly what that meant. In a recent VentureBeat article by Barry Levine, we revealed a definitive assessment of the conversation surrounding Caitlyn Jenner. The VentureBeat article also touched on Caitlyn’s past and her several claims to fame, including breaking the Guinness World Record for the shortest time to acquire a million followers on Twitter.
As thrilled as we were to work with VentureBeat, we still wanted the chance to get into the nitty-gritty of our analysis. Here I aim to break down exactly what we discovered, and how.
We used DataSift’s historical archive to extract all publicly available tweets between 12pm, May 31 and 12pm, June 2 that mentioned the following:
This gave us a data set of around 970,000 tweets. About two-thirds of these tweets, 649,000, were posted on June 1, the day Caitlyn’s Vanity Fair cover was revealed.
Next, we refined our data set. The rest of the Jenner family also incites a fair amount of conversation on social media, so it was important to strip out all the content that didn’t mention Caitlyn, Bruce, #callmecaitlyn, or @caitlyn_jenner. This left us with about 735,000 tweets.
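The refinement step described above can be sketched roughly as follows. This is only an illustration, not the pipeline we actually ran: the function names are hypothetical, the term list is an illustrative subset of the terms named above, and case-insensitive substring matching is an assumption.

```python
# Hypothetical sketch of the keyword refinement step.
# Assumption: a tweet is kept if its text contains any term, ignoring case.
DEFAULT_TERMS = ("caitlyn", "#callmecaitlyn", "@caitlyn_jenner")


def mentions_any(text, terms=DEFAULT_TERMS):
    """Return True if the tweet text contains at least one of the terms."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def refine(tweets, terms=DEFAULT_TERMS):
    """Keep only the tweets that mention one of the terms."""
    return [text for text in tweets if mentions_any(text, terms)]


if __name__ == "__main__":
    sample = [
        "Caitlyn looks amazing on the cover",
        "Another selfie from the rest of the family",
        "So proud today #CallMeCaitlyn",
    ]
    for text in refine(sample):
        print(text)
```

Passing a longer tuple as `terms` would extend the filter to any other names that should be kept.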
This left us with precisely 574,956 tweets to process, a huge sample that is also large enough to be statistically significant across all relevant keywords.
As Seth stated in the VentureBeat article, the results were unusually and overwhelmingly positive. Generally, content is somewhere between 60% and 80% neutral, with the remainder split fairly evenly between positive and negative. In the case of the conversation surrounding Caitlyn Jenner, there were four times as many positive tweets as negative ones.
The top themes present in the data set were equally positive. “True self” came in at the top, followed by “world record” (referring to Caitlyn’s record-setting pace in amassing followers on her recently unveiled Twitter account), “absolutely stunning”, “much respect”, and “beautiful woman”.
We also broke down the analysis by time zone and were pleasantly surprised to see that there was not a single time zone in which the number of negative tweets outweighed the number of positive ones.
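A time zone check like the one above can be expressed in a few lines. The sketch below assumes each tweet has already been reduced to a (time zone, sentiment label) pair. The function names and the sample data are hypothetical and are not taken from our results.

```python
from collections import Counter, defaultdict


def sentiment_by_time_zone(tweets):
    """Count sentiment labels per time zone from (time_zone, label) pairs."""
    counts = defaultdict(Counter)
    for time_zone, label in tweets:
        counts[time_zone][label] += 1
    return dict(counts)


def zones_where_negative_leads(counts):
    """List the time zones in which negative tweets outnumber positive ones."""
    return sorted(
        zone for zone, c in counts.items() if c["negative"] > c["positive"]
    )


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        ("Pacific Time (US & Canada)", "positive"),
        ("Pacific Time (US & Canada)", "neutral"),
        ("London", "positive"),
        ("London", "negative"),
        ("London", "positive"),
    ]
    counts = sentiment_by_time_zone(sample)
    print(zones_where_negative_leads(counts))
```

An empty list from `zones_where_negative_leads` corresponds to the finding above: no time zone in which negative tweets come out ahead.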
DataSift divides Twitter users into male and female by gendering their names. Users with names commonly held by men, such as Brian, are classified as male, while users with names commonly held by women are classified as female.
This approach, while effective enough to reveal gaps in sentiment between male-classified and female-classified users, has a few drawbacks. Not every person on Twitter has a name that can be classified, leading to a group of Twitter users classified under “Gender Not Available”. Names that are ambiguous or cannot be classified as male or female, such as “Pat”, are placed in a separate Unisex category.
Time zone data is also taken from Twitter users’ profile pages. Its accuracy cannot be fully confirmed, as it is possible to list your geographic location as wherever you wish. So a user from Colorado might list their location as Paris, and be included in the analysis of that time zone.
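To make the four categories concrete, here is a minimal sketch of name-based classification. It is not DataSift’s implementation. The tiny name lists, the function name, and the choice to look only at the first word of the display name are all assumptions made for illustration.

```python
# Hypothetical name lists; a real classifier would use far larger ones.
MALE_NAMES = {"brian", "john", "david"}
FEMALE_NAMES = {"mary", "susan", "linda"}
UNISEX_NAMES = {"pat", "alex"}


def classify_gender(display_name):
    """Classify a profile's display name into one of the four categories.

    Assumption: only the first word of the display name is considered.
    """
    parts = display_name.strip().split()
    if not parts:
        return "Gender Not Available"
    first = parts[0].lower()
    if first in UNISEX_NAMES:
        return "Unisex"
    if first in MALE_NAMES:
        return "Male"
    if first in FEMALE_NAMES:
        return "Female"
    # Handles, nicknames, and brand names end up here.
    return "Gender Not Available"


if __name__ == "__main__":
    for name in ["Brian Smith", "Pat Jones", "xX_c00l_Xx", ""]:
        print(repr(name), "->", classify_gender(name))
```

The fallback branch shows why the “Gender Not Available” group exists: any display name outside the lists cannot be classified.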
An Overwhelmingly Positive Conclusion
Ultimately, however, the huge size of our data set (half a million tweets) and our ability to compare these results with previous Twitter analyses leave us with some very compelling evidence. The sentiment in the conversation surrounding Caitlyn Jenner’s big reveal was significantly more positive than we would have expected. Themes were dominated by positive phrases, and there was not a single time zone in which tweets categorized as containing negative sentiment outnumbered those containing positive sentiment.
Congratulations, Caitlyn. I hope your future is just as positive.