Think of everything you’ve ever written: your high school essays, your emails, your Facebook posts, your tweets. What do you think it says about you? There’s a lot of written data out there. We tend to treat much of it as background noise, with constant content creation becoming the static hum of the World Wide Web era. We might want to rethink this view, because researchers are increasingly finding important data in the most unexpected places.
In 1990, David Snowdon of the University of Minnesota started the still-ongoing “Nun Study” to examine the onset of Alzheimer’s disease. The study earned its name because its participants were a homogeneous population of nuns from the School Sisters of Notre Dame congregation. In all, 678 women from the congregation agreed to have their brains examined after they died. During the study, Snowdon discovered a collection of essays the nuns had written when they applied to join the congregation.
Snowdon found that the “linguistic density” of these essays, written when the nuns were on average only 22 years old, predicted with 80 to 90% accuracy whether a nun’s brain would show signs consistent with Alzheimer’s disease later in life.
Linguistic density (also called idea density) is simply the number of ideas in a sentence divided by the number of words in that sentence. For example:
The fat cat has grey fur.
The above statement contains 3 ideas:
- The cat is fat.
- The cat has fur.
- The cat is grey.
3 ideas / 6 words = 0.5
So “The fat cat has grey fur.” is a sentence with a linguistic density of 0.5.
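The arithmetic above is easy to reproduce in a few lines of code. The hard part, deciding what counts as an "idea", is left to a human here: this minimal sketch assumes the ideas have already been listed by hand, and the function names `count_words` and `linguistic_density` are hypothetical, not taken from the Nun Study.

```python
import re


def count_words(sentence: str) -> int:
    """Count words as runs of letters (apostrophes allowed), ignoring punctuation."""
    return len(re.findall(r"[A-Za-z']+", sentence))


def linguistic_density(sentence: str, ideas: list[str]) -> float:
    """Ideas per word.

    Assumption: the ideas are supplied by a human annotator; this sketch
    does not attempt to extract them from the sentence automatically.
    """
    words = count_words(sentence)
    if words == 0:
        raise ValueError("sentence contains no words")
    return len(ideas) / words


sentence = "The fat cat has grey fur."
ideas = [
    "The cat is fat.",
    "The cat has fur.",
    "The cat is grey.",
]
print(linguistic_density(sentence, ideas))  # 3 ideas / 6 words = 0.5
```

Counting words is mechanical; counting ideas is a judgment call, which is why a hand-annotated list is used here rather than a guess at an automatic method.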
Nuns whose essays showed low linguistic density were highly likely to suffer from Alzheimer’s later in life, while those whose essays showed high linguistic density were very unlikely to suffer from it. In short, the complexity of language used as a young adult was a significant predictor of the onset of neurodegenerative disease in old age.
Researchers at the University of Toronto decided to put this study, and others like it, to the test. They chose the perfect candidate: Agatha Christie. Christie was not only a prolific author who continued to write and publish books throughout her adult life, but was also rumored to have become erratic and withdrawn in her later years.
They found that as Christie aged, her writing changed dramatically. Her vocabulary shrank, she became more repetitive, and she was far more likely to use vague words like “thing” or “something”. The most striking example of Christie’s waning ability is “Elephants Can Remember”, in which the protagonist cannot help solve a crime because her memory is fading.
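The three changes described above (a shrinking vocabulary, more repetition, more vague words) can each be turned into a simple number. The sketch below is only an illustration of that idea, not the Toronto researchers' method: the measures chosen here (type-token ratio, share of repeated word pairs, rate of vague words), the `VAGUE_WORDS` list and the function names are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: an illustrative list of vague words, not one taken from the study.
VAGUE_WORDS = {"thing", "things", "something", "anything"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def style_profile(text: str) -> dict[str, float]:
    """Compute three rough style measures for a passage of text."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no words")
    total = len(tokens)

    # Vocabulary: distinct words divided by total words (type-token ratio).
    type_token_ratio = len(set(tokens)) / total

    # Repetition: share of adjacent word pairs that occur more than once.
    bigrams = list(zip(tokens, tokens[1:]))
    counts = Counter(bigrams)
    repeated = sum(c for c in counts.values() if c > 1)
    repeated_bigram_share = repeated / len(bigrams) if bigrams else 0.0

    # Vagueness: how often the hypothetical vague words appear.
    vague_word_rate = sum(1 for t in tokens if t in VAGUE_WORDS) / total

    return {
        "type_token_ratio": type_token_ratio,
        "repeated_bigram_share": repeated_bigram_share,
        "vague_word_rate": vague_word_rate,
    }


print(style_profile("I saw something. I saw something odd."))
```

One caveat on design: the type-token ratio falls naturally as a text gets longer, so comparing an early book with a late one this way only makes sense on samples of equal length.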
Of course, the study cannot prove that Agatha Christie suffered from a neurodegenerative disease, but the evidence is compelling. We know that Christie’s writing changed as she aged, and we know that these changes match linguistic markers associated with Alzheimer’s. Although she was never diagnosed, Christie did become reclusive in her later years, and her friends and family are believed to have been shielding her from media scrutiny.
The secret beauty of text analysis is finding those unexpected insights. It’s happening all the time as data scientists plunge into new areas, like the health industry. Think of everything you’ve ever written. Think about what you thought it could say about you. Now think again.