digital text analysis | my nerves are bad to-night

Strata’s ideal resident is an altogether wealthier breed of pioneering urbanaut

March 21, 2014

More than 400 years after Shakespeare wrote it, we can now say that “Romeo and Juliet” has the wrong name. Perhaps the play should be called “Juliet and Her Nurse,” which isn’t nearly as sexy, or “Romeo and Benvolio,” which has a whole different connotation.

I discovered this by writing a computer program to count how many lines each pair of characters in “Romeo and Juliet” spoke to each other, with the expectation that the lovers in the greatest love story of all time would speak more than any other pair. I wanted Romeo and Juliet to end up together — if they couldn’t in the play, at least they could in my analysis — but the math paid no heed to my desires. Juliet speaks more to her nurse than she does to Romeo; Romeo speaks more to Benvolio than he does to Juliet. Romeo gets a larger share of attention from his friends (Benvolio and Mercutio) and even his enemies (Tybalt) than he does from Juliet; Juliet gets a larger share of attention from her nurse and her mother than she does from Romeo. The two appear together in only five scenes out of 25. We all knew that this wasn’t a play predicated on deep interactions between the two protagonists, but still.
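The post doesn't say how the program decided who was talking to whom, so the following is only a minimal sketch of this kind of count. It assumes a hypothetical `Speech` record (scene, speaker, number of lines) and a hypothetical attribution rule: each speech is treated as addressed to the most recent *different* speaker in the same scene. It also counts the scenes in which two characters both speak.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Speech(NamedTuple):
    """HYPOTHETICAL record for one speech in the script."""
    scene: str      # e.g. "2.2"
    speaker: str
    lines: int      # number of lines in the speech


def count_pair_lines(speeches: Iterable[Speech]) -> Counter:
    """Lines each speaker addresses to each other character.

    ASSUMPTION: a speech is addressed to the most recent different
    speaker in the same scene; opening speeches with no such speaker
    are not counted.
    """
    pairs: Counter = Counter()
    prev_scene = None
    current = None      # who is speaking now
    last_other = None   # the previous, different speaker
    for s in speeches:
        if s.scene != prev_scene:
            current, last_other = None, None  # new scene: reset
        if s.speaker != current:
            last_other, current = current, s.speaker
        if last_other is not None:
            pairs[(s.speaker, last_other)] += s.lines
        prev_scene = s.scene
    return pairs


def scenes_together(speeches: Iterable[Speech], a: str, b: str) -> int:
    """Number of scenes in which both a and b speak at least once."""
    cast: dict[str, set[str]] = {}
    for s in speeches:
        cast.setdefault(s.scene, set()).add(s.speaker)
    return sum(1 for people in cast.values() if {a, b} <= people)
```

The design choice that matters most is the attribution rule: "previous speaker" is easy to compute but will misattribute asides, group scenes and speeches to characters who have just entered, so any real count would need a more careful rule or hand annotation.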

I’m blaming Romeo for this lack of communication. Juliet speaks 155 lines to him, and he speaks only 101 to her. His reticence toward Juliet is particularly inexcusable when you consider that Romeo spends more time talking than anyone else in the play. (He spends only one-sixth of his time in conversation with the supposed love of his life.) One might be tempted to blame this on the nature of the plot; of course the lovers have no chance to converse, kept apart as they are by the loathing of their families! But when I analyzed the script of a modern adaptation of “Romeo and Juliet” — “West Side Story” — I found that Tony and Maria interacted more in the script than did any other pair.

All this got me thinking: Do any of Shakespeare’s lovers actually, you know, talk to each other? If Romeo and Juliet don’t, what hope do the rest of them have?

PHOTOGRAPH: DHA

When I hear a director speaking glibly of serving the author, of letting a play speak for itself, my suspicions are aroused

January 24, 2013

We perform digital analysis on literary texts not to answer questions, but to generate questions. The questions digital analysis can answer are generally not ‘interesting’ in a humanist sense: but the questions digital analysis provokes often are. And these questions have to be answered by ‘traditional’ literary methods…

If you look for absences of high-frequency items, you are using digital text analysis to do the things it does best compared to human reading: picking up absence, and analysing high-frequency items. Humans are good at spotting the presence of low-frequency items, items that disrupt a pattern (outliers, in statistical terms) – but we are not good at noticing things that are not there (dogs that don’t bark in the night) and we are not good at seeing woods (we see trees, especially unusual trees).
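To make "looking for absences of high-frequency items" concrete, here is a minimal sketch that lists words which are common in a reference corpus but noticeably rarer in an analysis text, measured in parts per 10,000 words. The function names and the `min_ref_rate` cutoff are hypothetical, and ranking by the raw drop in rate is a deliberate simplification: the tool discussed below reports a log-likelihood score rather than a plain difference.

```python
from collections import Counter


def per_10k(count: int, total: int) -> float:
    """Frequency expressed as parts per 10,000 words."""
    return count * 10_000 / total


def underused_frequent_words(analysis: list[str], reference: list[str],
                             min_ref_rate: float = 50.0,
                             top: int = 10) -> list[tuple[str, float]]:
    """Words common in the reference corpus but rarer in the analysis text.

    Only words at or above min_ref_rate (parts per 10,000) in the
    reference are considered, so the list surfaces missing high-frequency
    items rather than rare outliers. Sorted by the size of the drop.
    """
    a_counts, r_counts = Counter(analysis), Counter(reference)
    n_a, n_r = len(analysis), len(reference)
    drops = []
    for word, r_count in r_counts.items():
        ref_rate = per_10k(r_count, n_r)
        if ref_rate < min_ref_rate:
            continue  # skip low-frequency words
        # Counter returns 0 for words absent from the analysis text.
        drop = ref_rate - per_10k(a_counts[word], n_a)
        if drop > 0:
            drops.append((word, round(drop, 1)))
    drops.sort(key=lambda item: item[1], reverse=True)
    return drops[:top]
```

The inputs are assumed to be already tokenized and lower-cased; how a real tool handles spelling variants and capitalisation would change the counts.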

The Hamlet results were pretty outstanding in this respect: very high up the list, with three stars, indicating very strong statistical significance, is a minus result for the pronoun ‘I’. A check across the figures shows that ‘I’ occurs in Hamlet about 184 times every 10,000 words (see the column headed ‘Analysis parts per 10,000’ – Hamlet is the ‘analysis text’ here), whereas in the rest of Shakespeare it occurs about 228 times every 10,000 words (see the column headed ‘Reference parts per 10,000’ – the reference corpus is the rest of Shakespeare). So every 10,000 words in Hamlet have about 40 fewer ‘I’ pronouns than we’d expect.

Or, to put it another way, Shakespeare normally uses ‘I’ 228 times every 10,000 words. Hamlet is about 30,000 words long, so we’d expect, all other things being equal, that Shakespeare would use ‘I’ 684 times. In fact, he uses it just 546 times – and Wordhoard checks the figures to see whether this drop could be due to chance or normal variation. The three stars next to the log-likelihood score for ‘I’ tell us that this figure is very unlikely to be due to chance: something is causing the drop.
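The arithmetic in this paragraph, and the kind of test that sits behind the stars, can be sketched as follows. This assumes the log-likelihood statistic is the two-corpus G² commonly used for keyword comparison in corpus linguistics; the post does not give Wordhoard's exact formula or star thresholds. The 10.83 cutoff is the chi-square critical value for one degree of freedom at p = 0.001, and the 850,000-word reference corpus in the usage lines is a made-up placeholder, since the post doesn't state its size.

```python
import math

# Chi-square critical value with 1 degree of freedom at p = 0.001.
CRITICAL_P001 = 10.83


def expected_count(ref_per_10k: float, text_words: int) -> float:
    """Occurrences expected if the text used the word at the reference rate."""
    return ref_per_10k * text_words / 10_000


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) for one word compared across two corpora.

    a: occurrences in the analysis text     c: words in the analysis text
    b: occurrences in the reference corpus  d: words in the reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected count in the analysis text
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # treat 0 * ln(0) as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


if __name__ == "__main__":
    print(expected_count(228, 30_000))  # 684.0
    # HYPOTHETICAL reference corpus: 850,000 words at 228 per 10,000.
    ref_words = 850_000
    ref_hits = round(228 * ref_words / 10_000)  # 19,380
    g2 = log_likelihood(546, ref_hits, 30_000, ref_words)
    # G2 is never negative; the sign comes from comparing the two rates.
    sign = "-" if 546 / 30_000 < ref_hits / ref_words else "+"
    print(sign, round(g2, 2), g2 > CRITICAL_P001)
```

G² only measures how far the observed counts stray from what equal rates would predict; the "minus" in the results comes from a separate comparison of the two rates, which is why the sketch computes the sign on its own.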

Digital analysis can’t explain the cause of the drop: the only question it is answering here is, ‘How frequently does Shakespeare use “I” in Hamlet compared to his other plays?’. On its own, this is not a very interesting question. But the analysis provokes the much more interesting question, ‘Why does Shakespeare use “I” far less frequently in Hamlet than normal?’.

Given literary-critical claims that Hamlet marks the birth of the modern consciousness, it is surprising to find a drop in the frequency of first-person forms.

PHOTOGRAPH: Teresa Queirós