Archive

Clustering Shakespeare.

I ran my clustering program (which I just ran on the New Testament) on Shakespeare’s plays—which were conveniently packaged into a text file by Open Source Shakespeare.

The result was the following graph:

Clustering of Shakespeare’s Plays

I know little about Shakespeare, so I can’t say too much about the above image. I’d love to know what you think: does this arrangement of his plays make any sense?

Given that modern processors are so good at vector and matrix calculations, I’m surprised that this sort of visualization tool doesn’t appear in more places. For instance,

  • Your blogs and email could be organized this way. Imagine lasso-ing a bunch of similar emails to reply to them all at once!
  • News could be organized into nice piles.
  • Your desktop and personal files could be arranged automatically into relevant piles.

Then again, maybe the idea of piles appeals to me more than most people—just look at how I organize the papers and books on my desk!

Clustering the New Testament.

During Bible study last week, it was mentioned that people have used statistics to “determine” authorship of books of the Bible. Having a couple of free hours last night, I tried my own experiment on the New Testament.

The procedure was easy: I downloaded the Nestle-Aland 26th edition of the New Testament; each book in the New Testament became a vector $v$, with $v_w$ counting the number of times word $w$ appears in the book. The cosine of the angle between two such vectors measured how similar the corresponding books are. I packaged these cosines into a matrix, the $(i,j)$ entry of which measured how similar books $i$ and $j$ are.
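The word-count and cosine step can be sketched as follows. This is only an illustration, not the program used for the graph: splitting on whitespace is an assumption about tokenization, and the names `word_counts`, `cosine`, and `similarity_matrix` are made up here.

```python
import math
from collections import Counter


def word_counts(text):
    """Turn a text into a sparse vector: word -> number of occurrences.

    Splitting on whitespace is an assumption; a real run on Greek text
    would probably also strip punctuation and normalize accents.
    """
    return Counter(text.split())


def cosine(u, v):
    """Cosine of the angle between two word-count vectors."""
    # Counter returns 0 for words it has not seen, so missing words are fine.
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_u * norm_v)


def similarity_matrix(texts):
    """Entry (i, j) measures how similar texts i and j are."""
    vectors = [word_counts(t) for t in texts]
    return [[cosine(a, b) for b in vectors] for a in vectors]
```

Calling `similarity_matrix` on a list of 27 strings, one per book, would give the kind of matrix of cosines described above. Texts with identical word counts get a cosine of 1, and texts with no words in common get 0.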

Of course, this is a $27 \times 27$ matrix. To turn these numbers into a nice picture, I projected the books onto the lower-dimensional space spanned by the eigenvectors with the five largest eigenvalues (this is known as Principal Component Analysis). I chose five dimensions, displayed using location (two dimensions) and color (three dimensions, namely hue, saturation, and luminosity). The result is the following graph:
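The projection step can be sketched in a few lines as well. This is a minimal sketch under stated assumptions, not the code behind the graph: it takes a symmetric similarity matrix, finds the leading eigenvectors by power iteration with deflation (a stand-in for a library eigen-solver), applies no centering, and scales each eigenvector by the square root of its eigenvalue. The names `top_eigenpairs` and `embed` are invented for illustration.

```python
import math
import random


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def top_eigenpairs(matrix, k, iterations=500, seed=0):
    """Approximate the k largest eigenpairs of a symmetric,
    positive semidefinite matrix by power iteration with deflation."""
    rng = random.Random(seed)
    n = len(matrix)
    work = [row[:] for row in matrix]  # copy, so the input is not modified
    pairs = []
    for _ in range(k):
        vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _ in range(iterations):
            nxt = mat_vec(work, vec)
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break  # nothing left to find in the deflated matrix
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue for the unit vector vec.
        lam = sum(x * y for x, y in zip(vec, mat_vec(work, vec)))
        pairs.append((lam, vec))
        # Deflate: subtract the component just found.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * vec[i] * vec[j]
    return pairs


def embed(similarity, k):
    """Give each item k coordinates taken from the top k eigenvectors."""
    pairs = top_eigenpairs(similarity, k)
    return [
        [math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs]
        for i in range(len(similarity))
    ]
```

With `k = 5`, the first two coordinates of each row could drive the position of a dot and the remaining three its color. The sign of each eigenvector is arbitrary, so the picture may come out mirrored from one run to another.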

New Testament Clustering

Each dot represents a book, and nearby dots of similar colors represent similar books. Some things jump out right away:

  • The Gospels are all in the lower right-hand corner.
  • Paul’s epistles (and Peter’s?) are mostly in the upper right-hand corner.
  • Revelation is close to John.
  • Hebrews and James are close to each other? Why?

All told, I think this is a pretty good graphical display of the structure of the New Testament, especially considering we used nothing but the Greek text and linear algebra!

NPR and wedding dresses.

While we (meaning my wife and I) were filling out the forms for our marriage license, we were interviewed by NPR for Morning Edition! A copy of the broadcast is available online.

National Bingo Night.

National Bingo Night (which seems to me to be very silly, but ignoring that…) has a “play along at home” game, where you print out a bingo card.

How would I design this? I had hoped that the website generated a Bingo card, digitally signed it, and then sent the signed card to the user. If it had been designed that way, ABC wouldn’t even need to remember which cards had been generated, as long as their private key wasn’t compromised.
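Here is a rough sketch of that stateless design. Python's standard library has no public-key signatures, so this swaps in an HMAC, which is a symmetric tag rather than a true digital signature; that is enough when only the server needs to verify a card. The key, the column ranges (the usual 1 to 75 layout), and the names `make_card`, `sign_card`, and `verify_card` are all assumptions for illustration.

```python
import hashlib
import hmac
import json
import random

# Hypothetical server-side secret; a real deployment would keep this private.
SECRET_KEY = b"hypothetical-server-secret"


def make_card(rng):
    """Build a card as five columns; the middle column has a free square,
    so it holds only four numbers. Column c draws from 15c+1 .. 15c+15."""
    columns = []
    for col in range(5):
        low = 15 * col + 1
        size = 4 if col == 2 else 5
        columns.append(rng.sample(range(low, low + 15), size))
    return columns


def sign_card(card, key=SECRET_KEY):
    """Return a tag that only the holder of the key can produce."""
    payload = json.dumps(card, separators=(",", ":")).encode("ascii")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_card(card, tag, key=SECRET_KEY):
    """Check a submitted card against its tag without any stored state."""
    return hmac.compare_digest(sign_card(card, key), tag)
```

The server would hand out the card together with its tag and later call `verify_card` on whatever a player submits. No database of issued cards is needed, and changing any number on the card invalidates the tag.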

How many Bingo cards are there? The first two and last two columns of a Bingo card are each a sequence of 5 distinct numbers drawn from 15 possible numbers, and the middle column has a “free” square, so it consists of only 4 numbers from 15 possible numbers. Counting ordered sequences, this is $(15 \cdot 14 \cdot 13 \cdot 12 \cdot 11)^4 \cdot (15 \cdot 14 \cdot 13 \cdot 12)$, which is a big number (roughly $5.5 \times 10^{26}$). In base 36, it is 18 digits long.

But the base 36 number below the National Bingo Night cards is only 10 digits long. A 10-digit base 36 number can take at most $36^{10} \approx 3.7 \times 10^{15}$ values, so it can’t encode the whole Bingo card: there are too many cards.
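The counting argument is easy to check by machine. A small sketch, assuming Python 3.8 or later for `math.perm`; `base36_digits` is a helper name invented here.

```python
import math


def base36_digits(n):
    """Number of digits needed to write a non-negative integer in base 36."""
    digits = 0
    while n > 0:
        n //= 36
        digits += 1
    return max(digits, 1)  # zero still takes one digit


# Four columns are ordered picks of 5 numbers from 15;
# the middle column (free square) is an ordered pick of 4 from 15.
full_column = math.perm(15, 5)
middle_column = math.perm(15, 4)
card_count = full_column ** 4 * middle_column

print(card_count, base36_digits(card_count))
# Compare with the number of distinct 10-digit base 36 serial numbers.
print(36 ** 10 < card_count)
```

The digit count for the full set of cards comes out to 18, matching the figure above, while ten base 36 digits cover far fewer values than there are cards.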

The official rules make this totally clear:

“ABC’s National Bingo Night” (the “Show”) Home Viewer Sweepstakes (June 2007) (the “Sweepstakes”) is a seeded instant win game. Unlike bingo, where selected numbers are drawn live before an audience of players who have purchased or otherwise obtained randomized cards, in this Sweepstakes the relevant numbers are known to Sponsor ahead of time, due to the nature of the recording schedule of the Show. Based upon the numbers drawn during prior in-studio tapings of the Show, Sponsor then randomly distributes a specific, predetermined number of potentially winning Sweepstakes “Game Cards.”

So they already know who will win, because the numbers have already been drawn. The serial number only has to encode “Winner” and “Loser.” Not so interesting.

I wonder, though, if there are other web games that actually use digitally signed objects for fun purposes?

Istanbul, not Constantinople, as a cover, in two senses.

I am frequently amazed to discover that songs I had believed to be originals are actually covers. It turns out, for instance, that TMBG’s “Istanbul (not Constantinople)” is a cover of a song from the 1950s.

Ironically, one might argue that Istanbul is itself a cover of Constantinople–and that argument (unifying form and content) reminds me of the language games played by Salt: Grain of Life, a book asserting that its very structure resembles the culinary crystal it purports to discuss.