Granger causality and Intrade data.

Granger causality is a technique for determining whether one time series can be used to forecast another; since the Intrade market provides time series data for political questions, we can look at whether political outcomes can be used to forecast other political outcomes.
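To make the idea concrete, here is a minimal Python sketch of a lag-one comparison. It fits two least-squares models for today's value of $b$: one using only yesterday's $b$, and one that also uses yesterday's $a$. It then reports the F statistic for the improvement. This is not the R library's implementation; the function names `ols_rss` and `granger_f_lag1` are made up for illustration, and the sketch stops at the F statistic rather than computing a $p$-value.

```python
def ols_rss(rows, y):
    """Residual sum of squares from a least-squares regression of y on rows."""
    k = len(rows[0])
    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    # Assumes the regressors are not collinear (e.g. no constant series).
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(k):
            if i != c:
                f = m[i][c] / m[c][c]
                m[i] = [u - f * v for u, v in zip(m[i], m[c])]
    beta = [m[i][k] / m[i][i] for i in range(k)]
    return sum((t - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, t in zip(rows, y))


def granger_f_lag1(a, b):
    """F statistic for 'a Granger-causes b' at a lag of one step."""
    y = b[1:]
    # Restricted model: b[t] ~ const + b[t-1]
    restricted = [[1.0, b[t - 1]] for t in range(1, len(b))]
    # Unrestricted model: b[t] ~ const + b[t-1] + a[t-1]
    unrestricted = [[1.0, b[t - 1], a[t - 1]] for t in range(1, len(b))]
    rss_r = ols_rss(restricted, y)
    rss_u = ols_rss(unrestricted, y)
    dof = len(y) - 3  # observations minus parameters in the unrestricted model
    return (rss_r - rss_u) / (rss_u / dof)
```

A large value of `granger_f_lag1(a, b)` says that adding yesterday's $a$ noticeably reduces the error in forecasting $b$; turning it into a $p$-value requires the F distribution with 1 and `dof` degrees of freedom.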

There’s a library for the statistical package R to do the Granger test, and Intrade produces CSV market data. I fed the market data for various contracts since January 1, 2008 into R, and the output of that into Graphviz to make a nice-looking visualization; in particular, I connect $a$ to $b$ if $a$ Granger-causes $b$ with $p$-value less than 0.05. Darker arrows have smaller $p$-values. This is all an embarrassing misuse of statistics and $p$-values, but it is quick and easy to do, and the results are fun to see.
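The graph-drawing step can be sketched in a few lines of Python. The function below takes a dictionary of $p$-values keyed by ordered pairs of contracts and writes a digraph in the DOT language, keeping only edges below the threshold and shading them darker as $p$ shrinks. The function name `granger_dot`, the grey-level mapping, and the contract names in the usage note are all assumptions for illustration, not the script used for the pictures here.

```python
def granger_dot(p_values, threshold=0.05):
    """Render {(a, b): p} as a DOT digraph, with darker edges for smaller p."""
    lines = ["digraph granger {"]
    for (a, b), p in sorted(p_values.items()):
        if p < threshold:
            # Grey level from 0 (black, p = 0) up to just under 200 (light grey).
            level = int(200 * (p / threshold))
            color = f"#{level:02x}{level:02x}{level:02x}"
            lines.append(f'  "{a}" -> "{b}" [color="{color}"];')
    lines.append("}")
    return "\n".join(lines)
```

For example, `granger_dot({("A", "B"): 0.001, ("B", "C"): 0.2})` keeps only the edge from the hypothetical contract `A` to `B`, and the resulting text can be passed to the `dot` command to produce an image.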

Here is the graph for a lag of one day (i.e., does yesterday’s value of $a$ predict today’s value of $b$):

One day lagged granger causality graph

Here is the graph for a lag of two days (i.e., can the two previous days of data for $a$ be used to forecast the next day of data for $b$):

Two day lagged intrade granger causality graph

And here is the graph for a lag of three days:

Three day lagged intrade granger causality graph

Don’t take this too seriously. And one word of warning: an arrow from $a$ to $b$ does not mean that if $a$ is more likely, then $b$ is more likely—rather, it ought to mean that past knowledge of $a$ can be used to forecast $b$. I suppose it would be interesting to add some color for the direction of the relationship, and maybe I’ll do that when I have another free hour.

Movies of some neat cubical complexes.

I made some movies of some of my favorite complexes: let $I^n$ be the $n$-dimensional cube, and let $e_1, \ldots, e_n$ be the $n$ edges around the origin, and let $e_i e_j$ be the square face containing the edges $e_i$ and $e_j$. Define a subcomplex $\Sigma^2_n \subset I^n$ consisting of the squares $$e_1 e_2, e_2 e_3, \ldots, e_{n-1} e_n, e_n e_1$$ and all the squares in $I^n$ parallel to these. It turns out that $\Sigma^2_n$ is a surface with a lot of symmetries.

In particular $\Sigma^2_4$ is a torus in $\mathbb{R}^4$, and here is a movie of it spinning:

I’m particularly fond of this, as you can really see that four squares are coming together at each vertex (hence, it has zero curvature), and you can see the hole in the torus as it spins.

The complex $\Sigma^2_5$ is a genus five surface in $\mathbb{R}^5$, and here is a movie of it spinning:

I represented the extra dimensions with color—not that it helps much!
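The torus and genus five claims can be checked by counting cells. The Python sketch below enumerates the squares of $\Sigma^2_n$ (every square of the cube parallel to $e_i e_{i+1}$, with indices taken cyclically), collects their vertices and edges, and computes the Euler characteristic $\chi = V - E + F$. The genus formula $g = (2 - \chi)/2$ assumes the surface is closed and orientable, which the code does not verify; the function names are just for illustration.

```python
from itertools import product


def sigma_cells(n):
    """Count vertices, edges, and squares of Sigma^2_n inside the n-cube."""
    # Cyclically adjacent pairs of directions: (0,1), (1,2), ..., (n-1,0).
    pairs = [(i, (i + 1) % n) for i in range(n)]
    vertices, edges, squares = set(), set(), set()
    for i, j in pairs:
        others = [k for k in range(n) if k not in (i, j)]
        # A square parallel to e_i e_j is fixed by 0/1 values on the other axes.
        for bits in product((0, 1), repeat=len(others)):
            base = dict(zip(others, bits))
            squares.add((frozenset((i, j)), bits))
            corners = [tuple({**base, i: x, j: y}[k] for k in range(n))
                       for x, y in product((0, 1), repeat=2)]
            vertices.update(corners)
            # Edges of the square join corners differing in one coordinate.
            for u in corners:
                for w in corners:
                    if u < w and sum(p != q for p, q in zip(u, w)) == 1:
                        edges.add((u, w))
    return len(vertices), len(edges), len(squares)


def genus(n):
    """Genus of Sigma^2_n, assuming it is a closed orientable surface."""
    v, e, f = sigma_cells(n)
    chi = v - e + f
    return (2 - chi) // 2
```

For $n = 4$ the counts are 16 vertices, 32 edges, and 16 squares, so $\chi = 0$; for $n = 5$ they are 32, 80, and 40, so $\chi = -8$ and the genus is five.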

Books that are useless on a desert island.

Drew Hevle raises a very interesting question: suppose you are stranded on a desert island; what books would be entirely useless in this situation?

Here are a few books that I wouldn’t want to be stranded on an island with:

Do you have other ideas for awful desert island reading?

Visualizing pineapple pancakes.

The pineapple sauce pancake graph has English words as vertices, and a directed edge from $a$ to $b$ if the concatenation $ab$ is also an English word. For instance, there is a vertex labeled pine, and a vertex labeled apple, and an edge from pine to apple.
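Building the edge set is straightforward given a word list. The sketch below, in Python, splits each word at every position and records an edge whenever both halves are themselves words; this avoids testing every pair of words. The function name `concatenation_graph` and the tiny word list in the usage note are assumptions for illustration, and the actual dictionary used for the graph is not specified here.

```python
def concatenation_graph(words):
    """Directed edges (a, b) such that a, b, and a + b are all in `words`."""
    vocab = set(words)
    edges = set()
    for w in vocab:
        # Try every way of splitting w into a nonempty prefix and suffix.
        for cut in range(1, len(w)):
            a, b = w[:cut], w[cut:]
            if a in vocab and b in vocab:
                edges.add((a, b))
    return edges
```

Called on the list `["pine", "apple", "pineapple", "sauce", "applesauce", "pan", "saucepan", "cake", "pancake"]`, it returns the path pine, apple, sauce, pan, cake as four directed edges.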

Anyway, the graph is huge, and the usual visualization tool (Graphviz) doesn’t work particularly well on the whole graph, so I took a few hundred vertices around pine, apple, sauce, pan, and cake. The result was the following:

Small pineapple graph.

Clustering texts with an obvious grouping.

It was pointed out to me by Kenny Easwaran that I ought to try clustering texts that already have a natural grouping.

So I ran the clustering program on 15 texts written by three authors, and here is the result:

Clustering Jane Austen, Shakespeare, and Sir Arthur Conan Doyle.

The largest eigenvalue is 25 times larger than the next largest eigenvalue, and picks out the author pretty well. The top pile consists of Jane Austen’s books (with Emma split into three volumes). The middle pile consists of Sir Arthur Conan Doyle’s books, with the Sherlock Holmes mysteries (Valley of Fear, Sign of Four, and Hound of the Baskervilles) grouped closer than the others. The bottom pile consists of five of Shakespeare’s plays.

Of course, these people are all pretty different. As requested below by Theo, let’s run it one more time, using 12 books from George Eliot, Jane Austen, and the Brontë sisters.

Three from the same period.

Well, that didn’t quite work. The books by the Brontë sisters (Wuthering Heights, Villette, The Professor, Jane Eyre) have been separated from the others, but George Eliot and Jane Austen are getting mixed together. Admittedly, if you just project to the y coordinate, the authors are sitting in disjoint intervals. Nevertheless, this isn’t as nice as I might hope; so let’s run it again, just on the eight books written by the two authors that aren’t being sufficiently separated:

Just two from the same period.

I suppose this is somewhat better, though it’s basically just a stretched-out and inverted version of the previous image. Jane Austen’s books (Sense and Sensibility, Pride and Prejudice, Mansfield Park, Emma) are all up on top, and George Eliot’s books still aren’t piled together.

You might have guessed that I have Project Gutenberg to thank for the text files (including the Shakespeare plays).