Translating individual words.

Given a text in two languages, is it possible to uncover the meaning of individual words?

The Bible is a particularly easy text to work with, since corresponding sentences are marked (i.e., with the same chapter and verse numbers). I downloaded a copy of the Hebrew Bible and the King James Version, and looked at Deuteronomy 6:4.

For each word in Hebrew, I found all the other verses with that word, and gathered together all the corresponding English verses; by picking the most popular word from those English verses (ignoring “the” and “and” and such), I found a pretty good translation of the original Hebrew word. In short, I picked the most popular English word in all those verses containing the non-English word.
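The counting step can be sketched in a few lines. This is not the original Ruby program; it is a minimal Python sketch that assumes the two texts are already aligned as dictionaries keyed by verse reference, splits on whitespace, and uses a small hand-picked stopword list. The names `tokenize`, `translate_word`, and `STOPWORDS`, and the toy verses, are all hypothetical.

```python
from collections import Counter

# Hypothetical stopword list; the post only says it ignored "the", "and" and such.
STOPWORDS = {"the", "and", "of", "a", "to", "in"}

def tokenize(verse):
    """Lowercase a verse and split it into words, dropping surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in verse.lower().split())
    return [w for w in words if w]

def translate_word(word, source, target, top=6):
    """Return the `top` most common target-language words among the verses
    whose source-language text contains `word`."""
    counts = Counter()
    for ref, verse in source.items():
        if word in verse.split() and ref in target:
            counts.update(w for w in tokenize(target[ref]) if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top)]

# Toy aligned corpus (made-up data standing in for the two Bibles).
source = {
    "1:1": "foo bar",
    "1:2": "foo baz",
    "1:3": "qux bar",
}
target = {
    "1:1": "The king and the dog",
    "1:2": "The king of the cat",
    "1:3": "A bird and the dog",
}

print(translate_word("foo", source, target))
print(translate_word("bar", source, target))
```

In the toy corpus, "foo" co-occurs with "king" twice and "bar" with "dog" twice, so those words come out on top; the remaining entries are the kind of noise visible in the tables below.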

So here’s Deuteronomy 6:4, with the top six English words underneath each Hebrew word:

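| אֶחָֽד | יְהוָ֥ה | אֱלֹהֵ֖ינוּ | יְהוָ֥ה | יִשְׂרָאֵ֑ל | שְׁמַ֖ע |
| --- | --- | --- | --- | --- | --- |
| one | Lord | our | Lord | Israel | not |
| king | God | God | God | Lord | Lord |
| for | thy | Lord | thy | children | will |
| side | for | for | for | all | heard |
| all | thou | which | thou | his | them |
| with | thee | not | thee | for | voice |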

Remember to read this from right to left. Pretty impressive: it didn’t quite get the verb שְׁמַ֖ע, but it did well enough anyway.

It also works in Greek. Here’s Galatians 3:26 with the most popular English words underneath each Greek word.

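| πάντες | γὰρ | υἱοὶ | θεοῦ | ἐστε | διὰ | τῆς | πίστεως | ἐν | χριστῶ | ἰησοῦ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| all | for | children | God | are | for | that | faith | that | Christ | Jesus |
| that | that | shall | that | you | that | for | that | unto | Jesus | unto |
| they | not | are | for | for | not | unto | for | for | are | that |
| him | him | them | unto | that | unto | his | God | him | that | him |
| for | but | your | not | not | God | which | but | not | which | Christ |
| are | unto | they | but | shall | which | was | Christ | which | God | said |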

It didn’t quite figure out that διὰ means “by” or “through.”

In the end, this isn’t shocking, but it’s surprising how easy it is: the Ruby program to do this is only 150 lines long (which includes the code to print out those nice HTML tables with Unicode).

No cookie for me.

Today, I was about to sit down and read a paper (in French–I may not speak in tongues, but apparently I can read in tongues, so to speak!), and I thought to myself about how nice it would be to have a cookie. I went to Uncle Joe’s, I went to the Classics Cafe, I went to Cobb’s coffee shop, and then I gave up, for there were no cookies in any of those places, places which so often appear to be the source of cookies.

Is there someplace else on campus that I should have looked? Admittedly, I probably would’ve settled for the biscotti in the divinity school cafe (especially with some coffee).

Genesis clusters around the Akedah.

Someone contacted me with some questions about Bayesian document clustering; with that inspiration and a free afternoon a few weeks ago, I took a Hebrew Bible and built a matrix $(A_{ij})$ where $A_{ij}$ equals the frequency of the $i$-th (Hebrew!) word in the $j$-th chapter of Genesis. I calculated its singular value decomposition (supposedly this is “latent semantic analysis”), and then took some dot products (calculating the “correlation” of chapters)…
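Here is a minimal sketch of that pipeline, assuming NumPy is available. It is not the code used for the post: the tiny count matrix is made up, the truncation rank `k` is an arbitrary choice, and cosine similarity stands in for the unspecified “correlation.” The function name `nearest_chapters` is hypothetical.

```python
import numpy as np

def nearest_chapters(A, k):
    """For each column (chapter) of the word-by-chapter matrix A, return the
    index of the most similar other column after a rank-k truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Column j of diag(s[:k]) @ Vt[:k] holds chapter j's coordinates
    # in the k-dimensional latent space.
    coords = (s[:k, None] * Vt[:k, :]).T           # shape: (chapters, k)
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    unit = coords / np.where(norms == 0, 1, norms)
    sim = unit @ unit.T                            # cosine similarities
    np.fill_diagonal(sim, -np.inf)                 # a chapter is not its own neighbor
    return sim.argmax(axis=1)

# Made-up counts: rows are words, columns are four "chapters".
A = np.array([
    [3, 2, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 4, 3],
    [0, 1, 2, 2],
], dtype=float)

print(nearest_chapters(A, k=2))
```

In the toy matrix, chapters 0 and 1 share vocabulary, as do chapters 2 and 3, so each chapter's nearest neighbor is its partner. Grouping the chapters by their nearest neighbor gives a table of the kind shown below.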

Anyhow, the result was astounding! The following table lists, next to each chapter on the left, the chapters whose most highly correlated chapter it is. For example, the chapter most similar to each of chapters two, six, seven, eight, and nine is chapter one. With that, here’s the data:

| Chapter | Chapters whose most highly correlated chapter it is |
| --- | --- |
| 1 | 2, 6-9 |
| 5 | 11 |
| 7 | 1 |
| 10 | 12-15, 34, 36, 46, 49 |
| 11 | 5 |
| 15 | 16 |
| 21 | 3, 22 |
| 22 | 4, 17-33, 35, 38, 44 |
| 36 | 10 |
| 37 | 43 |
| 40 | 41, 45, 47, 50 |
| 41 | 39 |
| 45 | 37, 42, 48 |
| 50 | 40 |

The shocking thing is that for 21 chapters of Genesis–for nearly half the book–the most highly correlated chapter is chapter 22–the binding of Isaac. In my mind, that story is the most powerful in Genesis, central to the message, and so it is especially remarkable that this crazy game with matrices also “detected” that most of Genesis clusters around that story.

Sharpness of the Hurwitz 84(g-1) theorem.

There are usually courses at Mathcamp about surfaces; there should be courses about orbifolds! For instance, knowing that the smallest hyperbolic orbifold is the (2,3,7)-orbifold, having orbifold Euler characteristic $-1/84$, immediately gives that a closed hyperbolic surface of genus $g$ has no more than $84(g-1)$ isometries (preserving orientation); this is “Hurwitz’ $84(g-1)$ theorem.”
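Spelled out, the deduction is one line of arithmetic. The sketch below assumes two standard facts: orbifold Euler characteristic is multiplicative under finite quotients, and the smallest orientable hyperbolic orbifold is the sphere with cone points of orders 2, 3, and 7, with orbifold Euler characteristic $-1/42$ (it double covers the $(2,3,7)$-orbifold above). The names $G$ and $G^+$, for the full isometry group of a genus $g$ hyperbolic surface $\Sigma_g$ and its orientation-preserving subgroup, are notation introduced here.

$$2 - 2g = \chi(\Sigma_g) = |G| \cdot \chi^{\mathrm{orb}}(\Sigma_g / G) \leq -\frac{|G|}{84}, \quad \text{so} \quad |G| \leq 168(g-1),$$

$$2 - 2g = \chi(\Sigma_g) = |G^+| \cdot \chi^{\mathrm{orb}}(\Sigma_g / G^+) \leq -\frac{|G^+|}{42}, \quad \text{so} \quad |G^+| \leq 84(g-1).$$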

Just to show off this theorem, here is a cubical complex which is a surface with lots of symmetries (and the clever reader will recognize this as coming right out of Davis’ construction of aspherical manifolds): consider the $n$-dimensional cube $I^n$, and let $e_1, \ldots, e_n$ be the $n$ edges around the origin, and $e_i e_j$ be the square face containing the edges $e_i$ and $e_j$. Define a subcomplex $\Sigma^2_n \subset I^n$ consisting of the squares $e_1 e_2, e_2 e_3, \ldots, e_{n-1} e_n, e_n e_1$ and all squares in $I^n$ parallel to these. Now $\Sigma^2_n$ is an orientable surface (the link of any vertex is an $n$-cycle, i.e., topologically an $S^1$). Don’t be fooled by the notation: $\Sigma^2_n$ has genus much larger than $n$.

In fact, let’s calculate the genus. Every vertex of $I^n$ is contained in $\Sigma^2_n$, and there are $2^n$ vertices in $I^n$. Likewise, every edge in $I^n$ is contained in $\Sigma^2_n$, and there are $n 2^{n-1}$ edges in $I^n$. Finally, there are $2^{n-2}$ squares parallel to each of $e_i e_j$, so there are $n 2^{n-2}$ square faces in $\Sigma^2_n$. Thus, $$\chi(\Sigma^2_n) = 2^n - n 2^{n-1} + n 2^{n-2} = 2^{n} - n \cdot 2^{n-2},$$ and so $\Sigma^2_n$ is a surface of genus $g = 1 + 2^{n-3} (n - 4)$. This is maybe a good exercise for someone first learning about Euler characteristic, but not especially interesting…
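For the skeptical reader, the count can be checked by brute force. The following Python sketch enumerates the cells of $\Sigma^2_n$ directly, assuming $n \geq 3$ and using 0-based coordinate indices; the function name `euler_characteristic` is hypothetical.

```python
from itertools import product

def euler_characteristic(n):
    """Count cells of the subcomplex of the n-cube made of all squares parallel
    to e_i e_{i+1} (indices mod n), and return V - E + F."""
    vertices, edges, squares = set(), set(), set()
    for i in range(n):
        j = (i + 1) % n
        # A square parallel to e_i e_j is fixed by choosing 0 or 1
        # for every other coordinate.
        others = [k for k in range(n) if k not in (i, j)]
        for bits in product((0, 1), repeat=len(others)):
            corners = []
            for a, b in product((0, 1), repeat=2):
                v = [0] * n
                for k, bit in zip(others, bits):
                    v[k] = bit
                v[i], v[j] = a, b
                corners.append(tuple(v))
            squares.add(frozenset(corners))
            vertices.update(corners)
            # The sides of the square join corners differing in exactly one coordinate.
            for p in corners:
                for q in corners:
                    if sum(x != y for x, y in zip(p, q)) == 1:
                        edges.add(frozenset((p, q)))
    return len(vertices) - len(edges) + len(squares)

for n in range(3, 9):
    chi = euler_characteristic(n)
    assert chi == 2**n - n * 2**(n - 2)
    genus = (2 - chi) // 2
    assert genus == 1 + 2**(n - 3) * (n - 4)
    print(n, chi, genus)
```

For $n = 3$ the complex is the whole boundary of the cube, a sphere, and for $n = 4$ it is a torus; after that the genus takes off.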

So here’s the punchline–or rather the punch-question–why is the genus growing exponentially in $n$? Because $\Sigma^2_n$ is very symmetric! And Hurwitz says to get so much symmetry, we need (linearly) as much genus. And we can find exponentially many symmetries of $\Sigma^2_n$ without any work. For starters, the group $(\Z/2\Z)^n$ acts on $\Sigma^2_n$ by reflecting through hyperplanes, as does the group $\Z/n\Z$ cyclically permuting the basis $e_1, \ldots, e_n$. If we want to be precise, let $G_n$ be the resulting group of order $n 2^{n}$. Quotienting $\Sigma^2_n$ by these symmetries gives an orbifold $\Sigma^2_n / G_n$, which one observes to be a square with cone points on each vertex (three with cone angle $\pi/2$ and one with cone angle $2 \pi/n$) and reflections in each of the four sides. Thus, the orbifold Euler characteristic of $\Sigma^2_n / G_n$ is $3/4 + 1/n - 4/2 + 1 = 1/n - 1/4$, so the Euler characteristic of $\Sigma^2_n$ must be $(1/n - 1/4) \cdot n \cdot 2^{n} = 2^{n} - n \cdot 2^{n-2}$, just like we got before. One might argue that this method was “easier” than the previous method for calculating $\chi(\Sigma^2_n)$, but that misses the point—I (and probably everyone else) calculated the number of edges of $I^n$ by using a group action, if only implicitly.
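The arithmetic in that last step is easy to check mechanically. This is a small Python sketch using exact fractions, with the vertex, edge, and face contributions taken exactly as stated above; the function name `chi_from_orbifold` is hypothetical.

```python
from fractions import Fraction

def chi_from_orbifold(n):
    """Euler characteristic of the surface recovered from its quotient:
    (orbifold Euler characteristic) times (order of the group)."""
    # Contributions as stated in the text: vertices 3/4 + 1/n, edges 4/2, face 1.
    chi_orb = Fraction(3, 4) + Fraction(1, n) - Fraction(4, 2) + 1
    group_order = n * 2**n
    return chi_orb * group_order

for n in range(3, 12):
    # Agrees with the direct cell count 2^n - n * 2^(n-2).
    assert chi_from_orbifold(n) == 2**n - n * 2**(n - 2)
    print(n, chi_from_orbifold(n))
```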

The point is, even without doing any calculations or thinking very hard, the number of symmetries of $\Sigma^2_n$ is growing exponentially in $n$, and therefore the genus must be growing exponentially in $n$ as well—the orbifold makes this reasoning precise.

There’s a lot of stuff left to be discovered about the number of automorphisms of genus $g$ surfaces. For instance, it’s known that the $84(g-1)$ bound is attained for infinitely many genera, but there are also infinitely many genera for which it is not attained. Let $N(g)$ be the maximal order of the automorphism group of a genus $g$ surface; Maclachlan and Accola proved (in 1968) that $N(g) \geq 8(g+1)$. This bound is sharp, too. There’s a beautiful paper

Belolipetsky, Mikhail and Jones, Gareth A. A bound for the number of automorphisms of an arithmetic Riemann surface. Math. Proc. Cambridge Philos. Soc. 2005. 289–299.

working out what happens in the arithmetic and non-arithmetic case. Anyway, what is known about the set of $g$ for which $84(g-1)$ is attained? What is the asymptotic density of this set?

Classifying manifolds is impossible.

At a recent Pizza Seminar, Matt Day gave a lovely talk explaining why it isn’t possible to classify 4-manifolds.

An algorithm for deciding whether two closed 4-manifolds are homeomorphic gives an algorithm for deciding whether a closed 4-manifold is simply connected, and therefore (since every finitely presented group is the fundamental group of a closed 4-manifold) an algorithm for deciding when a finitely presented group is trivial. Here’s the reduction: we are given a 4-manifold $M$, and we compute its intersection form. By Freedman, there are no more than two closed simply connected 4-manifolds, $M_1$ and $M_2$, having the same intersection form as $M$; we construct $M_1$ and $M_2$, and we use the homeomorphism decision procedure to test if $M \cong M_1$ or $M \cong M_2$. Then $M$ is simply connected exactly when one of the two tests succeeds.
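The shape of the reduction can be written down as a short Python sketch. Everything here is hypothetical: `invariant` stands for the invariant computed from $M$ in the first step, `models_with` for the construction of the (at most two) simply connected candidates, and `homeomorphic` for the assumed decision procedure. The toy stand-ins only exercise the control flow; by the argument in this post, no real `homeomorphic` can exist.

```python
def is_simply_connected(M, invariant, models_with, homeomorphic):
    """Decide simple connectivity of M, given a homeomorphism oracle.

    All three callables are hypothetical stand-ins:
      invariant(M)        -> the invariant computed from M
      models_with(inv)    -> the simply connected manifolds with that invariant
      homeomorphic(M, N)  -> the assumed decision procedure
    """
    models = models_with(invariant(M))
    return any(homeomorphic(M, N) for N in models)

# Toy universe: "manifolds" are just labels, and every label has invariant 0.
def toy_invariant(M):
    return 0

def toy_models(inv):
    return ["M1", "M2"]

def toy_homeomorphic(a, b):
    # Pretend "M" is homeomorphic to "M1" and nothing else is related.
    return {a, b} <= {"M", "M1"}

print(is_simply_connected("M", toy_invariant, toy_models, toy_homeomorphic))
print(is_simply_connected("X", toy_invariant, toy_models, toy_homeomorphic))
```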

Since there is no algorithm for deciding when a group is trivial, there cannot be an algorithm for deciding when two closed 4-manifolds are homeomorphic.

Here is a paper discussing some of these issues:

Chernavsky, A. V. and Leksine, V. P. Unrecognizability of manifolds. Ann. Pure Appl. Logic 2006. 325–335.

In particular, that paper discusses Novikov’s proof that $S^n$ cannot be recognized for $n \geq 5$.