I wrote this as a final paper in my last year at Fordham mostly as an excuse to play with NLTK. It was the first time I'd really done much with it.
John Milton’s classic Paradise Lost and the Book of Genesis both display interesting linguistic patterns when parsed using modern textual analysis tools. To narrow the scope of this paper, I have concentrated on one tool: the Natural Language Toolkit (NLTK) for the Python programming language and its “similar” function.
The similar function takes a word and returns other words that appear in similar contexts throughout a large text. For example, if I were to ask it for words similar to “beep” in a text composed entirely of onomatopoetic dial tones, I would receive output that looks like this:
boop bip bop
This means that the words “beep,” “boop,” “bip,” and “bop” appear in similar enough contexts, that is, between the same neighboring words, for the computer to treat them as likely to play the same role in the text.
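To make the idea concrete, here is a minimal sketch of context-based similarity in plain Python. It is a simplification written for illustration, not NLTK's actual implementation: the function name `similar_words`, the toy token list, and the scoring rule (counting shared previous-word/next-word pairs) are all assumptions. With NLTK itself, the corresponding call would be along the lines of `nltk.Text(tokens).similar("beep")`.

```python
from collections import Counter, defaultdict


def similar_words(tokens, target, top_n=20):
    """Rank words that share (previous word, next word) contexts with target.

    Simplified illustration only; not NLTK's own scoring.
    """
    tokens = [t.lower() for t in tokens]
    target = target.lower()

    # Map each word to the contexts (left neighbor, right neighbor) it appears in.
    contexts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        left = tokens[i - 1] if i > 0 else "<start>"
        right = tokens[i + 1] if i < len(tokens) - 1 else "<end>"
        contexts[word][(left, right)] += 1

    target_contexts = contexts.get(target)
    if not target_contexts:
        return []

    # Score every other word by how often it occurs in the target's contexts.
    scores = Counter()
    for word, ctxs in contexts.items():
        if word == target:
            continue
        shared = set(ctxs) & set(target_contexts)
        score = sum(ctxs[c] * target_contexts[c] for c in shared)
        if score:
            scores[word] = score
    return [w for w, _ in scores.most_common(top_n)]


# Hypothetical toy "dial tone" text, invented for this example.
tokens = "the beep rang the boop rang the bip rang the bop rang a hum faded".split()
print(similar_words(tokens, "beep"))
```

Because “beep,” “boop,” “bip,” and “bop” all sit between “the” and “rang” in this toy text, they are the only words that share a context with one another; “hum” never appears in that slot, so it is left out.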
Obviously, this examination technique gets more interesting when it is applied to actual works, because there are some surprising results.
Before diving into Milton and the Book of Genesis, it makes sense to take a further example to illustrate the validity of the function and its results. Along with the standard NLTK library comes a corpus of more than 10,000 chat room posts collected in 2006. To anyone who’s familiar with the “filler” quality that words like “lol” have in adolescent conversation, the result below will be persuasive, since each of the words below is used in chat rooms as an “empty” piece of speech:
Hi Hey and hiya lmao ty well yeah hello oh ok all haha no what yes you
Now, to the texts themselves. Let’s focus on one of the key events of both stories: the section focusing on Eve’s appetite for fruit.
The “similar” function affords us an interesting way to make a journey through a text. If we begin by asking the algorithm to find words that are used similarly to “eat,” we receive:
Be fear height in on place taste thee to accaron ades air Algiers all ambition apathy arioch as ascalon asp
Removing place names from the results (each of which occurs only once, which confuses the algorithm), we can see that the word “eat” is used similarly to “ambition”, “fear”, and “apathy” across the twenty-two times that it occurs in Paradise Lost.
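The hand-filtering described above could also be approximated automatically by dropping any candidate that occurs only once in the text. The sketch below assumes exactly that rule; the function name `drop_rare_words`, the `min_count` threshold, and the toy token list are hypothetical, and the counts in it are invented rather than taken from Paradise Lost. Note that a frequency cutoff removes every one-off word, not only place names.

```python
from collections import Counter


def drop_rare_words(candidates, tokens, min_count=2):
    """Keep only candidates that occur at least min_count times in tokens."""
    counts = Counter(t.lower() for t in tokens)
    return [w for w in candidates if counts[w.lower()] >= min_count]


# Invented toy data: the repeated words stand in for common vocabulary,
# the single occurrences stand in for one-off place names.
tokens = "fear fear taste taste ambition ambition accaron algiers".split()
candidates = ["fear", "taste", "accaron", "Algiers", "ambition"]

print(drop_rare_words(candidates, tokens))
```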
This, taken in contrast with the same interaction in the Book of Genesis, creates a very cool result. There, we get the following return from the function:
Be drink abram bear beast day die haran him Isaac Jacob shur simeon thee you abidah abimael accad accept
So, removing the proper nouns and concentrating on the most interesting words, we get “drink”, “die”, and “accept” for results. That these are all the same part of speech probably has more to do with the fact that this algorithm was designed using the King James Bible as a reference text than anything else, but the nature of the words when contrasted with Paradise Lost is very interesting.
Now, this can be seen in a way as the beginning of a clear disjunction between John Milton and the author of the Bible. Perhaps this disjunction can be simply explained by Milton’s having had time to reflect on the stories and his explicit attempt to imbue the characters of Adam and Eve with literary qualities, which may not have been the ambition of the Bible’s author.
We can begin to trace these disjunctions of opinion further by looking for other significant departures between these two authors. This time around, we will go straight to the heart of the matter and examine the word “sin.”
When we summarize the meaning of a word in our language, we commonly do it through the use of synonyms and antonyms. This is true of simple word constructions (“eat” is like “gorge” with a difference of degree) but we also do it commonly with more complex ideas. It is commonplace for entrepreneurs to describe their ideas as (existing company) for (new market). Such comparators are often lampooned, but there is genuine information value in saying something like “It’s just like Facebook for pandas” or “it’s a distributed social network for presentation tools.”
The thematic information that we retrieve from the lack of a consistent identity for sin within the Book of Genesis is that the text does not articulate a position about it. Interestingly, every time that anger (its closest synonym) shows up in the Book of Genesis, it refers to the anger of others, so to the Bible’s author anger is a possession of other people that is synonymous with sin.
Perhaps because of Milton’s approach of creating a narrative out of a preexisting story, there is a more “meaty” association with sin in Paradise Lost proper than just the related concept of anger. The concept that he has chosen, or at least that the software has interpreted him as having chosen, is “strength.”
This may very well be a case for questioning the validity of NLTK—it’s an incredible (meaning: not immediately credible) result—but it bears further examination.
Part of the reason for this must be that strength and sin both often fall near the end of a clause. About seventy percent of the time that the word “strength” appears in Paradise Lost, it’s next to a piece of punctuation. Sin shows a similar, if weaker, pattern: it appears next to punctuation about half of the time.
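A measurement like this can be sketched in a few lines of Python. The version below assumes the text has already been tokenized so that punctuation marks are separate tokens; the function names and the toy token list are hypothetical, and the numbers it produces come from the toy data, not from Paradise Lost.

```python
import string


def is_punct(token):
    """True if the token consists only of ASCII punctuation characters."""
    return bool(token) and all(ch in string.punctuation for ch in token)


def punctuation_adjacency(tokens, word):
    """Fraction of occurrences of word that sit directly next to punctuation."""
    word = word.lower()
    total = 0
    adjacent = 0
    for i, tok in enumerate(tokens):
        if tok.lower() != word:
            continue
        total += 1
        # Immediate neighbors: the token before and the token after, if any.
        neighbours = tokens[max(i - 1, 0):i] + tokens[i + 1:i + 2]
        if any(is_punct(n) for n in neighbours):
            adjacent += 1
    return adjacent / total if total else 0.0


# Invented toy tokens, with punctuation already split off.
tokens = ["his", "strength", ",", "and", "strength", "of", "sin", "."]
print(punctuation_adjacency(tokens, "strength"))
print(punctuation_adjacency(tokens, "sin"))
```

Two words that both tend to sit beside punctuation will share many of their contexts, which is one mechanical reason a context-based comparison might pair them.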
There’s a more literary explanation, though. If we consider what it means to disobey the word of an all-powerful creator, it’s not insane to assume that there must be some feature of strength in the action. One of the very interesting passages in Paradise Lost (with text line markers unfortunately lost in its tokenized state) is the phrase “With strength from truth divided,” which does give some support to the notion that there may be a concordance between sin and strength in this text.