The Muqata: Software Helping Identify "Biblical Writing Styles"

Sunday, July 03, 2011

Anyone who has done any academic study of the Bible/Torah is familiar with the concept that there are multiple "writing styles" in the Torah.

The theory (the Documentary Hypothesis) assigns the letters J, E, D, and P to the different "writing styles/authors" of the different parts of the five books of the Torah. (You can read all about that in books and on the web.)

For those who cringe at the thought of the Documentary Hypothesis, there is no reason that G-d couldn't have used multiple styles, or that multiple divinely inspired sections couldn't have been authored by different people.

So why am I writing about this on the blog? Fox reports the following:

Software developed by an Israeli team is giving intriguing new hints about what researchers believe to be the multiple hands that wrote the Bible.

The new software analyzes style and word choices to distinguish parts of a single text written by different authors, and when applied to the Bible its algorithm teased out distinct writerly voices in the holy book.

The program, part of a sub-field of artificial intelligence studies known as authorship attribution, has a range of potential applications -- from helping law enforcement to developing new computer programs for writers. But the Bible provided a tempting test case for the algorithm's creators.

Tomorrow (Monday), the Muqata blog will be interviewing Dr. Moshe Koppel of Bar Ilan University, the computer science professor who headed the research team. Until then, read the rest of the Fox article...

Today, scholars generally split the text into two main strands. One is believed to have been written by a figure or group known as the "priestly" author, because of apparent connections to the temple priests in Jerusalem. The rest is "non-priestly." Scholars have meticulously gone over the text to ascertain which parts belong to which strand.

When the new software was run on the Pentateuch, it found the same division, separating the "priestly" and "non-priestly." It matched up with the traditional academic division at a rate of 90 percent -- effectively recreating years of work by multiple scholars in minutes, said Moshe Koppel of Bar Ilan University near Tel Aviv, the computer science professor who headed the research team.

"We have thus been able to largely recapitulate several centuries of painstaking manual labor with our automated method," the Israeli team announced in a paper presented last week in Portland, Oregon, at the annual conference of the Association for Computational Linguistics. The team includes a computer science doctoral student, Navot Akiva, and a father-son duo: Nachum Dershowitz, a Tel Aviv University computer scientist, and his son, Idan Dershowitz, a Bible scholar at Hebrew University in Jerusalem.

The places in which the program disagreed with accepted scholarship might prove interesting leads for scholars. The first chapter of Genesis, for example, is usually thought to have been written by the "priestly" author, but the software indicated it was not.

Similarly, the book of Isaiah is largely thought to have been written by two distinct authors, with the second author taking over after Chapter 39. The software's results agreed that the book might have two authors, but suggested the second author's section actually began six chapters earlier, in Chapter 33.

The differences "have the potential to generate fruitful discussion among scholars," said Michael Segal of Hebrew University's Bible Department, who was not involved in the project.

Over the past decade, computer programs have increasingly been assisting Bible scholars in searching and comparing texts, but the novelty of the new software seems to be in its ability to take criteria developed by scholars and apply them through a technological tool more powerful in many respects than the human mind, Segal said.

Before applying the software to the Pentateuch and other books of the Bible, the researchers first needed a more objective test to prove the algorithm could correctly distinguish one author from another.

So they randomly jumbled the Hebrew Bible's books of Ezekiel and Jeremiah into one text and ran the software. It sorted the mixed-up text into its component parts "almost perfectly," the researchers announced.
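
To make the idea of that test concrete, here is a minimal Python sketch of how one might score such a jumbled-text experiment. It is not the researchers' code: the chunk names, the stand-in "prediction," and the `agreement` helper are all hypothetical, and the only point is that a two-way split should be scored without caring which group the software happens to call "0" or "1."

```python
import random

def agreement(true_labels, predicted):
    """Fraction of chunks grouped correctly, ignoring which cluster is named 0 or 1."""
    same = sum(t == p for t, p in zip(true_labels, predicted))
    # If the cluster names are swapped, the "mismatches" are the real matches.
    return max(same, len(true_labels) - same) / len(true_labels)

# Hypothetical chunks: label 0 = taken from book A, label 1 = taken from book B.
book_a = [("chunk from A #%d" % i, 0) for i in range(5)]
book_b = [("chunk from B #%d" % i, 1) for i in range(5)]

# Jumble the two books into one text, keeping the true labels on the side.
mixed = book_a + book_b
random.Random(0).shuffle(mixed)
truth = [label for _, label in mixed]

# Stand-in for a program's output: cluster names swapped, plus one mistake.
predicted = [1 - t for t in truth]
predicted[0] = 1 - predicted[0]

score = agreement(truth, predicted)
print(score)  # 0.9
```

Here nine of the ten chunks end up grouped with their true companions, so the score is 0.9 even though the cluster names are reversed.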

The program recognizes repeated word selections, like uses of the Hebrew equivalents of "if," "and" and "but," and notices synonyms: In some places, for example, the Bible gives the word for "staff" as "makel," while in others it uses "mateh" for the same object. The program then separates the text into strands it believes to be the work of different people.
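
As a rough illustration of that description, the toy Python sketch below counts how often a few common words appear in each chunk of text and then splits the chunks into two groups by similarity. Everything in it is an assumption made for illustration: the English word list stands in for the Hebrew function words, the sample chunks are invented, synonym choices such as "makel" versus "mateh" are not modeled, and the simple two-cluster k-means loop is a swapped-in technique rather than the team's actual algorithm.

```python
from collections import Counter

# Hypothetical English stand-ins for the function words mentioned in the article.
FUNCTION_WORDS = ["if", "and", "but"]

def profile(chunk):
    """Relative frequency of each function word in one chunk of text."""
    words = chunk.lower().split()
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(a, b):
    """Squared Euclidean distance between two frequency profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(vectors, rounds=10):
    """Split the profiles into two clusters with a basic k-means loop."""
    # Seed with the first profile and the profile farthest from it.
    centers = [vectors[0], max(vectors, key=lambda v: distance(v, vectors[0]))]
    labels = [0] * len(vectors)
    for _ in range(rounds):
        labels = [0 if distance(v, centers[0]) <= distance(v, centers[1]) else 1
                  for v in vectors]
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Invented sample chunks in two deliberately different "styles."
chunks = [
    "and he went and he saw and he spoke",
    "but if it rains but if it snows",
    "and they came and they ate and they left",
    "if not now but later if at all",
]
labels = two_means([profile(c) for c in chunks])
print(labels)  # [0, 1, 0, 1]
```

The "and"-heavy chunks land in one group and the "if"/"but"-heavy chunks in the other. As with the real software, the output is only a grouping; it says nothing about why the styles differ.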

Other researchers have looked at linguistic fingerprints in less sacred texts as a way of identifying unknown writers. In the 1990s, the Vassar English professor Donald Foster famously identified the journalist Joe Klein as the anonymous author of the book "Primary Colors" by looking at minor details like punctuation.

In 2003, Koppel was part of a research team that developed software that could successfully tell, four times out of five, if the author of a text was male or female. Women, the researchers found, are far more likely to use personal pronouns like "she" and "he," while men prefer determiners like "that" and "this" -- women, in other words, talk about people, while men prefer to talk about things. That success sparked debate about how gender shapes the way we think and communicate.

Research of this kind has potential applications for law enforcement, allowing authorities to catch imposters or to match anonymous texts with possible authors by identifying linguistic tics. Because the analysis can also help identify gender and age, it might also allow advertisers to better target customers.

The new software might be used to investigate Shakespeare's plays and settle lingering questions of authorship or co-authorship, mused Graeme Hirst, a professor of computational linguistics at the University of Toronto. Or it could be applied to modern texts: "It would be interesting to see if in more cases we can tease apart who wrote what," Hirst said.

The algorithm might also lead to the creation of a style checker for documents prepared by multiple authors or committees, helping iron out awkward style variations and creating a uniform text, Hirst suggested.

What the algorithm won't answer, say the researchers who created it, is the question of whether the Bible is human or divine. Three of the four scholars, including Koppel, are religious Jews who subscribe in some form to the belief that the Torah was dictated to Moses in its entirety by a single author: God.

For academic scholars, the existence of different stylistic threads in the Bible indicates human authorship.

But the research team says in their paper they aren't addressing "how or why such distinct threads exist."

"Those for whom it is a matter of faith that the Pentateuch is not a composition of multiple writers can view the distinction investigated here as that of multiple styles," they said.

In other words, there's no reason why God could not write a book in different voices.

"No amount of research is going to resolve that issue," said Koppel. (Fox)

Stay tuned for tomorrow's interview here at the Muqata Blog...



6 comments:

Quasimodo said...: For a counterpoint, consider that very many of the books in the bible (and their internal stories) are arranged in "chiasmus" form--a kind of parallelism.

(Google "bible, chiasmus", and see what turns up.)

For that sort of complex arrangement to be found, a single author or editor must have been at work.

(Just my 2 cents worth...); 6:43 PM, July 03, 2011
Rabelad said...: I have no idea why people think that a computer program can verify the supposed Multiple Authors theory. Computers are programmed on the basis of assumptions and it's no trick to program a computer to give you answers that you want from a text.

The truth is that in the 1800s Wellhausen cooked up this theory and decided that the Torah was written by 5 authors - with the Redactor who supposedly covered up the evidence of the others but botched it once or twice. Then he and his students took his theory to its (il)logical end. By applying Wellhausen's assumptions and methods they "discovered" not just 4 or 5 authors but some THIRTY. Those who promote this theory like to make it sound reasonable that only a handful of authors wrote the Torah. Supporters of it are either ignorant of this or deliberately hide the fact that his ideas could be, and were, easily taken to the point of absurdity. For if some thirty authors can be discerned by his operating principles and guidelines, it means that his assumptions and methods were clearly faulty.; 10:23 PM, July 03, 2011
JLan said...: "I have no idea why people think that a computer program can verify the supposed Multiple Authors theory. Computers are programmed on the basis of assumptions and it's no trick to program a computer to give you answers that you want from a text."

The computer can't verify any theories and it can't give any answers. What the computer can do is give you raw data: for example, how often does the word "VaYomer" appear in one passage vs. another passage, how long do certain pesukim tend to be, whether the word "Ki" is used in one section but not in another. Those tendencies, in most texts, would point to different authors: using it in English could, for example, help to verify whether the plays definitely written by "Shakespeare" and the plays thought to have been collaborations are written by the same author or different authors, or when the author Henry James switched from writing his books to dictating them (based on word count and complexity). It's no different than previous ideas, other than verifying that there are statistically significant differences between texts. It can't tell you why those differences exist. Rather, the interpretation is left up to humans.; 6:54 AM, July 04, 2011
Neshama said...: How do you explain this vs the Bible Codes that confirm past events? All of our history and future is contained within the Torah, but only evident 'after' the event. This could be looked at as if we have already been thru it all before, but because we are on earth and subject to the measurement of time, (in replay) it seems like some events are 'future' when they really have occurred again. In this we have 'free choice' to remedy (teshuva) all our actions.

Only our 'talking to HaShem' is in the immediate and actual mode.

This is my analysis after researching (which is not the final word on the subject).; 9:20 AM, July 04, 2011
JLan said...: "How do you explain this vs the Bible Codes that confirm past events?"

1) Because no one's figured out a way to use the Bible Codes to confirm past events.

2) Because the Bible Codes require the text that's become standardized for Ashkenazim and Sephardim. But note that this may not reflect the Masoretic text and certainly wasn't the same text as Rambam used; Temani torahs are slightly different.

3) Because the Bible Codes are really Text Codes; they can be used in any text, Hebrew or otherwise, holy or otherwise, but only ever to "predict" the past. The same thing works just as well in Shakespeare as it does in Psalms.

4) Because the Bible Codes don't produce anything of statistical significance. The advantage of the computer reading, in this case, is that it reports a phenomenon. That phenomenon may be for any number of reasons, but scientifically speaking, the phenomenon is there. The question is why: maybe God felt it would communicate his point better, maybe the form fits with the story and is significant in a literary sense, maybe different authors contributed to the text...but Bible Codes aren't scientific and can't be used in such a manner.; 6:13 PM, July 04, 2011
Anonymous said...: I hope your interview goes well, and he is able to explain to your readers what his program does, and what it doesn't do, and why it's so damn awesome.; 9:04 PM, July 04, 2011