The Borgmann Project: Listing all the Words in English

Google Tech Talks
August 13, 2008

ABSTRACT

Contrary to popular belief, current unabridged dictionaries contain only a small fraction of all the words in English. This is primarily because it costs a publisher money to include a word in a dictionary, so the publisher includes only words that will be “valuable” in some sense to the dictionary purchaser. As publishing moves increasingly into electronic form, the cost of including a word in a dictionary decreases. This opens up the possibility of creating a dictionary containing all the words in English. There are a number of unexpected difficulties involved in this effort, and this talk will summarize the current state of the art.

Speaker: Chris Cole

Chris Cole is President of Ur Studios, Inc. He was introduced to the Arpanet as a student at Harvard in the 1970s. As a graduate student at Caltech, he co-wrote with Stephen Wolfram the first commercially available symbolic math package, SMP, which was a precursor to Mathematica. In 1981 Cole co-founded Inference Corporation, which grew to be the leading AI company and was later acquired by eGain. Cole also wrote the software for the online version of Merriam-Webster’s Collegiate Dictionary, worked with the Advanced Technology Group of Encyclopedia Britannica in implementing Britannica Online, and edits the archive for the Usenet newsgroup rec.puzzles. In 1999 Sterling published his book Wordplay, A Curious Dictionary of Language Oddities.
