From the Index Card Box to Corpus Lexicography
How does a word get into the dictionary?
The subjectivity of the editor or, show me your dictionary and I’ll show you who you are
Every lexicographer – like every »normal« person – has a certain degree of specialized knowledge and personal preferences, and these are reflected in the dictionaries. Teams are assembled in order to avoid an excessive emphasis on subjective interests. There have, however, been cases where dictionaries were withdrawn from the market because a lexicographer devoted too much space to his pet topic.
Nine-day wonders and long runners – the lexicographer as clairvoyant
New words appear and then, practically overnight, they're on the tip of everyone's tongue. But by the morning after, most of them are already yesterday's news. I personally recorded and defined the keywords »Mauerspecht« and »Begrüßungsgeld« for a monolingual German dictionary. [Translator's note: Both terms were pertinent to the events surrounding German reunification. »Mauerspecht« literally means wall woodpecker and became a common nickname for people chipping off bits of the Berlin Wall. »Begrüßungsgeld« (literally, welcome money) is the official term for the 100 German Marks distributed by the West German government to those arriving in West Germany.] By the time the dictionary was published, both words had already been forgotten, so I had to »throw them out« again.
How does a word get taken out of the dictionary?
Do words have an expiration date?
How do you decide which words are to be purged and which are still »dictionary worthy«? Up to now, this has also been a highly subjective matter. First, the team discusses the list of words that have become »iffy.« We've generally been able to arrive at a consensus regarding dated or even obsolete word usage. However, there hasn't been an objective criterion for the final deletion of a word from the dictionary.
What’s a »sleeping entry«?
This odd species leads a double life of sorts. On the one hand, sleeping entries are words that are no longer used but have simply been overlooked as candidates for deletion. On the other hand, many »sleeping entries« perform quite another function: because lexicographers have been known to »copy« from one another, respectable publishing houses make up imaginary terms to see whether unscrupulous plagiarists then »pick them up.«
The daily struggle with and against the word
Should a firearms license be mandatory for dictionaries?
As simple as dictionary work sometimes appears to be, it often involves unforeseen difficulties. Anyone who ignores the special labels, symbols, and contextual references can be in for quite a surprise, especially when trying to express something in a foreign language. For example, the diplomat undoubtedly meant something quite different when, at the end of his visit to an English-speaking country, he said, »Thank you for the wonderful conception!« After all, the German word »Empfang« has several equivalents in English. [Translator's note: »Empfang« can signify both reception and receipt, but is also very close to the German word for conception, »Empfängnis«.]
»Political Correctness« and other everyday problems
One of the stickiest problems that a lexicographical workshop faces on a daily basis is the increasingly important topic of »political correctness.« The term is understood as the conscious effort to avoid all discriminatory language. We quite often receive mail from members of minority groups who have found insulting vocabulary in our dictionaries: a Turk living in Germany considered »getürkt« an insult to the Turkish people as a whole [Translator's note: the slang term literally means »turked« and refers to the manipulation or falsification of a thing or an event], the Sinti and Roma point out the questionable etymological origin of »Zigeuner«, the German word for gypsy, which has been linked to »traveling rogues«, and so on. We strive to make users aware of the potential dangers associated with words like »Negro« by adding warning notes. A word that carries no stigma for some listeners can have quite a negative impact on others.
What is a corpus and what can you do with it?
A corpus is a large collection of language material in electronic form.
It generally consists of texts from newspapers and literary works, but also of spoken language. These collections of texts – we work with a corpus of 300 million words for German – are analyzed by modern computer programs. By comparing the text of a current dictionary with the corpus you can, among other things, compile a list of potential additions and deletions.
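The comparison of dictionary and corpus could work roughly along the following lines. This is a minimal sketch, not the software actually used at Langenscheidt: it assumes a plain list of headwords and raw corpus text, counts single lowercased word forms (so multiword entries, inflected forms, and German noun capitalization are not handled), and the function names and thresholds (`candidate_lists`, `add_min`, `delete_max`) are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; in Python 3, \w also matches umlauts and ß.
    return re.findall(r"\w+", text.lower())


def candidate_lists(headwords, corpus_texts, add_min=3, delete_max=0):
    """Return (addition candidates, deletion candidates) for human review.

    add_min:    hypothetical threshold; corpus words at least this frequent
                that are missing from the dictionary are suggested as additions.
    delete_max: hypothetical threshold; headwords occurring at most this often
                in the corpus are suggested as deletions.
    """
    freq = Counter()
    for text in corpus_texts:
        freq.update(tokenize(text))

    entries = {w.lower() for w in headwords}
    additions = sorted(w for w, n in freq.items() if n >= add_min and w not in entries)
    # Counter returns 0 for words never seen in the corpus.
    deletions = sorted(w for w in entries if freq[w] <= delete_max)
    return additions, deletions


if __name__ == "__main__":
    headwords = ["Haus", "immer", "Mauerspecht"]
    corpus = [
        "Das Haus ist immer offen.",
        "Immer wieder das Haus.",
        "Das ist neu, das ist neu.",
    ]
    adds, dels = candidate_lists(headwords, corpus, add_min=2)
    print(adds)  # ['das', 'ist', 'neu']
    print(dels)  # ['mauerspecht']
```

In this toy run, common function words such as »das« surface as addition candidates, which is why such output can only ever be a list of suggestions for the editors.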
Can a dictionary be produced automatically – and would people then be superfluous?
It's theoretically feasible, but the quality of an automatically generated dictionary would be highly questionable. For example, the term »à la carte« doesn't appear even once in the entire corpus of 300 million words, which would make it a classic deletion candidate. The lexicographers, however, know that it belongs in the dictionary nonetheless and simply ignore the computer's suggestion.
The limits of a purely automatic corpus analysis
After analyzing a large textual corpus, the program suggested that we add »immmer« to the dictionary. [Translator's note: The German word »immer« means always and is spelled with two »m«s.] We couldn't believe that our standard reference work didn't contain »immer.« It was only after looking at the word more closely that we noticed it had three »m«s. This spelling appeared in the corpus a total of 132 times. Brave new media world.
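One way a corpus pipeline might catch cases like »immmer« is to flag any addition candidate that lies within one edit of an existing headword and route it to a human rather than accept it. The sketch below is an illustration under that assumption, not a description of the tools discussed here; `edit_distance` is a textbook Levenshtein implementation, and `flag_suspect_spellings` and `max_dist` are hypothetical names.

```python
def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance via dynamic programming, keeping only two rows.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def flag_suspect_spellings(candidates, headwords, max_dist=1):
    """Map each candidate to existing headwords it is suspiciously close to."""
    entries = [h.lower() for h in headwords]
    flagged = {}
    for cand in candidates:
        near = [h for h in entries if h != cand and edit_distance(cand, h) <= max_dist]
        if near:
            flagged[cand] = near
    return flagged


if __name__ == "__main__":
    print(flag_suspect_spellings(["immmer", "mauerspecht"], ["immer", "Haus"]))
    # {'immmer': ['immer']}
```

Such a filter only narrows the list: a genuinely new word that happens to differ from an existing entry by one letter would be flagged as well, so the final decision still rests with the lexicographers.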
Dr. Vincent J. Docherty, who describes himself as a »language practitioner«, heads the dictionary editing department of Langenscheidt Publishing, Munich.
© 1999 Vincent J. Docherty & Adib Fricke, The Word Company.