DGD - Welcome
The DGD is the Database for Spoken German ("Datenbank für Gesprochenes Deutsch"). To use the DGD, you need to register (registration is free). The DGD's user interface is in German. We regret that we cannot provide a localized interface in other languages. This page provides some basic information on the DGD in English.
The DGD gives registered users access to 40 corpora of spoken language from the Archive for Spoken German ("Archiv für Gesprochenes Deutsch", AGD). The corpora comprise:
- The Research and Teaching Corpus of Spoken German ("Forschungs- und Lehrkorpus Gesprochenes Deutsch", FOLK), a state-of-the-art corpus of spontaneous interaction data
- The GeWiss Corpus ("Gesprochene Wissenschaftssprache Kontrastiv") of academic speech
- Further interaction corpora, such as the Freiburger Korpus ("FR") and the corpus Dialogstrukturen ("DS")
- The large "historic" dialect corpora of German, most importantly the corpus German dialects ("Deutsche Mundarten", "Zwirner-Korpus", ZW) and its "satellite corpora": German dialects in Eastern Europe (OS), German dialects in the Black Forest region (SV), German dialects in south-west Germany (SW), and German dialects in the GDR (DR)
- Other influential variation corpora for German, such as the corpus Basic German ("Deutsche Umgangssprachen", "Pfeffer-Korpus", PF) and the corpus Standard German ("Deutsche Standardsprache", "König-Korpus", KN), as well as the more recent corpus Deutsch Heute ("DH")
- Corpora on extra-territorial varieties of German ("speech islands"), such as Michael Clyne's corpus on Australian German and corpora on German in Russia, German in Namibia, and Mennonite Low German in the Americas
- Anne Betten's corpora on the German of emigrants to Israel ("Emigrantendeutsch in Israel", IS, ISW, ISZ)
- Norbert Dittmar's corpus on German reunification ("Berliner Wendekorpus", BW)
Altogether, the DGD contains approx. 5,000 hours of audio and video recordings, and more than 20 million transcribed tokens. With a few exceptions, all transcriptions in the database are time-aligned with the recordings and annotated with lemma and part-of-speech information.
The DGD provides functionality for
- browsing corpus data: reading metadata and transcripts, listening to audio, viewing video, and accessing additional material
- querying corpus data: full-text searches, systematic queries on different annotation levels, KWIC concordancing, compilation of virtual corpora from metadata queries, and quantification and export of query results
- downloading corpus data: selected full data sets as well as arbitrary transcript and recording excerpts
Corpus objects are interlinked on many levels so that users can, for instance, display metadata or play back audio for a given search result or transcript display.
To learn more about the DGD and the corpora it contains (especially the FOLK corpus), please consult the following publications:
- Thomas Schmidt (2014): The Database for Spoken German – DGD2. In: Proceedings of the Ninth conference on International Language Resources and Evaluation (LREC’14), Reykjavik, Iceland: European Language Resources Association (ELRA), 1451-1457.
- Thomas Schmidt (2017): Construction and Dissemination of a Corpus of Spoken Interaction - Tools and Workflows in the FOLK project. In: Kupietz, Marc & Geyken, Alexander (eds.): Corpus Linguistic Software Tools, Journal for Language Technology and Computational Linguistics (JLCL 31/1), 127-154.
- Thomas Schmidt (2016): Good practices in the compilation of FOLK, the Research and Teaching Corpus of Spoken German. In: International Journal of Corpus Linguistics 21 (3), 396-418.
- Thomas Schmidt (2014): The research and teaching corpus of spoken German – FOLK. In: Proceedings of the Ninth conference on International Language Resources and Evaluation (LREC’14), Reykjavik, Iceland: European Language Resources Association (ELRA), 383-387.
- Dolores Lemmenmeier-Batinić (2020): Lexical Explorer: extending access to the Database for Spoken German for user-specific purposes. In: Corpora 15 (1), 55-76.
Feel free to contact us at email@example.com with any questions you might have.