Participating groups can query the new GIRT (= German Indexing and Retrieval Testdatabase) collection (GIRT4). This collection of German social science data contains 151319 documents and is available as two parallel corpora which contain the same documents:
By presenting GIRT4 in two language parts, we offer two parallel corpora:
a German corpus and a pseudo-English corpus. The pseudo-English corpus is
a translation of the German corpus into English and does not contain as
much textual information as the German original. Nevertheless, there are
now two distinct parallel corpora in different languages, whereas the
previous GIRT3 corpus mixed information fields with German or English
content and, in some cases, did not clearly separate the two languages
within a given field.
As in the last campaigns you can carry out
A thesaurus is available for expanding the search and/or providing translations. If English is the query language, an English-German thesaurus is available. If German is the query language, a German-English thesaurus is available. If Russian is the query language, a German-Russian translation table is available.
The topic fields tagged TIT (= title), DESC (= description), and
NARR (= narrative) may be used. One run using TIT + DESC (without using
the document fields CONTROLLED-TERM-DE/EN, METHOD-TERM-DE/EN, FREE-TERM-DE,
or CLASSIFICATION-TEXT-DE/EN) is mandatory.
Runs that include the document
fields CONTROLLED-TERM-DE/EN and/or METHOD-TERM-DE/EN and/or FREE-TERM-DE
and/or CLASSIFICATION-TEXT-DE/EN must be indicated as such
(as usual, the chosen fields have to be named for every run). They are not
counted as "manual" runs.
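The field rules above can be sketched as a small check that a group might run over its own run descriptions before submission. This is only an illustration: the field names are taken from the text, while the function name `describe_run`, the constants, and the shape of the returned dictionary are assumptions and not part of any official submission tool.

```python
# Illustrative sketch only. Field names come from the task description;
# the function and constant names are hypothetical.
TOPIC_FIELDS = {"TIT", "DESC", "NARR"}

# Document fields whose use must be indicated for a run.
INDEXING_FIELDS = {
    "CONTROLLED-TERM-DE", "CONTROLLED-TERM-EN",
    "METHOD-TERM-DE", "METHOD-TERM-EN",
    "FREE-TERM-DE",
    "CLASSIFICATION-TEXT-DE", "CLASSIFICATION-TEXT-EN",
}


def describe_run(topic_fields, document_fields):
    """Classify a run according to the field rules stated in the text."""
    topic = set(topic_fields)
    unknown = topic - TOPIC_FIELDS
    if unknown:
        raise ValueError(f"unknown topic fields: {sorted(unknown)}")

    # Only the indexing fields listed in the task text need to be declared.
    used_indexing = set(document_fields) & INDEXING_FIELDS

    return {
        # The mandatory run uses TIT + DESC and none of the indexing fields.
        "is_mandatory_type": topic == {"TIT", "DESC"} and not used_indexing,
        # These fields have to be named when the run is submitted.
        "indexing_fields_to_declare": sorted(used_indexing),
    }
```

For example, `describe_run(["TIT", "DESC"], [])` describes a run of the mandatory type, while adding `"CONTROLLED-TERM-DE"` to the document fields yields a run whose indexing fields must be declared.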
If you carry out monolingual or bilingual tasks on both
GIRT4-DE and GIRT4-EN, you
will receive a concordance list of your results in the English and the German
corpus after the results have been delivered and
assessed. You can thus compare the results obtained on the two corpora.
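One way such a comparison could be carried out is sketched below. The format of the concordance list is not specified here, so the sketch assumes it can be read as a mapping from German document identifiers to the identifiers of the same documents in the English corpus. The function name `corpus_overlap` and the identifiers in the usage note are hypothetical.

```python
# Illustrative sketch only. It assumes the concordance list can be loaded
# as a dict mapping German document IDs to English document IDs; the real
# format may differ.
def corpus_overlap(german_results, english_results, concordance):
    """Return the Jaccard overlap of two result lists for one topic.

    german_results  -- document IDs retrieved from the German corpus
    english_results -- document IDs retrieved from the English corpus
    concordance     -- assumed mapping: German ID -> English ID
    """
    # Translate the German result IDs into the English ID space,
    # skipping any ID that has no entry in the concordance.
    mapped = {concordance[d] for d in german_results if d in concordance}
    english = set(english_results)

    union = mapped | english
    if not union:
        return 0.0
    return len(mapped & english) / len(union)
```

With made-up identifiers, `corpus_overlap(["de1", "de2", "de3"], ["en2", "en3", "en4"], {"de1": "en1", "de2": "en2", "de3": "en3"})` gives 0.5, since two of the four distinct documents were retrieved from both corpora.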
General information on the domain-specific task and the GIRT3 data is given in an article by Gey and Kluck. Additional information on the GIRT4 task, data structure and thesaurus is available here. For any questions on the GIRT task contact Michael Kluck (kluck@bonn.iz-soz.de).
Michael Kluck
Informationszentrum Sozialwissenschaften (IZ)
Bonn, Germany
last revision: 04 December 2002