GIRT - Mono- and Cross-Language Domain-Specific Information Retrieval (GIRT4)

Participating groups can query the new GIRT (= German Indexing and Retrieval Testdatabase) collection, GIRT4. This collection of German social science data contains 151319 documents and is available as two parallel corpora that contain the same documents.

By presenting GIRT4 in two language parts, we offer two parallel corpora: a German corpus and a pseudo-English corpus, which is a translation of the German corpus into English and does not contain as much textual information as the German original. Nevertheless, there are now two distinct parallel corpora in different languages, whereas the previous GIRT3 corpus mixed information fields with German or English content, and in some cases the languages were not clearly separated within a given field.

As in previous campaigns, you can carry out:

    1. a monolingual task – German topics against German GIRT data, and now also: English topics against English GIRT data 
    2. a bilingual task – English or Russian topics against German GIRT data, and now also: German or Russian topics against English GIRT data. 
For task 2, other topic languages can be used if an independent translation of the topics into that language is provided.
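
The combinations of topic language and target corpus listed above can be summarised as a small lookup. The following Python sketch is illustrative only: the language codes "DE", "EN" and "RU" and the names LISTED_TOPIC_LANGUAGES and classify_task are assumptions made for this example and are not part of the GIRT4 distribution.

```python
# Illustrative sketch only. The language codes "DE", "EN", "RU" and every
# name below are assumptions for this example, not part of the GIRT4 data.
LISTED_TOPIC_LANGUAGES = {
    # corpus language -> task -> topic languages named in the task list
    "DE": {"monolingual": {"DE"}, "bilingual": {"EN", "RU"}},
    "EN": {"monolingual": {"EN"}, "bilingual": {"DE", "RU"}},
}


def classify_task(topic_lang, corpus_lang, has_independent_translation=False):
    """Return 'monolingual' or 'bilingual' for a permitted combination.

    Raises ValueError for an unknown corpus language, or for a topic
    language that is neither listed nor covered by an independent
    translation.
    """
    if corpus_lang not in LISTED_TOPIC_LANGUAGES:
        raise ValueError(f"unknown corpus language: {corpus_lang}")
    for task, topic_langs in LISTED_TOPIC_LANGUAGES[corpus_lang].items():
        if topic_lang in topic_langs:
            return task
    if has_independent_translation:
        # Other topic languages are accepted only for the bilingual task.
        return "bilingual"
    raise ValueError(
        f"topic language {topic_lang} requires an independent translation"
    )


print(classify_task("RU", "EN"))  # bilingual
```

A topic language outside the listed ones, for example a hypothetical code "FR", is accepted by this sketch only when has_independent_translation is set to True.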

A thesaurus is available for expanding the search and/or providing translations. If English is used as the query language, an English-German thesaurus is available; if German is used, a German-English thesaurus; and if Russian is used, a German-Russian translation table.
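
As a rough illustration of thesaurus-based expansion, the Python sketch below appends the translations of each query term to the query. It makes several assumptions: the thesaurus is represented as a plain in-memory dictionary rather than in the actual GIRT4 thesaurus format, the function name expand_query is hypothetical, and the toy entries are ordinary dictionary translations invented for this example, not taken from the GIRT thesaurus.

```python
def expand_query(terms, thesaurus):
    """Return the query terms followed by their thesaurus translations.

    Order is preserved and duplicates (compared case-insensitively) are
    dropped. `thesaurus` maps a lower-cased source term to a list of
    target-language terms.
    """
    expanded = []
    seen = set()
    for term in terms:
        for candidate in [term, *thesaurus.get(term.lower(), [])]:
            key = candidate.lower()
            if key not in seen:
                seen.add(key)
                expanded.append(candidate)
    return expanded


# Toy English-German entries, invented for illustration only.
toy_thesaurus = {
    "unemployment": ["Arbeitslosigkeit"],
    "youth": ["Jugend", "Jugendlicher"],
}

print(expand_query(["youth", "unemployment"], toy_thesaurus))
# ['youth', 'Jugend', 'Jugendlicher', 'unemployment', 'Arbeitslosigkeit']
```

Terms without a thesaurus entry are passed through unchanged, so the expanded query is never shorter than the original.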

You may use the topic fields tagged TIT (= title), DESC (= description) and NARR (= narrative). One run with TIT + DESC, without using the document fields CONTROLLED-TERM-DE/EN, METHOD-TERM-DE/EN, FREE-TERM-DE or CLASSIFICATION-TEXT-DE/EN, is mandatory.

Runs that include the document fields CONTROLLED-TERM-DE/EN and/or METHOD-TERM-DE/EN and/or FREE-TERM-DE and/or CLASSIFICATION-TEXT-DE/EN must be indicated as such (as usual, the chosen fields have to be named for all runs). They are not counted as "manual" runs.
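
The run rules above can be checked mechanically before submission. The Python sketch below is one possible reading, not an official validator: the function name describe_run and the returned keys are hypothetical, "TITLE-DE" in the usage line is an assumed name for an ordinary document field, and the mandatory run is interpreted here as using exactly TIT and DESC.

```python
TOPIC_FIELDS = {"TIT", "DESC", "NARR"}

# Document fields that must be declared when a run uses them.
INDEXING_FIELDS = {
    "CONTROLLED-TERM-DE", "CONTROLLED-TERM-EN",
    "METHOD-TERM-DE", "METHOD-TERM-EN",
    "FREE-TERM-DE",
    "CLASSIFICATION-TEXT-DE", "CLASSIFICATION-TEXT-EN",
}


def describe_run(topic_fields, document_fields):
    """Summarise a run: does it qualify as the mandatory run, and which
    indexing fields have to be declared with it?"""
    topic_fields = set(topic_fields)
    unknown = topic_fields - TOPIC_FIELDS
    if unknown:
        raise ValueError(f"unknown topic fields: {sorted(unknown)}")
    indexing_used = sorted(set(document_fields) & INDEXING_FIELDS)
    return {
        # Assumption: the mandatory run uses exactly TIT + DESC.
        "qualifies_as_mandatory": topic_fields == {"TIT", "DESC"} and not indexing_used,
        "indexing_fields_to_declare": indexing_used,
    }


# "TITLE-DE" is an assumed example of a non-indexing document field.
print(describe_run(["TIT", "DESC"], ["TITLE-DE", "CONTROLLED-TERM-DE"]))
# {'qualifies_as_mandatory': False, 'indexing_fields_to_declare': ['CONTROLLED-TERM-DE']}
```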

If you carry out monolingual or bilingual tasks on both GIRT4-DE and GIRT4-EN, you will receive a concordance list of your results in the English and the German corpus once the results have been delivered and assessed. You can thus compare the results obtained on the two corpora.

General information on the domain-specific task and the GIRT3 data is given in an article by Gey and Kluck. Additional information on the GIRT4 task, data structure and thesaurus is available here. For any questions on the GIRT task contact Michael Kluck (kluck@bonn.iz-soz.de).

Michael Kluck
Informationszentrum Sozialwissenschaften (IZ)
Bonn, Germany

last revision: 04 December 2002