CLEF Agenda for 2003


Task Description

The objective of CLEF 2003 was to test different aspects of mono- and cross-language information retrieval system performance. There were six main tracks and two pilot experiments.

Multilingual Information Retrieval

The multilingual track has been offered in CLEF for several years now. It entails searching a multilingual collection of newspaper and news agency documents. Using a selected topic language, the goal for systems is to retrieve relevant documents in all languages of the collection, rather than just for a given language pair, and to present the results in a single merged, ranked list.
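The track description does not prescribe how the merged list is to be produced. As an illustration only, the following minimal sketch assumes that a system has already produced one scored result list per document language, and merges them by rescaling each list's scores to a common range before sorting. Min-max normalization is just one possible merging strategy; the function names, document identifiers, scores and the default limit of 1000 results are all hypothetical and do not come from the CLEF guidelines.

```python
from typing import Dict, List, Tuple

# Hypothetical per-language result list: (document id, retrieval score).
RankedList = List[Tuple[str, float]]


def min_max_normalize(results: RankedList) -> RankedList:
    """Rescale the scores of one result list to the range [0, 1]."""
    if not results:
        return []
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        # All scores are equal, so treat every document as equally good.
        return [(doc_id, 1.0) for doc_id, _ in results]
    return [(doc_id, (score - low) / (high - low)) for doc_id, score in results]


def merge_ranked_lists(per_language: Dict[str, RankedList], limit: int = 1000) -> RankedList:
    """Merge per-language lists into one list sorted by normalized score."""
    merged: RankedList = []
    for results in per_language.values():
        merged.extend(min_max_normalize(results))
    # Sort by descending score; break ties by document id for determinism.
    merged.sort(key=lambda item: (-item[1], item[0]))
    return merged[:limit]


if __name__ == "__main__":
    # Made-up scores on deliberately different scales for two collections.
    runs = {
        "en": [("EN-001", 12.0), ("EN-002", 6.0), ("EN-003", 3.0)],
        "de": [("DE-001", 0.9), ("DE-002", 0.5)],
    }
    for doc_id, score in merge_ranked_lists(runs):
        print(f"{doc_id}\t{score:.3f}")
```

Normalizing before merging matters here because raw scores from separate per-language retrieval runs are generally not comparable; without it, the collection with the largest raw scores would dominate the top of the merged list.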

In CLEF 2003, there were two distinct tasks in this track: Multilingual-4 and Multilingual-8.


The document collection for Multilingual-4 contained English, French, German and Spanish documents. Multilingual-8 involved searching a collection containing documents in eight languages: Dutch, English, Finnish, French, German, Italian, Spanish and Swedish.

A common set of topics (i.e. structured statements of information needs from which queries are extracted) was prepared in ten languages: Dutch, English, Finnish, French, German, Italian, Spanish, Swedish, Russian and Chinese.


Bilingual Information Retrieval

The 2003 bilingual track was very different from that of 2002. A main objective was to encourage the tuning of systems running on challenging language pairs that do not include English. We also wanted to ensure comparability of results. For this reason, runs were only accepted for one or more of the following source -> target languages:

Newcomers only (i.e. groups that had not previously participated in a CLEF cross-language task) could choose to search the English document collection using a European topic language. Russian was also included as a target collection in the bilingual task.


Monolingual (non-English) Information Retrieval

The CLEF experience has demonstrated the importance of monolingual system performance for multiple languages as a first step towards cross-language work. CLEF 2003 offered tasks for Dutch, Finnish, French, German, Italian, Russian, Spanish and Swedish.


Mono- and Cross-Language Information Retrieval for Scientific Collections

This track used a new, much larger GIRT (German Indexing and Retrieval Testdatabase) collection: GIRT4.

This collection of German social science data contains 151,319 documents and is available as two parallel corpora which contain the same documents:

·        German GIRT4 (GIRT4-DE)

·        English GIRT4 (GIRT4-EN)

Monolingual and bilingual tasks were offered. Controlled vocabularies in German-English and German-Russian were available.

The coordinator of this track was Michael Kluck, IZ-Bonn.


Interactive Cross-Language Information Retrieval (iCLEF)

This track was a great success in CLEF 2002. Participating teams used a common experiment design to explore interactive formulation of cross-language queries and/or cross-language document selection. Details on the tasks offered in CLEF 2003 were posted on the iCLEF Web site. The track coordinators were Julio Gonzalo, UNED, Madrid, Spain, and Douglas Oard, University of Maryland, USA. Contact Julio Gonzalo for more information.

Multiple Language Question Answering (QA at CLEF)

This was a new track for CLEF 2003. It offered tasks to test monolingual and cross-language question answering systems. The monolingual tasks covered Dutch, Italian and Spanish. The bilingual task used Dutch, French, German, Italian and Spanish source language queries against an English target document collection.

The track was organised as follows:

Coordination: Bernardo Magnini - ITC-irst, Trento, Italy

·         Monolingual Dutch: Maarten de Rijke - University of Amsterdam, The Netherlands

·         Monolingual Italian: Bernardo Magnini - ITC-irst, Trento, Italy

·         Monolingual Spanish: Anselmo Peñas - UNED, Madrid, Spain

·         Cross-language: Donna Harman - NIST, USA

Those interested can view the proposal presented at the CLEF 2002 Workshop. For further details, see the website or contact Bernardo directly.

Pilot Experiments

Two pilot experiments involving multimedia collections were also hosted by CLEF.

Cross-Language Retrieval in Image Collections (Image CLEF)

The aim of this track was to test the effectiveness of systems designed to retrieve relevant images on the basis of their captions in a multilingual context. It was coordinated by the University of Sheffield.

Those interested can view the proposal presented at the CLEF 2002 Workshop. For further details, see the website or contact Mark Sanderson.

Cross-Language Spoken Document Retrieval (CL-SDR)

This track aimed to evaluate CLIR systems on noisy automatic transcripts of spoken documents, and to develop a benchmark at low cost.

Preliminary experiments were already conducted in 2002 as an activity of the DELOS Network of Excellence. They were reported at the CLEF 2002 Workshop in Rome.

The track was coordinated by Marcello Federico - ITC-irst, Trento, Italy, and Gareth Jones, University of Exeter, UK. For further details, see the website or contact Marcello directly.