CLEF 2003 | Agenda
The objective of CLEF 2003 was to test different aspects of mono- and cross-language information retrieval system performance. There were six main tracks and two pilot experiments.
The multilingual track has been offered in CLEF for several years now. It entails searching a multilingual collection of newspaper and news agency documents: given topics in a selected language, systems must retrieve relevant documents from all languages in the collection, rather than from a single language pair, returning the results in a single merged, ranked list.
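The merging step described above can be sketched in a few lines. This is a hypothetical illustration (the function names, document IDs and the min-max normalization strategy are ours, not part of the CLEF specification): each monolingual run's scores are normalized to a common range before the lists are pooled and re-ranked.

```python
# Illustrative sketch of merging per-language ranked lists into one
# ranked list via min-max score normalization. All names and IDs are
# invented for the example; CLEF did not prescribe a merging method.

def normalize(run):
    """Rescale the scores of (doc_id, score) pairs to [0, 1]."""
    scores = [s for _, s in run]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero for flat runs
    return [(doc, (s - lo) / span) for doc, s in run]

def merge_runs(runs):
    """Pool several monolingual runs and sort by normalized score."""
    pooled = []
    for run in runs:
        pooled.extend(normalize(run))
    return sorted(pooled, key=lambda pair: pair[1], reverse=True)

# Two toy monolingual runs with raw, incomparable scores.
english = [("EN-001", 12.4), ("EN-002", 9.1)]
german = [("DE-007", 4.2), ("DE-003", 2.0)]
merged = merge_runs([english, german])
```

Simple score normalization like this is only one option; participants also experimented with round-robin merging and learned merging strategies.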
In CLEF 2003, there were two distinct tasks in this track:
The document collection for Multilingual-4 contained English, French, German and Spanish documents. Multilingual-8 involved searching a collection containing documents in eight languages: Dutch, English, Finnish, French, German, Italian, Spanish and Swedish.
A common set of topics (i.e. structured statements of information needs from which queries are extracted) was prepared in ten languages: Dutch, English, Finnish, French, German, Italian, Spanish, Swedish, Russian and Chinese.
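The structured topics mentioned above followed the usual TREC/CLEF style, with a short title, a one-sentence description, and a longer narrative, rendered in each topic language. A hypothetical English topic might look like this (the tag names and content are illustrative, not an actual CLEF 2003 topic):

```
<top>
  <num> C200 </num>
  <EN-title> Illustrative topic title </EN-title>
  <EN-desc> A one-sentence statement of the information need. </EN-desc>
  <EN-narr> A longer paragraph spelling out which documents should
  be judged relevant to the topic and which should not. </EN-narr>
</top>
```

Queries for a run are then extracted from one or more of these fields, e.g. title-only or title-plus-description runs.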
The 2003 bilingual track was very different from that of 2002. A main objective was to encourage the tuning of systems on challenging language pairs that do not include English, while also ensuring comparability of results. For this reason, runs were only accepted for one or more of the following source -> target language pairs:
Only newcomers (i.e. groups that had not previously participated in a CLEF cross-language task) could choose to search the English document collection using a European topic language. Russian was also included as a target collection in the bilingual task.
The CLEF experience has demonstrated the importance of monolingual system performance for multiple languages as a first step towards cross-language work. CLEF 2003 offered tasks for Dutch, Finnish, French, German, Italian, Russian, Spanish and Swedish.
Mono- and Cross-Language Information Retrieval for Scientific Collections
This track used a new, much larger GIRT (German Indexing and Retrieval Testdatabase) collection: GIRT4.
This collection of German social science data contains 151,319 documents and is available as two parallel corpora which contain the same documents:
· German GIRT4 (GIRT4-DE)
· English GIRT4 (GIRT4-EN)
Monolingual and bilingual tasks were offered. Controlled vocabularies in German-English and German-Russian were available.
The coordinator of this track was Michael Kluck, IZ-Bonn (email@example.com)
This track was a great success in CLEF 2002. Participating teams used a common experiment design to explore interactive formulation of cross-language queries and/or cross-language document selection. Details on the tasks to be offered in CLEF 2003 were posted on the iCLEF Web site. The track coordinators were Julio Gonzalo, UNED, Madrid, Spain, and Douglas Oard, University of Maryland, USA. Contact Julio Gonzalo (firstname.lastname@example.org) for more information.
This was a new track for CLEF 2003, offering tasks to test monolingual and cross-language question answering systems. The languages involved were Dutch, Italian and Spanish in the monolingual tasks; in the bilingual task, Dutch, French, German, Italian and Spanish source-language queries were run against an English target document collection.
The track was organised as follows:
Coordination: Bernardo Magnini - ITC-irst, Trento, Italy (email@example.com)
· Monolingual Dutch: Maarten de Rijke - University of Amsterdam, The Netherlands
· Monolingual Italian: Bernardo Magnini - ITC-irst, Trento, Italy
· Monolingual Spanish: Anselmo Peñas - UNED, Madrid, Spain
· Cross-language: Donna Harman, NIST, USA
Two pilot experiments involving multimedia collections were also hosted by CLEF.
The aim of this track was to test the effectiveness of systems designed to retrieve relevant images on the basis of their captions in a multilingual context. It was coordinated by the University of Sheffield.
Those interested can view the proposal presented at the CLEF 2002 Workshop. For further details, see the website http://ir.shef.ac.uk/imageclef/index.html or contact Mark Sanderson (firstname.lastname@example.org).
This track aimed to evaluate CLIR systems on noisy automatic transcripts of spoken documents, and to develop a benchmark at low cost.
Preliminary experiments were already conducted in 2002 as an activity of the DELOS Network of Excellence. They were reported at the CLEF 2002 Workshop in Rome.
The track was coordinated by Marcello Federico - ITC-irst, Trento, Italy, and Gareth Jones, University of Exeter, UK. For further details, see the website (http://munst.itc.it/clef-sdr.html) or contact Marcello directly (email@example.com)