Task Description
The objective of CLEF 2003 was to test different aspects of mono- and
cross-language information retrieval system performance. There were six main
tracks and two pilot experiments.
The multilingual track has been offered in CLEF for several years now.
It entails searching a multilingual collection of newspaper and news agency
documents. Using a selected topic language, the goal for systems is to retrieve
relevant documents for all languages in the collection, rather than just a
given pair, listing the results in a merged, ranked list.
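The merging step described above can be illustrated with a minimal sketch. It assumes that a system has already produced one scored result list per document language, and it uses min-max score normalisation, which is only one of several possible merging strategies and is not prescribed by the track. The function names, document identifiers and scores are hypothetical.

```python
from typing import Dict, List, Tuple

# A run is a list of (document id, retrieval score) pairs for one language.
Run = List[Tuple[str, float]]


def min_max_normalize(run: Run) -> Run:
    """Rescale the scores of one run to the range [0, 1]."""
    if not run:
        return []
    scores = [score for _, score in run]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so every document gets the same weight.
        return [(doc, 1.0) for doc, _ in run]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in run]


def merge_runs(runs_by_language: Dict[str, Run], k: int = 1000) -> Run:
    """Merge per-language runs into one ranked list of at most k documents."""
    merged: Run = []
    for run in runs_by_language.values():
        merged.extend(min_max_normalize(run))
    # Highest normalised score first; ties keep their insertion order.
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged[:k]


# Hypothetical per-language results with scores on different scales.
runs = {
    "en": [("EN-001", 12.0), ("EN-002", 9.0), ("EN-003", 6.0)],
    "de": [("DE-001", 0.9), ("DE-002", 0.3)],
}
merged = merge_runs(runs, k=4)
for rank, (doc, score) in enumerate(merged, start=1):
    print(rank, doc, round(score, 3))
```

Normalising each list before merging keeps a language whose system happens to produce larger raw scores from dominating the merged ranking.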
In CLEF 2003, there were two distinct tasks in this track: Multilingual-4 and
Multilingual-8.
The document collection for Multilingual-4 contained English, French, German
and Spanish documents. Multilingual-8 involved searching a collection
containing documents in eight languages: Dutch, English, Finnish, French,
German, Italian, Spanish and Swedish.
A common set of topics (i.e. structured statements of information needs from which queries are extracted) was prepared in ten languages: Dutch, English, Finnish, French, German, Italian, Spanish, Swedish, Russian and Chinese.
The 2003 bilingual track was very different from that of 2002. A main
objective was to encourage the tuning of systems running on challenging
language pairs that do not include English. We also wanted to ensure
comparability of results. For this reason, runs were accepted only for one or
more of a specified set of source -> target language pairs.
Only newcomers (i.e. groups that had not previously participated in a CLEF cross-language task) could choose to search the English document collection using a European topic language. Russian was also included as a target collection in the bilingual task.
The CLEF experience has demonstrated the importance of monolingual system performance for multiple languages as a first step towards cross-language work. CLEF 2003 offered tasks for Dutch, Finnish, French, German, Italian, Russian, Spanish and Swedish.
Mono- and Cross-Language Information Retrieval for Scientific Collections
This track used a new, much larger GIRT (German Indexing and Retrieval Testdatabase) collection: GIRT4.
This collection of German social science data contains 151,319 documents
and is available as two parallel corpora which contain the same documents:
· German GIRT4 (GIRT4-DE)
· English GIRT4 (GIRT4-EN)
Monolingual and bilingual tasks were offered. Controlled vocabularies in
German-English and German-Russian were available.
The coordinator of this track was Michael Kluck, IZ-Bonn
(kluck@bonn.iz-soz.de).
The interactive track (iCLEF) was a great success in CLEF 2002. Participating
teams used a common experiment design to explore interactive formulation of
cross-language queries and/or cross-language document selection. Details of
the tasks offered in CLEF 2003 were posted on the iCLEF Web site. The track
coordinators were Julio Gonzalo, UNED, Madrid, Spain, and Douglas Oard,
University of Maryland, USA. Contact Julio Gonzalo (julio@lsi.uned.es) for more
information.
The question answering track was new for CLEF 2003: it offered tasks to test monolingual and cross-language question answering systems. The languages involved were Dutch, Italian and Spanish in the monolingual tasks, and Dutch, French, German, Italian and Spanish source language queries to an English target document collection in the bilingual task.
The track was organised as follows:
Coordination: Bernardo Magnini - ITC-irst, Trento, Italy (magnini@itc.it)
· Monolingual Dutch: Maarten de Rijke - University of Amsterdam, The Netherlands
· Monolingual Italian: Bernardo Magnini - ITC-irst, Trento, Italy
· Monolingual Spanish: Anselmo Peñas - UNED, Madrid, Spain
· Cross-language: Donna Harman, NIST, USA
Those interested can view the proposal presented at the CLEF 2002 Workshop.
For further details, see the website (http://clef-qa.itc.it/) or contact
Bernardo Magnini directly.
Two pilot experiments involving multimedia collections were also hosted by
CLEF.
The aim of this track was to test the effectiveness of systems designed to
retrieve relevant images on the basis of their captions in a multilingual
context. It was coordinated by the University of Sheffield.
Those interested can view the proposal presented at the CLEF 2002 Workshop.
For further details, see the website
http://ir.shef.ac.uk/imageclef/index.html or contact Mark Sanderson
(m.sanderson@sheffield.ac.uk).
This track aimed to evaluate CLIR systems on noisy automatic transcripts of
spoken documents and to develop a benchmark at low cost. Preliminary
experiments had already been conducted in 2002 as an activity of the DELOS
Network of Excellence. They were reported at the CLEF 2002 Workshop in Rome.
The track was coordinated by Marcello Federico - ITC-irst, Trento, Italy, and
Gareth Jones, University of Exeter, UK. For further details, see the website
(http://munst.itc.it/clef-sdr.html) or contact Marcello Federico directly
(federico@itc.it).