
CLEF 2005 Ad-Hoc Track


Collection

The ad-hoc track will test system performance on a multilingual collection of newspaper and news agency documents. The data download page, accessible from the Workspace for Registered Participants, indicates precisely which collections you need for each task.

Tasks

There will be two core tasks this year, testing bilingual (L1 -> L2) and monolingual non-English information retrieval systems, plus experimental multilingual tasks aimed at measuring progress over time in multilingual (L1 -> Ln) retrieval system performance.

1. Monolingual

The goal is to retrieve relevant documents from the Bulgarian, French, Hungarian and/or Portuguese collections using topics in the same language, and to submit the results in a ranked list.

2. Bilingual

The 2005 ad-hoc bilingual track will accept runs for the following source -> target language pairs:

The aim is to retrieve relevant documents from the chosen target collection and submit the results in a ranked list. 
Only newcomers (i.e. groups that have not previously participated in a CLEF cross-language task) may choose to search the English document collection using any topic language.
Any group can submit runs against the English target collection if it uses a new or "unusual" topic language (e.g. this year Amharic, Greek, Hungarian and Indonesian qualify).

3. Multilingual

We intend to offer two tasks:
Multi-8 Two-Years-On: This task offers participants the opportunity to carry out the original 2003 Multi-8 task by performing their own retrieval runs and submitting their merged multilingual results. We will be reusing the original 2003 relevance assessments in the results analysis. The aim is to see if we can measure progress over time in multilingual system performance.
Multi-8 Merging Only: The merging strategies explored previously for multilingual retrieval tasks at CLEF and elsewhere have generally produced disappointing results. The aim of this task is to encourage researchers to focus directly on the merging problem. Participants will therefore investigate merging algorithms using the provided ranked lists (an illustrative merging sketch is given at the end of this section).

The data download page indicates the collections used in the CLEF 2003 Multi-8 task. For both the Multi-8 Two-Years-On task and the Merging Only task we use the 2003 topic sets as follows: C141-C160 for training purposes and C161-C200 for the actual test runs to be submitted for evaluation. The topic sets and qrels for 2003 can be found in the Workspace for Registered Participants. Please do NOT use the test set for development, and act as though the qrels for topics C161-C200 do not exist; otherwise the sense of the exercise will be lost. The 2003 Multi-8 results will be re-analysed for the 40 test topics (C161-C200) to create baselines for comparison.

The objectives and organisation of the multilingual tasks are described in a document prepared by Gareth Jones. We are currently finalising details, and more information will be circulated soon to participants registered for these tasks. Anyone interested in participating should contact Gareth (gjones at computing.dcu.ie), with cc to Carol (carol.peters at isti.cnr.it), to be included on the discussion list.
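For groups new to the Merging Only task, the sketch below shows one very simple baseline: min-max normalisation of the scores in each per-collection ranked list, followed by a single sort on the normalised scores. The input format (a list of (doc_id, score) pairs per target collection), the function names and the document identifiers are assumptions made for illustration only; they are not part of the official task definition.

# A minimal merging sketch, assuming one ranked list of (doc_id, score)
# pairs per target collection. This is an illustrative baseline, not the
# official format or a recommended method.

from typing import Dict, List, Tuple


def normalise(run: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Rescale the scores of one ranked list to [0, 1] (min-max normalisation)."""
    scores = [s for _, s in run]
    lo, hi = min(scores), max(scores)
    if hi == lo:  # degenerate list: all scores equal
        return [(doc, 1.0) for doc, _ in run]
    return [(doc, (s - lo) / (hi - lo)) for doc, s in run]


def merge_runs(runs: Dict[str, List[Tuple[str, float]]], depth: int = 1000) -> List[Tuple[str, float]]:
    """Merge per-collection ranked lists into one multilingual ranked list."""
    pooled: List[Tuple[str, float]] = []
    for run in runs.values():
        pooled.extend(normalise(run))
    # Sort by normalised score and keep the top `depth` documents.
    pooled.sort(key=lambda pair: pair[1], reverse=True)
    return pooled[:depth]


if __name__ == "__main__":
    # Toy example with two target collections and hypothetical document IDs.
    runs = {
        "fr": [("LEMONDE-001", 12.3), ("LEMONDE-042", 9.8)],
        "de": [("SDA-107", 4.1), ("SDA-033", 2.2)],
    }
    for doc, score in merge_runs(runs):
        print(doc, round(score, 3))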

Topics

A common set of 50 topics will be used for both the mono- and bilingual tasks and will be available in the Workspace for Registered Participants from 15 March. Topics have been prepared in Amharic, Bulgarian, Chinese, English, French, German, Greek, Hungarian, Indonesian, Italian, Portuguese, Russian, and Spanish, and in other languages on demand. Please contact carol.peters at isti.cnr.it if you are interested in other topic languages.
As stated above, the multilingual tasks will use the CLEF 2003 topics.

Guidelines

Detailed guidelines for participation in the 2005 Ad-Hoc track, with information on data manipulation, query construction and results submission, will be available soon. A preliminary draft of these guidelines can be found in the Workspace for Registered Participants.
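As an illustration of results submission only: the sketch below writes a ranked list in the TREC-style run format that has typically been used for CLEF submissions (topic-id Q0 doc-id rank score run-tag). The field layout, run tag, file name and identifiers here are assumptions made for this example; the official guidelines in the Workspace are the authoritative specification.

# A hedged sketch of writing results as a ranked list, assuming the
# TREC-style run format (topic-id Q0 doc-id rank score run-tag).
# Check the official guidelines for the exact required layout.

from typing import Dict, List, Tuple


def write_run(results: Dict[str, List[Tuple[str, float]]], run_tag: str, path: str) -> None:
    """Write one line per retrieved document: topic, Q0, docno, rank, score, tag."""
    with open(path, "w", encoding="utf-8") as out:
        for topic_id, ranked in results.items():
            for rank, (doc_id, score) in enumerate(ranked, start=1):
                out.write(f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}\n")


if __name__ == "__main__":
    # Toy example with hypothetical topic and document identifiers.
    results = {"C161": [("LEMONDE-001", 0.91), ("SDA-107", 0.87)]}
    write_run(results, run_tag="myGroupMono1", path="monolingual_fr.run")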


The track is coordinated jointly by ISTI-CNR, the University of Padua, and Dublin City University.