CLEF 2007 | Ad-Hoc 2007
The ad-hoc track will test system performance on a multilingual collection of newspaper and news agency documents. The data download page, accessible from the Workspace for Registered Participants, indicates precisely which collections you need for each task.
The ad-hoc track tests mono- and cross-language textual document retrieval. As in previous years, the 2007 track offers basic mono- and bilingual tasks plus an experimental task aimed at (but not restricted to) experienced participants. This is the "Robust" task.
The goal is to retrieve relevant documents in Bulgarian, Czech, and/or Hungarian collections using topics in the same language, and to submit results in a ranked list.
The 2007 bilingual task focuses on the "new" CLEF languages Bulgarian and Hungarian (added in 2005) and Czech (added this year). The aim is to strengthen the text collections and to see whether system performance can match that obtained with more "consolidated" languages in previous years. We also include a new English target collection (LA Times 2002), which this year can be used with any topic language. However, we particularly encourage experiments with non-European languages against the English target collection.
The 2007 ad-hoc bilingual track will accept runs for the following source -> target language pairs:
Any topic language -> Bulgarian target collection
Any topic language -> Czech target collection
Any topic language -> Hungarian target collection
Any topic language -> English target collection
As always, the aim is to retrieve relevant documents from the chosen target collection and submit the results in a ranked list.
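The exact submission format is specified on the participants' pages; CLEF campaigns have typically accepted the TREC-style run format, in which each line gives the topic identifier, the literal `Q0`, a document identifier, the rank, the system score, and the run tag. A hypothetical sketch (topic number, document IDs, scores, and run tag are all invented for illustration):

```text
141 Q0 LAT-20020315-0042 1 5.8321 myGroupRun1
141 Q0 LAT-20020316-0007 2 5.7710 myGroupRun1
141 Q0 LAT-20020402-0113 3 5.6904 myGroupRun1
```

Please check the official submission guidelines before formatting your runs; the above is only an assumed layout.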
We strongly encourage groups that have participated in a cross-language ad-hoc task in previous years to submit at least one run for each target language.
Topics will be supplied in a variety of European (Bulgarian, Czech, English, French, Hungarian, Spanish, Portuguese) and non-European languages including Amharic, Chinese, Afaan Oromo, Hindi, Telugu and Indonesian. Other languages can be added on demand.
Sets of 50 topics will be used for both mono- and bilingual tasks and will be found in the Workspace for Registered Participants from 11 April.
Contact carol.peters at isti.cnr.it if you are interested in other topic languages.
In 2007, a "robust" task will again be offered; this task emphasizes the importance of reaching a minimal performance on every topic rather than a high average performance.
Robustness is a key issue for the transfer of CLEF research into applications. The robust task will use three languages often used in previous CLEF campaigns (English, French, Portuguese). Additional evaluation measures will be introduced.
The 2007 robust task focuses on target collections for "consolidated" languages for which many experiments have already been made within CLEF (English, French and Portuguese). One bilingual run (English -> French) will be offered.
The robust task intends to evaluate stable performance over all topics rather than high average performance in mono- and cross-language IR ("ensure that all topics obtain minimum effectiveness levels", Voorhees, SIGIR Forum 2005).
The evaluation methodology will use the geometric average as well as the mean average precision (MAP) over all topics. The geometric average exhibited a high correlation to MAP at CLEF 2006. Additional candidate measures have been suggested and are under consideration.
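The two measures named above differ in how they aggregate per-topic average precision (AP): MAP takes the arithmetic mean, while the geometric mean (GMAP) penalizes topics with near-zero AP much more heavily, rewarding robustness. A minimal sketch in Python, assuming set-based relevance judgments; the epsilon floor for zero-AP topics follows the common TREC convention:

```python
import math

def average_precision(ranked, relevant):
    """AP for one topic: mean of the precision values at each relevant hit."""
    hits, precisions = 0, []
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant) if relevant else 0.0

def map_score(aps):
    """Mean average precision: arithmetic mean of per-topic AP."""
    return sum(aps) / len(aps)

def gmap_score(aps, eps=1e-5):
    """Geometric mean of per-topic AP; eps keeps a single zero-AP
    topic from collapsing the whole product to zero."""
    return math.exp(sum(math.log(max(ap, eps)) for ap in aps) / len(aps))
```

For example, a system scoring AP = 1.0 on one topic and 0.0 on another has MAP = 0.5 but a GMAP near zero, which is exactly the failure mode the robust task is designed to expose.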
Data for Robust Task
Problems with inconsistencies between collections and topics should be avoided this year.
Each group may submit five runs for each sub-task and each topic language.
Contact: Thomas Mandl, University of Hildesheim, Germany firstname.lastname@example.org
The track is coordinated jointly by ISTI-CNR and U. Padua (Italy) and U. Hildesheim (Germany).