
CLEF 2006 Ad-Hoc Track



The ad-hoc track will test system performance on a multilingual collection of newspaper and news agency documents. The data download page, accessible from the Workspace for Registered Participants, indicates precisely which collections you need for each task.


The ad-hoc track tests mono- and cross-language textual document retrieval. As last year, the 2006 track offers basic mono- and bilingual tasks plus an experimental multilingual task aimed at (but not restricted to) experienced participants. This is the "Robust" task.

1. Monolingual

The goal is to retrieve relevant documents in Bulgarian, French, Hungarian and/or Portuguese collections using topics in the same language, and to submit results in a ranked list. 

2. Bilingual

The 2006 bilingual task focuses on target collections for "consolidated" languages, for which many experiments have already been carried out within CLEF (French and Portuguese), and for "new" CLEF languages (Bulgarian and Hungarian, added in 2005). In CLEF we note that system performance tends to be best with target languages for which a strong test collection has been built over the years. The aim for the "consolidated" languages is thus to see if system performance can be further improved compared with previous years (using the monolingual results as a baseline), whereas with the "new" languages the aim is to strengthen the text collections and to see if the system performance achieved can be equivalent to that obtained with the "consolidated" languages. The 2006 ad-hoc bilingual track will accept runs for the following source -> target language pairs:

The aim is to retrieve relevant documents from the chosen target collection and submit the results in a ranked list. 
However, we strongly encourage groups that have participated in a cross-language ad-hoc task in previous years to submit at least one run for each target language.

In addition, this year we also offer a bilingual task aimed at encouraging system testing with non-European languages against the English or French target collections. Topics will be supplied in a variety of languages including Amharic, Chinese, Afaan Oromo, Hindi, Telugu and Indonesian. Other languages can be added on demand. The aim is to stimulate the development of resources to handle these languages in a cross-language context.

Finally, newcomers only (i.e. groups that have not previously participated in a CLEF cross-language task) can choose to search the English document collection using any topic language.

Topics for Tasks 1 and 2

Sets of 50 topics will be used for both mono- and bilingual tasks and will be found in the Workspace for Registered Participants from 15 March. As the target collections are from two different time periods (1994-95 for the French, Portuguese and English collections and 2002 for the Hungarian and Bulgarian), two main topic sets will be prepared. 25 topics will be common to both sets while 25 topics in each set will be collection-specific. This means that a total of 75 topics (25 common plus 25 specific to each of the two sets) will be prepared in many different languages (European and non-European); participants will select the required topic set according to the target collection to be used. Topic languages envisaged are Bulgarian, English, French, German, Hungarian, Italian, Portuguese, Russian and Spanish plus the non-European languages listed above. Please contact carol.peters at if you are interested in other topic languages.

3. Robust

The new robust task emphasizes stable performance over all topics, rather than high average performance, in mono-, cross-language and multilingual IR ("ensure that all topics obtain minimum effectiveness levels", Voorhees 2005, SIGIR Forum).

The robust task is essentially an ad-hoc task which makes use of test collections previously developed at CLEF. The data collection contains documents in six languages (Dutch, English, German, French, Italian and Spanish) and a set of 160 topics.

The evaluation methodology will use the geometric average as well as the mean average precision over all topics. The geometric average has proven to be a stable measure for robustness at TREC.

In the long term, we are interested in topic difficulty, failure analysis for hard topics and a stable performance of systems over different tasks.


Purpose: Investigate Robustness in Cross Language and Multilingual IR


  • Stable performance over all topics instead of high average performance (like at TREC, but for CLIR)

  • Emphasize performance for hard topics

  • Stable performance over all topics for multi-lingual retrieval

The robust task is an ad-hoc task which makes use of test collections developed at CLEF and which applies a modified evaluation methodology.

Data and Task Design for Robust Task

  • Ad-hoc collections which were available at CLEF 2001 through CLEF 2003

  • Six languages: Dutch, English, German, French, Italian and Spanish (NL, EN, DE, FR, IT, ES). The exact collections are indicated on the Data Collections for CLEF 2006 page in the Workspace for Registered Participants.

  • 160 topics (2001-2003)

    - English and Spanish topic sets have been made available; other languages are available on demand.



  • Monolingual (for all six document languages)

  • Three bilingual (Italian -> Spanish; French -> Dutch; English -> German)

  • Multilingual (all six languages are allowed as topic language). 4 runs per group are allowed.





Not Allowed

Participants are encouraged to run their systems with the same setup for all robust tasks in which they participate (except for language-specific resources).
Participants are encouraged to run their systems with the same setup and the same parameters for the CLEF mono- and bilingual Ad-Hoc tasks (except for language-specific resources).

For all issues not specified in this document, the CLEF Guidelines for Participation apply. They give information on constructing and manipulating the system data structures, constructing the queries as well as assembling the results.

Evaluation Methodology
  • The results of the submitted runs will be reported based on the geometric average (as calculated by the trec_eval program) as well as the mean average precision. The main evaluation result is based on the geometric average. At TREC, geometric average (rather than mean average) turned out to be the most stable evaluation method for robustness.
  • The organizers may report the results based on other evaluation measures as well as for sub-sets of topics.
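
The difference between the two measures can be illustrated with a minimal sketch. The per-topic average precision values and system names below are hypothetical, and the floor of 1e-5 applied before taking logarithms is an assumption made for illustration (some floor is needed because log(0) is undefined); consult the trec_eval documentation for the exact computation used in the official results.

```python
import math


def mean_average_precision(ap_by_topic):
    """Arithmetic mean of the per-topic average precision values."""
    values = list(ap_by_topic.values())
    return sum(values) / len(values)


def geometric_mean_average_precision(ap_by_topic, floor=1e-5):
    """Geometric mean of per-topic average precision.

    Computed as exp(mean(log(AP))). Values below `floor` are raised to
    `floor` first; the floor value is an assumption for this sketch.
    """
    logs = [math.log(max(ap, floor)) for ap in ap_by_topic.values()]
    return math.exp(sum(logs) / len(logs))


# Hypothetical per-topic scores: system A fails completely on one topic,
# system B is moderately good on every topic.
system_a = {"T1": 0.60, "T2": 0.60, "T3": 0.00}
system_b = {"T1": 0.40, "T2": 0.40, "T3": 0.40}

map_a = mean_average_precision(system_a)
map_b = mean_average_precision(system_b)
gmap_a = geometric_mean_average_precision(system_a)
gmap_b = geometric_mean_average_precision(system_b)

print(f"MAP  A={map_a:.4f}  B={map_b:.4f}")
print(f"GMAP A={gmap_a:.4f}  B={gmap_b:.4f}")
```

Both hypothetical systems have the same mean average precision, but the geometric average of system A collapses because of its single failed topic. This is the sense in which the geometric average rewards stable performance over all topics.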



Document Collections

  • English: LA Times 94, GH 95

  • French: ATS (SDA) 94/95, Le Monde 94

  • Italian: La Stampa 94, AGZ (SDA) 94/95

  • Dutch: NRC Handelsblad 94/95, Alg. Dagblad 94/95

  • German: Fr Rundschau 94/95, Spiegel 94/95, SDA 94

  • Spanish: EFE 94/95

160 Topics used at CLEF 2001, 2002 and 2003


Training topics: 50-59, 70-79, 100-109, 120-129, 150-159, 180-189

Test topics: all other topics between numbers 41 and 200
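
The split above can be reproduced with a short sketch, for example when partitioning topic files locally. The variable names are illustrative only, and the sketch assumes that topic numbers run contiguously from 41 to 200 inclusive.

```python
# Training ranges as listed above (inclusive bounds).
TRAINING_RANGES = [(50, 59), (70, 79), (100, 109),
                   (120, 129), (150, 159), (180, 189)]

# Assumption: topics are numbered contiguously from 41 to 200 inclusive.
ALL_TOPICS = range(41, 201)

training_topics = sorted(
    n for low, high in TRAINING_RANGES for n in range(low, high + 1)
)
training_set = set(training_topics)

# Every topic that is not a training topic is a test topic.
test_topics = [n for n in ALL_TOPICS if n not in training_set]

print(len(training_topics), "training topics")
print(len(test_topics), "test topics")
```

Under that numbering assumption, the two lists together cover all 160 topics without overlap.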


Contact: Thomas Mandl, University of Hildesheim, Germany

The track is coordinated jointly by ISTI-CNR and U. Padua (Italy) and U. Hildesheim (Germany).