
CLEF 2008 Ad-Hoc Track


There are three very distinct tasks in the 2008 Ad-Hoc Track.

The main task offers monolingual and cross-language search on library catalog records in English, French, and German, organised in collaboration with The European Library (TEL). The second task is an ad hoc retrieval task on a Persian newspaper corpus. The third task is the Robust task, which this year uses word-sense-disambiguated (WSD) data.



1. TEL@CLEF

The task is to search and retrieve relevant items from collections of library catalog cards. Our aim is to identify the most effective retrieval technologies for searching this type of data.

This data, which consists of bibliographic records (document surrogates), is very different from the news corpora previously used in the CLEF ad hoc track. Whereas in the traditional ad hoc task the user searches for a document containing information of interest, here the user is trying to identify which publications are of potential interest, according to the information provided by the catalog card. The question the user is asking is: “Is the publication described by the bibliographic record relevant to my information need?”


The Collections

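The collections have been provided by The European Library, and the task is organised in collaboration with TEL. Three target collections are provided:

§         BL (British Library): English

§         BNF (Bibliothèque nationale de France): French

§         ONB (Österreichische Nationalbibliothek, the Austrian National Library): German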

The data is very sparse: many records contain only title, author and subject heading information, while other records provide more detail. It is important to note that the data is actually multilingual: all three collections contain, to a greater or lesser extent, records pointing to documents in other languages. Thus the title and, if present, an abstract or description may be in a language other than the main language of the collection. The subject heading information is normally in the main language of the collection.

About 66% of the documents in the English and German collections have textual subject headings; in the French collection the figure is 37%.
Dewey Decimal Classification (DDC) codes are not available in the French collection and are negligible (<0.3%) in the German collection, but they occur in about half of the English documents (456,408 documents, to be exact).

A description of the data structure will be provided on release of the collections.


The Task

50 topics will be prepared for each of the three main collection languages (DE, EN, FR). Topics can be prepared in other languages on demand.

Topics will have two fields: a Title field (2-4 key terms) and a Description field (a sentence specifying the information item of interest).


Two main tasks are offered, monolingual and bilingual, each subdivided into subtasks that reflect the multilinguality of the data:

Monolingual and Monolingual++; Bilingual and Bilingual++

The ++ tasks are those in which the participating group also attempts to use additional tools to cater for the multilinguality of the collections. Groups must state whether their runs are to be considered “++”.


More Information about the Task (posted 15 May 2008)

We have tagged the three collections (BL, BNF, ONB) as English, French and German because in each case this is the main/official language of the collection. However, as stated above, all three collections are to some extent multilingual and contain documents (catalog records) in many additional languages.


In both tasks, monolingual and bilingual, the aim is to retrieve documents relevant to the query, and your results are judged on this basis.

By monolingual we mean that the query is in the same language as the official language of the collection.

By bilingual we mean that the query is in a different language to the official language of the collection.


For example, in an EN -> FR run, relevant documents (bibliographic records) could be any documents in the BNF collection (which we call the French collection), in whatever language they are written. The same is true for a monolingual FR -> FR run: relevant documents from the BNF collection could also be in English or German, not just French.


Documents referring to all types of works (e.g. books, articles, collections of images, videos, etc.) are judged for relevance unless the query specifically indicates otherwise.


In CLEF 2008, the task we simulate is that of a user who has a working knowledge of English, French and German (plus, for the English collection, Spanish) and who wants to discover the existence of relevant documents that could be useful to him or her in one of our three target collections, either monolingually (the query is in the official language of the collection) or bilingually (the query is in a different language).

We will judge for relevance only those documents that are written totally or partially in one of these languages. For example, a catalog record written entirely in Hungarian will be counted as not relevant, as it is of no use to our hypothetical user; however, a catalog record with, say, the title and a brief description in Hungarian but subject descriptors in French, German or English will be judged for relevance, as it could be potentially useful. Our assessors have no additional knowledge of the documents referred to by the catalog records (or surrogates) in the collection: they judge relevance solely on the information contained in the records made available to the systems.
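The eligibility rule above can be sketched as a simple predicate. This is only an illustration under stated assumptions: it assumes each record field already carries an ISO 639-1 language tag, which the TEL data does not necessarily provide (the real assessors read the records themselves), and the `CatalogRecord` type and `is_judgeable` function are hypothetical names. The predicate only decides whether a record is eligible for judging; relevance itself is still decided by the assessors.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Languages the simulated CLEF 2008 user reads in every collection.
USER_LANGUAGES = {"en", "fr", "de"}


@dataclass
class CatalogRecord:
    """Hypothetical record: field name -> ISO 639-1 language tag of that field."""
    collection: str  # "BL", "BNF" or "ONB"
    field_languages: dict[str, str] = field(default_factory=dict)


def is_judgeable(record: CatalogRecord) -> bool:
    """True if at least one field is in a language the simulated user reads."""
    readable = set(USER_LANGUAGES)
    if record.collection == "BL":
        # For the English collection the user also reads Spanish.
        readable.add("es")
    return any(lang in readable for lang in record.field_languages.values())


# A record entirely in Hungarian is not eligible; one with French subject
# descriptors is eligible even if its title is in Hungarian.
print(is_judgeable(CatalogRecord("ONB", {"title": "hu", "description": "hu"})))  # False
print(is_judgeable(CatalogRecord("BNF", {"title": "hu", "subject": "fr"})))      # True
```

A record qualifies as soon as any single field is readable, which mirrors the "totally or partially" wording of the rule.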

The ++ runs are those in which the participating system has used additional tools to cater for the multilinguality of these collections (e.g. language identification tools, additional multilingual dictionaries, etc.). It will be interesting to see whether systems that use such tools perform better.


One of our suppositions is that, knowing these collections are to some extent multilingual, some systems may attempt to use specific tools to detect this. For example, a system attempting the cross-language English-to-French task on the BNF target collection, knowing that documents retrieved in English and German will also be judged for relevance, might choose to employ an English-German dictionary as well as the expected English-French one. This is just a hypothesis, left to the imagination of the participants. We have no idea whether groups will try anything like this, nor whether such strategies can help retrieval performance. We will see. Groups attempting anything of this type are asked to declare such runs with the ++ indication.
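The hypothetical multi-dictionary strategy described above can be sketched as query expansion: each English query term is translated with every available bilingual dictionary, and the original English term is kept because English records in the BNF collection can also be relevant. The dictionaries below are tiny toy examples, and `expand_query` is a hypothetical name. This shows one possible ++ approach, not a recommended or official method.

```python
# Toy bilingual dictionaries (hypothetical entries, for illustration only).
EN_FR = {"library": ["bibliothèque"], "history": ["histoire"]}
EN_DE = {"library": ["Bibliothek"], "history": ["Geschichte"]}


def expand_query(terms, dictionaries, keep_source=True):
    """Translate each term with every dictionary and pool the results.

    If keep_source is True, the original (English) term is kept as well,
    since records in the target collection may themselves be in English.
    Duplicates are removed while preserving first-seen order.
    """
    expanded = []
    for term in terms:
        candidates = [term] if keep_source else []
        for dictionary in dictionaries:
            candidates.extend(dictionary.get(term.lower(), []))
        for candidate in candidates:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# EN -> FR run on the BNF collection, also targeting English and German records.
print(expand_query(["library", "history"], [EN_FR, EN_DE]))
```

Terms with no dictionary entry survive only through `keep_source`, which is one reason to keep the source-language terms in a collection that is partly in the query language.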


Please remember that this is a learning task for us too: our aim is to investigate the best approaches for retrieval from library catalogs, where the information is frequently very sparse and, as we have found, often stored in unexpected languages. This is in fact very much a real-world task and will, we hope, provide useful input for the European Digital Library (now known as Europeana).


Contact: Carol Peters, ISTI-CNR, or Nicola Ferro, University of Padua.


2. Persian@CLEF

This task is run in collaboration with the Database Research Group (DBRG) of the University of Tehran. It will use the Hamshahri corpus of 1996-2002 newspapers; a complete description can be found on the Hamshahri website. Monolingual and bilingual (EN -> FA) tasks will be offered. We intend to make both training and test topics available. More information soon. Contact: Abolfazl AleAhmad or Hadi Amiri, DBRG, University of Tehran.


3. Robust WSD Task @ CLEF 2008

The robust task will bring semantic and retrieval evaluation together. Participants will be offered topics and document collections from previous CLEF campaigns that have been annotated by word sense disambiguation (WSD) systems. The goal of the task is to test whether WSD can benefit retrieval systems.

The organizers believe that polysemy is one of the reasons information retrieval (IR) systems fail, and that WSD could allow more targeted retrieval. Last year, SemEval and CLEF cooperated to create a task in which participants were required to provide WSD on CLEF data collections. In a retrieval experiment run by the organizers, the WSD data was used for retrieval but did not lead to any improvement. This year, participants are given the WSD data (or can derive their own) and can run their own retrieval experiments with various retrieval strategies.

The WSD data is based on WordNet version 1.6 and will be supplemented with data from the English and Spanish WordNets in order to test different expansion strategies. Several leading WSD experts will run their systems and provide the resulting WSD data for the participants to use.

Participants are required to submit at least one baseline run without WSD and one run using the WSD data. They can submit up to four further baseline runs without WSD and four further runs using the WSD data in various ways.

The robust task will use two languages often used in previous CLEF campaigns (English, Spanish). Documents will be in English, and topics in both English and Spanish.

A subset of highly ambiguous topics will be identified by the organizers and used for a separate evaluation to see how WSD works for these hard topics.

The evaluation will be based on Mean Average Precision (MAP) as well as Geometric Mean Average Precision (GMAP). GMAP is a robust measure: it rewards stable performance over all topics rather than high average performance, in both monolingual and cross-language IR (“ensure that all topics obtain minimum effectiveness levels”, Voorhees 2005, SIGIR Forum).
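The difference between the two measures can be sketched in a few lines. MAP is the arithmetic mean of per-topic average precision (AP), and GMAP is their geometric mean, so a single near-failing topic pulls GMAP down much more than MAP. A geometric mean is undefined for AP values of 0 (the logarithm diverges), so this sketch floors AP at 0.00001. That floor is an assumption borrowed from common TREC Robust track practice; the official CLEF 2008 evaluation settings may differ. Function names are illustrative.

```python
import math


def average_precision(ranking, relevant):
    """Non-interpolated AP for one topic: mean of precision at each relevant hit,
    divided by the total number of relevant documents (retrieved or not)."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(ap_values):
    """Arithmetic mean of per-topic AP."""
    return sum(ap_values) / len(ap_values)


def geometric_map(ap_values, floor=1e-5):
    """Geometric mean of per-topic AP; values below `floor` are raised to it
    (assumed floor) so that a zero AP does not make the logarithm undefined."""
    logs = [math.log(max(ap, floor)) for ap in ap_values]
    return math.exp(sum(logs) / len(logs))


# Two systems with the same MAP (0.455) over two topics:
uneven = [0.9, 0.01]   # excellent on one topic, nearly fails the other
stable = [0.45, 0.46]  # moderate on both
print(mean_average_precision(uneven), mean_average_precision(stable))
print(geometric_map(uneven), geometric_map(stable))
```

Both systems score the same MAP, but GMAP is roughly 0.09 for the uneven system and roughly 0.45 for the stable one. This is the "minimum effectiveness" behaviour the Robust task is designed to reward.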

Data for Robust Task

§         Ad-hoc collections which were available at CLEF 2001

          §         LA Times 94 (with WSD data)

          §         Glasgow Herald 95 (with WSD data)

§         Topics (with WSD data)

          §         2001-2002, 2004: for training

          §         2003, 2005-2006: for testing

§         Tasks

          §         monolingual IR (English)

          §         bilingual IR (Spanish -> English)


Data Collections for the Robust Task

LA Times 94
LA Times 94
LA Times 94, Glasgow Herald 95
Glasgow Herald 95
LA Times 94, Glasgow Herald 95
LA Times 94, Glasgow Herald 95
only for 2002



Test and Training Data for Robust 2008


Time Schedule

·        Registration Opens - 20 February 2008

·        Data Release - from 1 April 2008

·        Topic Release - from 1 May 2008

·        Submission of Runs by Participants - 15 June 2008

·        Release of Relevance Assessments and Individual Results - 15 July 2008

·        Submission of Paper for Working Notes - 15 August 2008

·        Workshop - 17-19 September 2008


Contact: Thomas Mandl, University of Hildesheim, Germany, or Eneko Agirre, University of the Basque Country, Spain.