Welcome to Cross Language Evaluation Forum


CLEF 2009 Ad-Hoc Track

Page under development.

DESCRIPTION OF TASKS

GUIDELINES FOR PARTICIPATION

The 2009 Ad Hoc track is to a large extent a repetition of last year's track, with the same three tasks: TEL@CLEF, Persian@CLEF, and Robust-WSD. The aim is to create good reusable test collections for each of them.

The main task offers monolingual and cross-language search on library catalog records in English, French, and German, organised in collaboration with The European Library (TEL). The second task focuses more on linguistic issues, offering retrieval on test collections in languages that pose processing challenges. The third task is the robust task, which aims to assess whether word sense disambiguated (WSD) data has an impact on IR system performance.

1. TEL@CLEF
Objective

The task is to search and retrieve relevant items from collections of library catalog cards. Our aim is to identify the most effective retrieval technologies for searching this type of data.

This data is very different from the news corpora previously used in the CLEF ad hoc track, consisting of bibliographic data (document surrogates). Whereas in the traditional ad hoc task, the user searches for a document containing information of interest, here the user will be searching to identify which publications are of potential interest – according to the information provided by the catalog card. The question the user is asking is “Is the publication described by the bibliographic record relevant to my information need?”

The Collections

The collections have been provided by The European Library (www.theeuropeanlibrary.org) and the task is organised in collaboration with TEL. Three target collections are provided:

TEL Catalog records in English. Data provided by The European Library; Copyright British Library (BL)
TEL Catalog records in French. Data provided by The European Library; Copyright Bibliothèque nationale de France (BnF)
TEL Catalog records in German. Data provided by The European Library; Copyright Austrian National Library (ONB)

We have tagged the 3 collections (BL, BNF, ONB) as English, French and German because in each case this is the main/official language of the collection. However, all three collections are to some extent multilingual and contain documents (catalog records) in many additional languages. Thus the title and maybe (if existing) an abstract or description can be in a different language to that understood as the language of the collection. The subject heading information is normally in the main language of the collection.

The data is also very sparse. Many records contain only title, author and subject heading information; other records provide more details.

About 66% of the documents in the English and German collections have textual subject headings; in the French collection the figure is 37%.
Dewey Classification (DDC) is: not available in the French collection; negligible (<0.3%) in the German collection; but occurs in about half of the English documents (456,408 docs to be exact).

The Task

50 topics have been prepared for each of the 3 main collection languages (DE; EN; FR). Topics can be prepared in other languages on demand. We expect to have topics available also in Chinese, Greek and Polish.

Topics have 2 fields: a Title field (2-4 key terms) and a Description field (a sentence specifying the information item of interest).

2 main tasks are offered: monolingual and bilingual - subdivided into subtasks to reflect the multilinguality of the data:

Monolingual; Monolingual+ and Bilingual; Bilingual+

The + tasks are tasks where the participating group also attempts to use additional tools to cater for the multilinguality of the collections. Groups must state whether their runs are to be considered as “+”.

In both tasks, monolingual and bilingual, the aim is to retrieve documents relevant to the query, and your results are judged in this respect.

By monolingual we mean that the query is in the same language as the official language of the collection.

By bilingual we mean that the query is in a different language to the official language of the collection.

For example, in an EN -> FR run, relevant documents (bibliographic records) could be any document in the BNF collection (which we call the French collection) in whatever language they are written. The same is true for a monolingual FR -> FR run - relevant documents from the BNF collection could actually also be in English or German, not just French.
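The distinction between run types can be sketched in a few lines. This is only an illustration, not part of the official guidelines: the function name `run_type`, the `uses_multilingual_tools` flag and the collection keys are assumptions made for the example.

```python
# Official language of each target collection, as tagged in this task description.
# The dictionary keys are illustrative labels, not official identifiers.
COLLECTION_LANGUAGE = {"BL": "EN", "BNF": "FR", "ONB": "DE"}


def run_type(topic_language, collection, uses_multilingual_tools=False):
    """Label a run as Monolingual or Bilingual, adding '+' for '+' runs.

    The label depends only on the topic language and the official language
    of the collection, not on the languages of the retrieved records.
    """
    official = COLLECTION_LANGUAGE[collection]
    base = "Monolingual" if topic_language.upper() == official else "Bilingual"
    return base + ("+" if uses_multilingual_tools else "")


print(run_type("EN", "BNF"))        # an EN -> FR run
print(run_type("FR", "BNF", True))  # an FR -> FR run that uses extra multilingual tools
```

Note that the label says nothing about the language of the relevant records themselves, which may differ from the official language of the collection.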

Documents referring to all types of works (e.g. books, articles, collections of images, videos, etc.) are judged for relevance unless the query specifically indicates otherwise.

Similarly to CLEF2008, in CLEF2009 the task we simulate is that of a user who has a working knowledge of English, French and German and who wants to discover the existence of relevant documents that can be useful for him/her in one of our three target collections (either monolingually - the query is in the official language of the collection; or bilingually, the query is in a different language).

We will judge for relevance only those documents that are written totally or partially in one of these languages. For example, a catalog record written entirely in Hungarian will be counted as not relevant, as it is of no use to our hypothetical user; however, a catalog record with perhaps the title and a brief description in Hungarian, but with subject descriptors in French, German or English, will be judged for relevance as it could be potentially useful. Our assessors have no additional knowledge of the documents referred to by the catalog records (or surrogates) contained in the collection. They judge for relevance on the information contained in the records made available to the systems.
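The language condition above can be expressed as a simple check. The sketch below is hypothetical: it assumes that each field of a record has already been assigned a language code (for example by a language identification tool), and the field names and the function `is_judged` are invented for illustration.

```python
from typing import Mapping

# Languages the simulated user has a working knowledge of.
USER_LANGUAGES = {"EN", "FR", "DE"}


def is_judged(field_languages: Mapping[str, str]) -> bool:
    """Return True if at least one field is in English, French or German.

    `field_languages` maps a (hypothetical) field name to a language code.
    Records failing this check would be counted as not relevant.
    """
    return any(lang.upper() in USER_LANGUAGES for lang in field_languages.values())


hungarian_only = {"title": "HU", "description": "HU"}
mixed = {"title": "HU", "description": "HU", "subject": "FR"}

print(is_judged(hungarian_only))  # record entirely in Hungarian
print(is_judged(mixed))           # Hungarian record with French subject descriptors
```

Passing this check only makes a record eligible for assessment; whether it is relevant still depends on the assessor's judgement of the record content.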

The + runs are those runs where the participating system has used additional tools to cater for the multilinguality of these collections (e.g. language identification tools, additional multilingual dictionaries, etc.). It will be interesting to see whether systems that use additional tools have better performance.

We were somewhat disappointed last year because only a few groups really attempted to address the specificity of this data; most groups just submitted runs using their favourite (CL)IR approach.

For this reason, we highly recommend that participants in this task try to implement specific strategies to cater for both the sparseness and multilinguality of the data. We would like to see submissions from groups which include a base-line run, plus additional runs in which different strategies have been attempted.

The aim of this task is to investigate the best approaches for retrieval from library catalogs, where the information is frequently very sparse and, as we have found, is often stored in unexpected languages. This is in fact very much a real world task and provides useful input for the European Digital Library (now known as Europeana).

Contact: Carol Peters, ISTI-CNR (carol.peters@isti.cnr.it) or Nicola Ferro, U. Padua (ferro@dei.unipd.it)

2. Persian@CLEF

This task is run in collaboration with the Database Research Group of the University of Tehran. It will use the Hamshahri corpus of 1996-2002 newspapers. A very complete description can be found on the Hamshahri website. Monolingual and bilingual (EN -> FA) tasks will be offered. Last year's topics are available as training topics. The objective is to query the target collection using topics in the same language (monolingual run) or topics in English (bilingual run) and to submit the results in a list ranked in decreasing order of relevance. Contact: Abolfazl AleAhmad (a.aleahmad@ece.ut.ac.ir) or Hadi Amiri (h.amiri@ece.ut.ac.ir), DBRG, University of Tehran.

3. Robust WSD Task @ CLEF 2008

The robust task will bring semantic and retrieval evaluation together. The participants will be offered topics and document collections from previous CLEF campaigns which were annotated by systems for word sense disambiguation (WSD). The goal of the task is to test whether WSD can be used beneficially for retrieval systems.

The WSD data is based on WordNet version 1.6 and will be supplemented with data from the English and Spanish WordNets in order to test different expansion strategies. Several leading WSD experts will run their systems, and provide those WSD results for the participants to use.

Participants are required to submit at least one baseline run without WSD and one run using the WSD data. They can submit four further baseline runs without WSD and four further runs using WSD in various ways.
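As a rough illustration of these limits, the following sketch checks a planned set of runs before submission. It is not an official validation tool; the function name `check_submission` and the representation of a run as a single boolean (True if the run uses WSD data) are assumptions made for the example.

```python
# One required run plus four further runs, for each kind of run.
MAX_RUNS_PER_KIND = 5


def check_submission(uses_wsd_flags):
    """Return a list of problems; an empty list means the run set fits the limits.

    `uses_wsd_flags` holds one boolean per run: True for a run using WSD data,
    False for a baseline run without WSD.
    """
    wsd_runs = sum(1 for flag in uses_wsd_flags if flag)
    baseline_runs = len(uses_wsd_flags) - wsd_runs

    problems = []
    if baseline_runs < 1:
        problems.append("at least one baseline run without WSD is required")
    if wsd_runs < 1:
        problems.append("at least one run using WSD data is required")
    if baseline_runs > MAX_RUNS_PER_KIND:
        problems.append("too many baseline runs without WSD")
    if wsd_runs > MAX_RUNS_PER_KIND:
        problems.append("too many runs using WSD data")
    return problems


print(check_submission([False, True, True]))  # one baseline run, two WSD runs
print(check_submission([True, True]))         # no baseline run
```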

The robust task will use two languages often used in previous CLEF campaigns (English, Spanish). Documents will be in English, and topics in both English and Spanish.

A subset of highly ambiguous topics will be identified by the organizers and used for a separate evaluation to see how WSD works for these hard topics.

The evaluation will be based on Mean Average Precision (MAP) as well as Geometric Mean Average Precision (GMAP). The robust measure GMAP is intended to reward stable performance over all topics rather than high average performance in Mono- and Cross-Language IR (“ensure that all topics obtain minimum effectiveness levels”, Voorhees 2005, SIGIR Forum).
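The difference between the two measures can be seen in a small sketch. MAP is the arithmetic mean of the per-topic average precision (AP) values, while GMAP is their geometric mean, so a single very poor topic lowers GMAP much more than MAP. The AP values below are invented purely for illustration, and the `epsilon` floor (used so that a topic with AP of zero does not force the whole score to zero) is an assumption of this sketch rather than something specified by the track.

```python
import math


def mean_average_precision(ap_scores):
    """Arithmetic mean of per-topic average precision values."""
    return sum(ap_scores) / len(ap_scores)


def geometric_map(ap_scores, epsilon=1e-5):
    """Geometric mean of per-topic average precision values.

    Computed in log space; each AP is floored at `epsilon` (an assumed
    constant) because log(0) is undefined.
    """
    log_sum = sum(math.log(max(ap, epsilon)) for ap in ap_scores)
    return math.exp(log_sum / len(ap_scores))


# Hypothetical per-topic AP values for two systems over three topics.
system_a = [0.90, 0.90, 0.01]  # strong on two topics, fails on one
system_b = [0.60, 0.60, 0.40]  # moderate but stable

for name, scores in (("A", system_a), ("B", system_b)):
    print(name, round(mean_average_precision(scores), 4), round(geometric_map(scores), 4))
```

With these made-up numbers, system A has the higher MAP but system B has the higher GMAP, which is the kind of stability the robust measure is meant to reward.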

Data for Robust Task

· Ad-hoc collections which were available at CLEF 2001
    · LA Times 94 (with WSD data)
    · Glasgow Herald 95 (with WSD data)

· Topics (with WSD data)
    · 2001-2002, 2004: for Training
    · 2003, 2005-2006: for Testing

· Tasks
    · monolingual IR (English)
    · bilingual (Spanish -> English)

Data Collections for the Robust Task

CLEF Year	Topics No.	English
2001	41-90	LA Times 94	x
2002	91-140	LA Times 94	x
2003	141-200	LA Times 94	Glasgow Herald 95
2004	201-250	x	Glasgow Herald 95
2005	251-300	LA Times 94	Glasgow Herald 95
2006	301-350	LA Times 94	Glasgow Herald 95
2007	only for 2002

Test and Training Data for Robust 2008
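The split of topics into training and test sets follows from the topic numbers per CLEF year listed in the table and the years assigned to training and testing. The sketch below simply combines the two; the variable and function names are illustrative and not part of any official distribution.

```python
# Topic number ranges per CLEF year (inclusive), taken from the table above.
TOPIC_RANGES = {
    2001: (41, 90),
    2002: (91, 140),
    2003: (141, 200),
    2004: (201, 250),
    2005: (251, 300),
    2006: (301, 350),
}

TRAINING_YEARS = (2001, 2002, 2004)
TEST_YEARS = (2003, 2005, 2006)


def topics_for(years):
    """Return the list of topic numbers belonging to the given CLEF years."""
    topic_ids = []
    for year in years:
        first, last = TOPIC_RANGES[year]
        topic_ids.extend(range(first, last + 1))
    return topic_ids


training_topics = topics_for(TRAINING_YEARS)
test_topics = topics_for(TEST_YEARS)
print(len(training_topics), len(test_topics))
```

Under these ranges the training set covers topics 41-140 and 201-250, and the test set covers topics 141-200 and 251-350.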

Time Schedule

· Registration Opens - 20 February 2008

· Data Release - from 1 April 2008

· Topic Release - from 1 May 2008

· Submission of Runs by Participants - 15 June 2008

· Release of Relevance Assessments and Individual Results - 15 July 2008

· Submission of Paper for Working Notes - 15 August 2008

· Workshop - 17-19 September 2008

Contact: Thomas Mandl, University of Hildesheim, Germany, mandl@uni-hildesheim.de or Eneko Agirre, University of the Basque Country, Spain, e.agirre@ehu.es