Welcome to the Cross-Language Evaluation Forum

CLEF Agenda for 2002

Task Description

There were five evaluation tracks in CLEF 2002, testing different aspects of mono- and cross-language information retrieval system performance.

Multilingual Information Retrieval

The main track in CLEF 2002 required searching a multilingual document collection for relevant documents. Using a selected topic language, the goal was to retrieve documents in all languages of the collection, rather than just a given pair, and to list the results in a single merged, ranked list.

The CLEF 2002 document collection for this track contained English, German, French, Italian and Spanish documents. A common set of topics (i.e. structured statements of information needs from which queries are extracted) was prepared in twelve languages: Dutch, English, Finnish, French, German, Italian, Spanish, Swedish, Russian, Portuguese, Japanese and Chinese.
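
As an illustration of what producing a merged, ranked list can involve, the sketch below combines per-language result lists into one ranking. It is a minimal example under stated assumptions, not a method prescribed by CLEF: the min-max score normalization, the function names (min_max_normalize, merge_runs), the cut-off parameter k and its default are all hypothetical choices made for this example, and participating systems were free to use other merging strategies.

```python
from typing import Dict, List, Tuple

# A run is a list of (doc_id, score) pairs for one target language.
Run = List[Tuple[str, float]]


def min_max_normalize(run: Run) -> Run:
    """Rescale the scores of one run to the range [0, 1].

    Assumption: scores from different per-language searches are not directly
    comparable, so each run is rescaled before merging.
    """
    if not run:
        return []
    scores = [score for _, score in run]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal; treat every document as equally good.
        return [(doc_id, 1.0) for doc_id, _ in run]
    return [(doc_id, (score - lo) / (hi - lo)) for doc_id, score in run]


def merge_runs(runs_by_language: Dict[str, Run], k: int = 1000) -> List[Tuple[str, str, float]]:
    """Merge per-language runs into one ranked list of (doc_id, language, score).

    k is a hypothetical cut-off on the length of the merged list.
    """
    merged = []
    for language, run in runs_by_language.items():
        for doc_id, score in min_max_normalize(run):
            merged.append((doc_id, language, score))
    # Sort by descending normalized score; break ties by doc_id for determinism.
    merged.sort(key=lambda item: (-item[2], item[0]))
    return merged[:k]


if __name__ == "__main__":
    example_runs = {
        "en": [("en1", 10.0), ("en2", 5.0)],
        "de": [("de1", 4.0), ("de2", 2.0), ("de3", 3.0)],
    }
    for doc_id, language, score in merge_runs(example_runs):
        print(doc_id, language, score)
```

Normalizing each run separately keeps a language whose retrieval scores happen to lie on a larger numeric scale from dominating the merged list, at the cost of assuming that the best document in every language is equally relevant.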

CLEF 2002 also offered a series of additional tracks designed to test different aspects of information retrieval system development.

Bilingual Information Retrieval

In the bilingual track, any topic language could be used to search target document collections in Dutch, Finnish, French, German, Italian, Spanish or Swedish. Only first-time CLEF participants could choose to search the English document collection using a European topic language.

Monolingual (non-English) Information Retrieval

Until recently, most IR system evaluation focused on English. CLEF provides the opportunity for monolingual system testing and tuning, and for building test suites in other European languages. CLEF 2002 offered tasks for Dutch, Finnish, French, German, Italian, Spanish and Swedish.

Mono- and Cross-Language Information Retrieval for Scientific Collections

This track offered two distinct tasks:

AMARYLLIS: This task studied system performance in searching a multi-disciplinary scientific database of approximately 150,000 French bibliographic documents. Tools that could be used in the retrieval task were provided (a controlled vocabulary in English and French). The task was coordinated by Patrick Kremer and Laurent Schmitt, INIST-CNRS, France.

GIRT: This task was based on the GIRT collection, which contains nearly 80,000 German social science documents in a structured database. A German/English/Russian thesaurus and English translations of the document titles were available. The rationale for this task was to study CLIR in a vertical domain (i.e. social science). The task was coordinated by Michael Kluck, IZ-Bonn, Germany.

Interactive Cross-Language Information Retrieval

A special interest interactive track was offered again this year. Participating teams used a common experiment design to explore interactive formulation of cross-language queries and/or cross-language document selection. The coordinators were Julio Gonzalo, UNED, Madrid, Spain, and Douglas Oard, University of Maryland, USA. For details, see the iCLEF Web site.