CLEF AGENDA for 2001

Task Description

There will be three main evaluation tracks in CLEF 2001, testing multilingual, bilingual and monolingual (non-English) information retrieval systems. Interested groups can participate in any one or in all three tracks. Newcomers to the activity may well choose to begin with the monolingual track in the first year and work up to the others in later years. There will also be a special sub-task for domain-specific cross-language evaluation and, possibly, an experimental track testing interactive cross-language systems.

  1. Multilingual Information Retrieval

    The main task in CLEF 2001 requires searching a multilingual document collection for relevant documents. This year’s multilingual collection contains English, German, French, Italian and Spanish documents. Using a single selected topic (query) language, the goal is to retrieve relevant documents in all the languages of the collection, rather than for just one language pair, and to present the results as a single merged, ranked list.

    The official topic languages for CLEF 2001 will be English, French, German, Italian, Spanish, Dutch and Japanese. However, it is expected that topics will also be made available in additional languages, such as Finnish, Greek, Russian and Swedish.
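    The merged, ranked list required by the multilingual task can be illustrated with a minimal sketch. CLEF does not prescribe a merging method; the min-max score normalization used here is only one common choice, and the function names, document identifiers, scores and the `limit` parameter are all hypothetical.

```python
from typing import Dict, List, Tuple

# One per-language run: (document id, retrieval score) pairs.
Ranked = List[Tuple[str, float]]


def normalize(run: Ranked) -> Ranked:
    """Rescale the scores of one run to [0, 1] (min-max normalization)."""
    if not run:
        return []
    scores = [score for _, score in run]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so every document gets the same weight.
        return [(doc_id, 1.0) for doc_id, _ in run]
    return [(doc_id, (score - lo) / (hi - lo)) for doc_id, score in run]


def merge_runs(runs: Dict[str, Ranked], limit: int = 1000) -> Ranked:
    """Pool normalized per-language runs into one ranked list.

    `limit` is an assumed cut-off for the merged list, not a CLEF rule.
    """
    pooled: Ranked = []
    for language in sorted(runs):
        pooled.extend(normalize(runs[language]))
    # Highest score first; ties are broken by document id for determinism.
    pooled.sort(key=lambda pair: (-pair[1], pair[0]))
    return pooled[:limit]


# Hypothetical runs whose raw scores are on different scales.
example_runs = {
    "en": [("en-1", 12.0), ("en-2", 6.0), ("en-3", 0.0)],
    "de": [("de-1", 0.9), ("de-2", 0.5)],
}
merged = merge_runs(example_runs)
```

    Normalizing before pooling matters because raw scores from separate per-language indexes are generally not comparable; without it, the run with the largest raw scores would dominate the merged list.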

  2. Bilingual Information Retrieval

    Many IR groups are only now beginning to work on retrieval over pairs of languages; the bilingual task gives them the chance to participate officially in the CLEF activity. In response to strong demand, CLEF 2001 will offer two distinct bilingual tracks. As in the previous year, the first consists of querying a collection of English documents using any of the other available topic languages. To assist participants who want to test their systems on a less familiar language, a second task will provide the opportunity to query a Dutch document collection, again using any other topic language. A stopword list and a stemmer for Dutch, plus a small Dutch-English bilingual lexicon, will be made available to CLEF participants to assist them in this task.

  3. Monolingual (non-English) Information Retrieval

    It is often asserted that procedures for monolingual information retrieval are (almost) completely language independent. This is not true, however: different languages present different problems, and methods that are highly effective for languages of one type may be less effective for others. So far, most IR system evaluation has focussed on English. We will provide the opportunity for monolingual system testing and tuning, and will build up test suites in other European languages (Dutch, French, German, Italian and Spanish in CLEF 2001).

  4. Domain-Specific Mono- and Cross-Language Information Retrieval

    In addition to the three main tasks, there is a special task for CLEF 2001. This task is based on a data collection from a vertical domain (social sciences): the GIRT collection. This collection contains nearly 80,000 German documents in a structured database. The rationale of the task is to study CLIR in a vertical domain where a German/English/Russian thesaurus and English translations of the document titles are available. Topics will be made available in English, German and Russian.

  5. Interactive Cross-Language Information Retrieval

    The goal of the interactive track at CLEF 2001 is to explore evaluation methods for interactive CLIR and to establish baselines against which future research progress can be measured. The track will most likely focus on interactive selection of documents that have been automatically translated from a language that the searcher would otherwise have been unable to read. The details of the task and the evaluation design will be developed through discussions on the interactive track mailing list. For more information, please click here.

Resources

The CLEF test collection for 2001 consists of SGML-formatted newspaper and news agency documents in English, French, German, Italian, Spanish and Dutch, all from the same time period. CLEF participants will have free access to a multilingual test suite (documents, topics, and relevance assessments) for research purposes.

Participation

There is no strict deadline for registration. Those wishing to take part in CLEF 2001 are requested to send an e-mail as soon as possible to Carol Peters, indicating in which task(s) they intend to participate. Participants will be requested to sign an agreement restricting the use of the data and regulating publication and dissemination of results.

For further information on the procedure for participation and copies of the data release forms, please click here.

Important Dates

Data Release - 1 March 2001
Topic Release - from 9 April 2001
Receipt of results from participants - 10 June 2001
Release of relevance assessments and individual results - 25 July 2001
Submission of paper for Working Notes - 6 August 2001
Workshop and Working Notes - 3-4 September 2001

Workshop

A two-day Workshop will be held on 3-4 September in Darmstadt, Germany, immediately before the fifth European Conference on Digital Libraries (ECDL 2001).

The aim of the Workshop will be to present and discuss the results of the CLEF activity and allow researchers and developers to compare performance between systems using different cross-language strategies.

Contact Information

For further information and to be included in the mailing list, contact:

Carol Peters - IEI-CNR
Area della Ricerca di San Cataldo, 56100 PISA (Italy)
Tel: +39 050 315 2897 - Fax: +39 050 315 2810
E-mail: carol@iei.pi.cnr.it