Guidelines for Participation in CLEF 2009 Ad-Hoc Track (Preliminary version)
In these Guidelines, we provide information on the CLEF 2009 test collections, tasks, data manipulation, query construction and results submission for the Ad-Hoc tracks.
The main task offers monolingual and cross-language search on library catalog records in English, French, and German, organised in collaboration with The European Library (TEL@CLEF). The second task is an Ad-Hoc retrieval task on a Persian newspaper corpus (Persian@CLEF). The third task is the robust task (Robust-WSD), which aims at assessing whether word sense disambiguated (WSD) data has an impact on IR system performance.
TEL@CLEF
Two tasks are offered, monolingual and bilingual, each with a "+" variant: Monolingual, Monolingual+, Bilingual, and Bilingual+.
The "+" tasks are those in which the participating group also attempts to use additional tools to cater for the multilinguality of the collections. Groups must state whether their runs are to be considered as "+" runs.
The objective is to query the selected target collection using topics in the same language (monolingual run) or topics in a different language (bilingual run) and to submit the results in a list ranked in decreasing order of relevance.
The topic sets for these two tasks consist of 50 topics and are prepared in English, French, and German. Other topic languages can be offered on request.
Bilingual runs may use topics in any language against the English, French, or German target collections.
Conditions for participation: all groups submitting bilingual runs must also submit at least one baseline monolingual run in the chosen target language(s).
Persian@CLEF
This task is run in collaboration with the Database Research Group of the University of Tehran. It uses the Hamshahri corpus of newspaper articles from 1996-2002; a complete description can be found on the Hamshahri website. Monolingual and bilingual (English to Persian, EN->FA) tasks will be offered, and training and test topics will be made available. The objective is to query the target collection using topics in the same language (monolingual run) or topics in English (bilingual run) and to submit the results in a list ranked in decreasing order of relevance.
Robust WSD
The robust task will bring semantic and retrieval evaluation together. The participants will be offered topics and document collections from previous CLEF campaigns which were annotated by systems for word sense disambiguation (WSD). The goal of the task is to test whether WSD can be used beneficially for retrieval systems.
The robust task will use two languages often used in previous CLEF campaigns (English, Spanish). Documents will be in English, and topics in both English and Spanish.
Full details on this task can be found at http://ixa2.si.ehu.es/clirwsd/.
CONSTRUCTING AND MANIPULATING THE SYSTEM DATA STRUCTURES FOR THE AD-HOC TRACK
1. The system data structures may not be modified in response to CLEF 2009 topics. For example, you cannot add topic words that are not in your dictionary. The CLEF tasks represent the real-world problem of an ordinary user posing a question to a system. In the case of the cross-language tasks, the question is posed in one language and relevant documents must be retrieved whatever the language in which they have been written. If an ordinary user could not make the change to the system, you should not make it after receiving the topics.
3. Only the following fields may be used for automatic retrieval:
LA TIMES 1994: HEADLINE, TEXT only
LA TIMES 2002: HD, LD, TE only
Glasgow Herald: HEADLINE, TEXT only
TELBL: all fields
TELBNF: all fields
TELONB: all fields
Hamshahri: TEXT only
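For illustration only, the following Python sketch shows one way to restrict indexing to the permitted fields. It assumes TREC/CLEF-style SGML markup (e.g. <DOCNO>, <HEADLINE>, <TEXT> elements), and the collection keys and function name are hypothetical; check the actual collection DTDs before relying on it. The TEL collections are omitted because all of their fields may be used.

    import re

    # Fields that may be used for automatic retrieval, per the list above.
    # The collection keys below are illustrative labels, not official names.
    ALLOWED_FIELDS = {
        "LATIMES94": ("HEADLINE", "TEXT"),
        "LATIMES02": ("HD", "LD", "TE"),
        "GLASGOW_HERALD": ("HEADLINE", "TEXT"),
        "HAMSHAHRI": ("TEXT",),
    }

    def extract_indexable_text(doc_sgml, collection):
        # Return the DOCNO and the concatenated content of the allowed fields.
        # Assumes TREC/CLEF-style markup such as <DOCNO>...</DOCNO>; adjust the
        # patterns to the real collection DTDs.
        docno = re.search(r"<DOCNO>\s*(.*?)\s*</DOCNO>", doc_sgml, re.S).group(1)
        parts = []
        for field in ALLOWED_FIELDS[collection]:
            for content in re.findall(rf"<{field}>(.*?)</{field}>", doc_sgml, re.S):
                parts.append(content.strip())
        return docno, " ".join(parts)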
WHAT TO DO WITH YOUR RESULTS
Your results must be sent to the DIRECT results server (address to be communicated), respecting the submission deadlines (see below).
Results have to be submitted in ASCII format, with one line per document retrieved. The lines have to be formatted as follows:
10.2452/451-AH Q0 document.00072 0 0.017416 runidex1
The fields must be separated by ONE blank and have the following meanings:
1) Query identifier. Please use the complete DOI identifier of the topic (e.g. 10.2452/451-AH, not only 451).
INPUT MUST BE SORTED NUMERICALLY BY QUERY NUMBER.
2) Query iteration (will be ignored. Please choose "Q0" for all experiments).
3) Document number (content of the <DOCNO> tag).
4) Rank 0..n (0 is the best matching document. If you retrieve 1000 documents per query, rank will be 0..999, with 0 best and 999 worst). Note that rank starts at 0 (zero) and not 1 (one).
MUST BE SORTED IN INCREASING ORDER PER QUERY.
5) RSV value (a system-specific value that expresses how relevant your system deems a document to be. This is a floating point value. High relevance should be expressed with a high value). If a document D1 is considered more relevant than a document D2, this must be reflected in the fact that RSV1 > RSV2. If RSV1 = RSV2, the documents may be randomly reordered during calculation of the evaluation measures. Please use a decimal point ".", not a comma, and do not use any form of separator for thousands. RSV values must NOT be negative numbers. The only legal characters for RSV values are 0-9 and the decimal point.
MUST BE SORTED IN DECREASING ORDER PER QUERY.
6) Run identifier (please choose a unique ID for each experiment you submit). Only use a-z, A-Z and 0-9; no special characters, accents, etc.
The fields are separated by a single space. The file contains nothing but lines formatted in the way described above.
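For illustration only, here is a minimal Python sketch of how such a file could be produced; the write_run function, the results mapping, and the run identifier are hypothetical names, not part of the official guidelines. It writes one space-separated line per retrieved document, keeps queries in numerical order, starts ranks at 0, and emits at most 1000 documents per query in decreasing RSV order.

    # Illustrative sketch only (not an official CLEF tool).
    # "results" maps a topic DOI (e.g. "10.2452/451-AH") to a list of
    # (document_number, rsv) pairs produced by your retrieval system.

    def topic_sort_key(topic_doi):
        # Sort topics numerically by query number: "10.2452/451-AH" -> 451.
        return int(topic_doi.split("/")[1].split("-")[0])

    def write_run(results, run_id, path, max_docs=1000):
        with open(path, "w", encoding="ascii") as out:
            for topic in sorted(results, key=topic_sort_key):
                # Best documents first: decreasing RSV, rank starting at 0.
                ranked = sorted(results[topic], key=lambda pair: pair[1], reverse=True)
                for rank, (docno, rsv) in enumerate(ranked[:max_docs]):
                    out.write(f"{topic} Q0 {docno} {rank} {rsv:.6f} {run_id}\n")

    # Example:
    # write_run({"10.2452/451-AH": [("document.00072", 0.017416)]}, "runidex1", "myrun.txt")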
You are expected to retrieve 1000 documents per query. An experiment that retrieves a maximum of 1000 documents each for 20 queries therefore produces a file that contains a maximum of 20000 lines.
You should know that the effectiveness measures used in CLEF evaluate the performance of systems at various points of recall. Participants must thus return at most 1000 documents per query in their results. Please note that, by its nature, the average precision measure does not penalize systems that return extra irrelevant documents at the bottom of their result lists. Therefore, you will usually want to use the maximum number of allowable documents in your official submissions.
If you knowingly retrieved fewer than 1000 documents for a topic, please take note of that and check your numbers against those reported by the system during submission.
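A rough local sanity check along the following lines can help catch formatting problems before uploading. This is a Python sketch, not the official input checker, and the check_run function is an illustrative assumption; it verifies only the constraints spelled out above (six space-separated fields, "Q0" in the second field, ranks starting at 0 and increasing, non-negative RSVs in decreasing order within each topic, an alphanumeric run identifier, at most 1000 documents per topic) and prints per-topic document counts for comparison with the numbers reported during submission.

    from collections import defaultdict

    def check_run(path, max_docs=1000):
        # Not the official checker: a minimal local validation of the run format.
        counts = defaultdict(int)
        last_rank, last_rsv = {}, {}
        with open(path, encoding="ascii") as f:
            for line_no, line in enumerate(f, 1):
                fields = line.split()
                assert len(fields) == 6, f"line {line_no}: expected 6 fields"
                topic, q0, docno, rank, rsv, run_id = fields
                rank, rsv = int(rank), float(rsv)
                assert q0 == "Q0", f"line {line_no}: second field must be Q0"
                assert rsv >= 0, f"line {line_no}: RSV must not be negative"
                assert run_id.isalnum(), f"line {line_no}: run id must be a-z, A-Z, 0-9 only"
                assert rank == last_rank.get(topic, -1) + 1, \
                    f"line {line_no}: ranks must start at 0 and increase by 1"
                assert rsv <= last_rsv.get(topic, float("inf")), \
                    f"line {line_no}: RSVs must be in decreasing order per topic"
                last_rank[topic], last_rsv[topic] = rank, rsv
                counts[topic] += 1
                assert counts[topic] <= max_docs, f"line {line_no}: more than {max_docs} documents for {topic}"
        # Print per-topic counts so they can be compared with what DIRECT reports.
        for topic, n in sorted(counts.items()):
            print(topic, n)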
You will have to submit each run through the DIRECT system. An e-mail will be sent to you explaining how to submit your results.
N.B. Please read the following very carefully
TEL Tasks: Bilingual: We accept up to a maximum of 4 runs per language pair.
TEL Tasks: Monolingual: We will also accept a maximum of 4 runs per language for the monolingual task (there are three languages to choose from).
Persian Tasks: We accept up to a maximum of 4 runs for both monolingual and bilingual tasks (a total of no more than 8 runs).
Robust tasks: Participants are required to submit at least one baseline run without WSD and one run using the WSD data. They can submit four further baseline runs without WSD and four runs using WSD in various ways.
In all of the above tasks, in order to facilitate comparison between results, a mandatory run using the Title + Description topic fields is required (per experiment, per topic language).
The absolute deadline for submission of results for all Ad-Hoc tasks is midnight (24.00) Central European Time, Tuesday, 9 June. Detailed information on how and where to submit your results will be communicated shortly.
An input checker program, used by TREC and modified to meet the requirements of CLEF, will be made available shortly.
WORKING NOTES
A clear description of the strategy adopted and the resources you used for each run MUST be given in your paper for the Working Notes. The deadline for receipt of these papers is 30 August 2009. The Working Notes will be distributed to all participants on registration at the Corfu Workshop (30 September - 2 October 2009). This information is considered of great importance; the point of the CLEF activity is to give participants the opportunity to compare system performance with respect to variations in approaches and resources. Groups that do not provide such information risk being excluded from future CLEF experiments.