CLEF 2005 | How to Participate
To participate in CLEF 2005 and gain access to the test collections, you must first complete and sign a registration form and the relevant data release forms, and send them to Carol Peters at the address below (except for the web collection forms; see below). On receipt of the forms, you will be sent information on how to download the data. For copyright reasons, there are three separate sets of data release forms: one for the US newspaper data, one for the EuroGOV web collection, and one for all other document collections (text, image and speech). The EuroGOV forms can be downloaded from the WebCLEF website and should be completed and sent to the WebCLEF coordinators at the University of Amsterdam. The other forms are accessible below.
Some information is repeated on the different forms because they will be kept at different sites. All forms must be signed by a person authorised by your organisation to provide such signatures (e.g. the Department or Administrative Head).
Please note that previous CLEF participants must resubmit End-User Agreements
each year in order to renew their authorisation to access the data covered by
this agreement.
Please complete the forms carefully, inserting all relevant information. Read the task descriptions in the Agenda for CLEF 2005 to see which data collections you should request; this depends on the tasks you will be performing. Remember that access to the data will be provided only once we have received signed, original copies of the forms (not electronic or fax versions).
The Registration Form is your statement of intention to participate in CLEF 2005. All participating groups must submit this form. Please fill it in carefully, providing full details of the tasks in which you intend to participate and the languages you will be using. The following table is meant as a guide to help you in identifying which target collections you will need. If you are unsure, please contact either Carol Peters (carol.peters@isti.cnr.it) or the coordinators of the particular track(s) in which you are interested. Registration remains open until 30 April 2005.
TARGET COLLECTIONS BY TRACK
Target collections: BG, DE, EN (US/British), ES, FI, FR, IT, HU, NL, PT, Russian (not used), SW, GIRT-4, RSSC, St Andrews, ImageCLEFmed, IRMA, Malach Corpus, EuroGOV

MULTILINGUAL: Multi-8 - 2 years on (merging only; no data needed, see task description)
DE: Frankfurt.R, Der Spiegel & SDA (94/95)
EN: LA Times, Glasgow Herald (94/95)
ES: EFE (94/95)
FI: Aamulehti (94/95)
FR: Le Monde 94 only; SDA 94/95
IT: La Stampa 94; SDA 94/95
NL: NRC Handelsblad, Algemeen Dagblad (94/95)
SW: TT (94/95)

BILINGUAL (choice of collection depends on task)
BG: Sega; Standart (2002)
As above: newcomers only
FR: Le Monde 94/95; SDA 94/95
HU: Magyar Hirlap (2002)
PT: Público 94/95; Folha 94/95 (to be confirmed)

MONOLINGUAL (collection depends on task)
BG: Sega; Standart (2002)
FR: Le Monde 94/95; SDA 94/95
HU: Magyar Hirlap (2002)
PT: Público 94/95; Folha 94/95 (to be confirmed)

Domain-Specific mono- and bilingual tasks
GIRT-4: DE & EN
RSSC: (to be confirmed)

iCLEF (to be decided by track - check with coordinators)

QAatCLEF (choice of collection depends on task)
BG: Sega; Standart (2002)
DE: Frankfurt.R, Der Spiegel & SDA (94/95)
EN: LA Times, Glasgow Herald (94/95)
ES: EFE (94/95)
FI: Aamulehti 94/95
FR: Le Monde 94; SDA 94/95
IT: La Stampa 94; SDA 94/95
NL: NRC Handelsblad, Algemeen Dagblad (94/95)
PT: Público 94/95; Folha 94/95 (to be confirmed)

IMAGE-CLEF (depending on task)
St Andrews; ImageCLEFmed; IRMA

CL-SR
Malach Corpus

WebCLEF
EuroGOV

GeoCLEF (choice of collection depends on task)
DE: Frankfurt.R, Der Spiegel & SDA (94/95)
EN: LA Times, Glasgow Herald (94/95)
ES: EFE (94/95)
The End-User Agreement must be submitted in two original signed copies. Even if you have already participated in a CLEF campaign and have access to some of the data, you must have this form (in two copies) completed and signed by the appropriate person in your organisation in order to be authorised to continue to use the data for CLEF 2005. You should indicate on this form only the data sets you need, depending on the tasks that you intend to perform. On receipt, both copies will be signed on behalf of CLEF; one will be kept in our archives and the other will be returned to you.
The conditions for using the Los Angeles Times collection are unchanged from previous campaigns. You need to complete these Data Agreement forms only if you have not already been granted access to this data. Please note that only the Organisation Application must be sent to the CLEF Coordinator; the Individual Application is to be kept by your organisation.
The CLEF document collection consists of SGML or XML-formatted documents from national newspapers, journals and news agencies from the same time period (1994 and 1995), as follows:
Dutch: NRC Handelsblad - 1994, 1995; Algemeen Dagblad - 1994, 1995
English: Los Angeles Times - 1994; The Herald - 1995
Finnish: Aamulehti - 1995 (+ small amount of 1994 data)
French: Le Monde - 1994, 1995; SDA French Swiss news agency data - 1994, 1995
German: Frankfurter Rundschau - 1994, 1995; Der Spiegel - 1994; SDA German Swiss news agency data - 1994, 1995
Italian: La Stampa - 1994; SDA Italian Swiss news agency data - 1994, 1995
Portuguese: Público (Portuguese newspaper) - 1994, 1995; Folha (Brazilian newspaper, under negotiation) - 1994, 1995
Russian: Izvestia - 1995
Spanish: Agencia EFE S.A. (Spanish news agency data) - 1994, 1995
Swedish: Tidningarnas Telegrambyrå - 1994, 1995
plus the following newspapers for 2002:
Bulgarian: Sega - 2002; Standart - 2002
Hungarian: Magyar Hirlap - 2002
Domain-Specific Collection
GIRT4 (German Indexing and Retrieval Testdatabase)
RSSC (Russian Social Science Corpus) (under negotiation)
Image Collections
St Andrews University Library Historical Photographic Collection
ImageCLEFmed Radiological Medical Database
IRMA Database of Medical Images
Web Collection
EuroGOV - web documents crawled from European governmental sites - late 2004
Spoken Documents
MALACH Collection - spontaneous conversational speech from the Shoah archives
When you have submitted the Registration and Data Release forms you will be given the necessary passwords to access the data and participation guidelines in the Workspace for Registered Participants.
The topic/question sets will be made accessible in the Workspace for Registered Participants, from 15 March on. Topics will be prepared in a number of languages depending on the task. Please refer to track/task descriptions. Other languages may be added depending on demand.
Acknowledgments
We gratefully acknowledge the support of all the data providers and copyright holders, and in particular:
The Los Angeles Times, for the American-English data collection;
SMG Newspapers (The Herald) for the British-English data collection
Le Monde S.A. and ELDA: Evaluations and Language resources Distribution Agency, for the French data.
Frankfurter Rundschau, Druck und Verlagshaus Frankfurt am Main; Der Spiegel, Spiegel Verlag, Hamburg, for the German newspaper collections.
InformationsZentrum Sozialwissenschaften, Bonn, for the GIRT database
SocioNet system for the Russian Social Science Corpora
Hypersystems Srl, Torino and La Stampa, for the Italian data.
Agencia EFE S.A. for the Spanish data.
NRC Handelsblad, Algemeen Dagblad and PCM Landelijke dagbladen/Het Parool for the Dutch newspaper data.
Aamulehti Oyj and Sanoma Osakeyhtiö for the Finnish newspaper data
Russika-Izvestia for the Russian newspaper data
Público, Portugal, and Linguateca for the Portuguese (PT) newspaper collection
Folha, Brazil, and Linguateca for the Portuguese (BR) newspaper collection
Tidningarnas Telegrambyrå (TT) SE-105 12 Stockholm, Sweden for the Swedish newspaper data
Schweizerische Depeschenagentur, Switzerland, for the French, German and Italian Swiss news agency data
Ringier Kiadoi Rt. [Ringier Publishing Inc.] and the Research Institute for Linguistics, Hungarian Acad. Sci. for the Hungarian newspaper documents
Sega AD, Sofia; Standart Nyuz AD, Sofia, and the BulTreeBank Project, Linguistic Modelling Laboratory, IPP, Bulgarian Acad. Sci, for the Bulgarian newspaper documents
St Andrews University Library for the historic photographic archive
University and University Hospitals, Geneva, Switzerland and Oregon Health and Science University for the ImageCLEFmed Radiological Medical Database
Aachen University of Technology (RWTH), Germany for the IRMA database of annotated medical images.
The Survivors of the Shoah Visual History Foundation and IBM for the Malach spoken document collection
Without their help, this evaluation activity would be impossible.
Please mail all forms by express or priority post to the address below. Access to the data will be provided only on receipt of the relevant forms.
Carol Peters - CLEF Coordinator
ISTI-CNR
Area della Ricerca CNR
Via Moruzzi, 1 - 56124 Pisa, Italy
Fax: +39 050 315 3464/2810 Tel: +39 050 315 2897