CLEF 2009 offered a series of evaluation tracks to test different aspects of cross-language information retrieval system development. The aim was to promote research into the design of user-friendly, multilingual, multimodal retrieval systems.
There were eight main evaluation tracks in 2009.
The Ad Hoc track tested mono- and cross-language text retrieval. Tasks in 2009 tested both cross-language (CL) and information retrieval (IR) aspects in a multilingual context:
TEL@CLEF evaluated retrieval algorithms on multilingual collections of catalog records; as in 2008, the collections were derived from the English, French and German archives of The European Library.
Persian@CLEF focused on linguistic aspects, offering mono- and bilingual retrieval exercises on a Persian test collection of news documents.
Robust-WSD aimed at assessing whether word-sense-disambiguated data has an impact on system performance; mono- and bilingual tasks were offered on an English WSD collection.
The track was coordinated by ISTI-CNR (IT) & U. Padua (IT), U. Tehran (IR), and U. Basque Country (ES).
iCLEF again studied interactive retrieval of images using the Flickr database (www.flickr.com). Flickr is a dynamic image database with labels provided by creators and viewers in a self-organizing ontology of tags. This labeling activity is naturally multilingual, reactive, and cooperative. The focus was on measuring relevance, user confidence and satisfaction, and user behaviour on a large scale. To serve this purpose, a single multilingual interface to Flickr was used by all participants. Coordinators were UNED (ES), SICS (SE) & U. Sheffield (UK). See http://nlp.uned.es/iCLEF for details.
QA@CLEF 2009 proposed three separate exercises: ResPubliQA, QAST and GikiCLEF:
ResPubliQA: Given a pool of 500 independent natural language questions, systems must return the passage (not the exact answer) that answers each question from the JRC-Acquis collection of EU documentation. Both questions and documents are translated and aligned for a subset of languages (at least Bulgarian, Dutch, English, French, German, Italian, Portuguese, Romanian and Spanish). See http://celct.isti.cnr.it/ResPubliQA/
QAST: The aim of the third QAST exercise was to evaluate QA technology in a real multilingual speech scenario in which written and oral questions (factual and definitional) in different languages are formulated against a set of audio recordings related to speech events in those languages. The proposed scenario was European Parliament sessions in English, Spanish and French. See http://www.lsi.upc.edu/~qast/2009/
GikiCLEF: Following the previous GikiP pilot at GeoCLEF 2008, the task focused on open list questions over Wikipedia that require geographic reasoning, complex information extraction, and cross-lingual processing. See http://www.linguateca.pt/GikiCLEF/
ImageCLEF evaluated retrieval from visual collections; both text and visual retrieval techniques could be used. A number of tasks were offered:
multilingual ad-hoc retrieval from a photo collection concentrating on diversity in the results;
retrieval from a large scale, heterogeneous collection of Wikipedia images with user-generated textual metadata, and queries in several languages;
medical image retrieval (with visual, semantic and mixed topics in several languages);
medical image classification (exact task to be defined);
detection of semantic categories from robotic images (non-annotated collection, concepts to be detected). Results of a visual and a text retrieval system will be made available to participants.
Track coordinators were U. Sheffield (UK), U. Applied Sciences Western Switzerland (CH), Oregon Health and Science U. (US), RWTH Aachen (DE), U. Geneva (CH), CWI (NL), IDIAP (CH). For details see: http://www.imageclef.org/
INFILE (information filtering evaluation) extended the TREC 2002 filtering track as follows: it used a corpus of 100,000 comparable Agence France-Presse newswires in Arabic, English and French. Evaluation was performed by automatically querying the test systems, with simulated user feedback. Each system could use the feedback at any time to increase performance. Test systems provided a Boolean decision for each document and filter profile. A curve showing the evolution of efficiency was computed along with the more classical measures tested in TREC. INFILE was also open to monolingual participation. Coordinators were CEA (FR), U. Lille (FR), ELDA (FR). See http://www.infile.org/
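The filtering protocol described above can be illustrated with a minimal sketch. It is not the INFILE evaluation software: the `KeywordFilter` class, the `run_filtering` function, the rule that feedback is given only on accepted documents, and the use of a windowed F1 score as the "efficiency" measure are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class KeywordFilter:
    """Toy adaptive filter with one keyword set per profile (hypothetical design)."""
    profiles: dict[str, set[str]]
    min_overlap: int = 1

    def decide(self, profile: str, text: str) -> bool:
        """Return a Boolean decision for one (document, profile) pair."""
        tokens = set(text.lower().split())
        return len(tokens & self.profiles[profile]) >= self.min_overlap

    def feedback(self, profile: str, text: str, relevant: bool) -> None:
        """Use simulated feedback: absorb the terms of relevant documents."""
        if relevant:
            self.profiles[profile] |= set(text.lower().split())


def run_filtering(system, stream, judgments, window=2):
    """Feed documents in order; return one F1 value per window of documents.

    `judgments` is a set of (doc_id, profile) pairs that count as relevant.
    Assumption: feedback is only simulated for documents the system accepted.
    """
    curve = []
    tp = fp = fn = 0
    for i, (doc_id, text) in enumerate(stream, start=1):
        for profile in system.profiles:
            relevant = (doc_id, profile) in judgments
            accepted = system.decide(profile, text)
            tp += accepted and relevant
            fp += accepted and not relevant
            fn += relevant and not accepted
            if accepted:
                system.feedback(profile, text, relevant)
        if i % window == 0:
            denom = 2 * tp + fp + fn
            curve.append(2 * tp / denom if denom else 0.0)
            tp = fp = fn = 0  # start a fresh window
    return curve
```

Plotting the returned list against the window index gives one possible reading of an efficiency curve: it shows whether a system gets better or worse at filtering as feedback accumulates.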
VideoCLEF offered classification and retrieval tasks on a video collection containing episodes of dual-language television programming. The collection extended the Dutch/English corpus used for the 2008 VideoCLEF pilot track. Task participants were provided with speech recognition transcripts, metadata and shot-level keyframes for the video data. Two classification tasks were offered: "Subject Classification", which involves automatically tagging videos with subject labels, and "Affect and Appeal", which involves classifying videos according to characteristics beyond their semantic content. A semantic keyframe extraction task and an exercise on identifying related English-language resources to support viewer comprehension of Dutch-language video were also planned. The track was coordinated by Dublin City University (IE) and Delft University of Technology (NL). See http://www.cdvp.dcu.ie/VideoCLEF/
The CLEF IP track in 2009 used a collection of more than 1M patent documents, mainly derived from EPO sources; the collection included English, French and German documents, with at least 100,000 in each language. Queries and relevance judgements were produced by two methods. In the first, queries were produced by intellectual property experts and reviewed by them in a fairly conventional way. The second was an automatic method using patent citations from seed patents. Search results were reviewed to ensure that the majority of test and training queries produced results in more than one language. In 2009 we kept to the Cranfield evaluation model; in subsequent years we expect to offer refined retrieval process models and assessment tools.
The track was coordinated by Information Retrieval Facility & Matrixware (AT). See www.ir-facility.org/the_irf/current-projects/clef-ip09-track/
LogCLEF dealt with the analysis of queries as an expression of user behavior. The goal was the analysis and classification of queries in order to improve search systems. LogCLEF had two tasks:
Log Analysis and Geographic Query Identification (LAGI): The recognition of the geographic component within a query stream is a key problem for geographic information retrieval (GIR). Geographic queries require specific treatment and often a geographically oriented output (e.g. a map). The task was to (1) classify geographic queries and (2) identify their geographic and non-geographic elements. A real search engine log file and logs from The European Library (TEL) were used.
Log Analysis for Digital Societies (LADS): This task used logs from The European Library (TEL) to analyze user behavior with a focus on multilingual search. Potential targets are query reformulation, multilingual search behavior and community identification.
The coordinators were: U. Hildesheim (DE), U. Padua (IT), Mitre Corp. (US). See http://www.uni-hildesheim.de/logclef/
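The two steps of the LAGI task described above, classifying a query as geographic and separating its geographic from its non-geographic elements, can be sketched with a simple gazetteer lookup. This is only an illustration and not a method prescribed by the track: the `GAZETTEER` contents and the `analyze_query` function are hypothetical, and a real system would need a far larger gazetteer plus place-name disambiguation.

```python
# Hypothetical mini-gazetteer; entries are lowercase place names of one or two words.
GAZETTEER = {"paris", "berlin", "lisbon", "new york"}


def analyze_query(query):
    """Classify a query and split it into geographic and non-geographic elements."""
    tokens = query.lower().split()
    geo, non_geo = [], []
    i = 0
    while i < len(tokens):
        pair = tokens[i:i + 2]
        # Prefer the longest match: try a two-word place name first.
        if len(pair) == 2 and " ".join(pair) in GAZETTEER:
            geo.append(" ".join(pair))
            i += 2
        elif tokens[i] in GAZETTEER:
            geo.append(tokens[i])
            i += 1
        else:
            non_geo.append(tokens[i])
            i += 1
    return {"is_geographic": bool(geo), "geo": geo, "non_geo": non_geo}
```

For example, `analyze_query("hotels in New York")` marks the query as geographic and isolates "new york" as its geographic element. Note that exact lookup cannot tell a place name from an ambiguous word with the same spelling.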