CLEF 2009 - AGENDA

 

CLEF 2009 offers a series of evaluation tracks to test different aspects of cross-language information retrieval system development. The aim is to promote research into the design of user-friendly, multilingual, multimodal retrieval systems.

 

There are eight main evaluation tracks in 2009:

Multilingual Document Retrieval (Ad-Hoc)

This track tests mono- and cross-language text retrieval. Tasks in 2009 will test both CL and IR aspects in a multilingual context.

 

The track is coordinated by ISTI-CNR (IT), U. Padua (IT), U. Tehran (IR), and U. Basque Country (ES). For more information, see here.

Interactive Cross-Language Retrieval (iCLEF)

Interactive retrieval of images using the Flickr database (www.flickr.com) will again be studied. Flickr is a dynamic image database with labels provided by creators and viewers in a self-organizing ontology of tags. This labeling activity is naturally multilingual, reactive, and cooperative. The focus is on measuring relevance, user confidence/satisfaction and user behaviour on a large scale. To serve this purpose, a single multilingual interface to Flickr will be used by all participants. Coordinators are UNED (ES), SICS (SE) & U. Sheffield (UK). See http://nlp.uned.es/iCLEF for details.

Multiple Language Question Answering (QA@CLEF)

QA@CLEF 2009 proposes three separate exercises: ResPubliQA, QAST and GikiCLEF.

 

The track is coordinated by CELCT (IT) and UNED (ES). The central website is http://nlp.uned.es/clef-qa/

Cross-Language Image Retrieval (ImageCLEF)

This track evaluates retrieval from visual collections; both text-based and visual retrieval techniques can be used. Five challenging tasks are foreseen.

 

Track coordinators are U. Sheffield (UK), U. Applied Sciences Western Switzerland (CH), Oregon Health and Science U. (US), RWTH Aachen (DE), U. Geneva (CH), CWI (NL), IDIAP (CH). See also: http://www.imageclef.org/

 

Multilingual Information Filtering (INFILE@CLEF)

INFILE (information filtering evaluation) extends the TREC 2002 filtering track as follows: it uses a corpus of 100,000 comparable Agence France-Presse newswires in Arabic, English and French, and evaluation is performed by automatically querying the test systems with simulated user feedback. Each system can use the feedback at any time to increase performance. Test systems will provide boolean decisions for each document and filter profile. A curve of the evolution of efficiency will be computed along with more classical measures tested in TREC. INFILE is also open to monolingual participation. Coordinators are CEA (FR), U. Lille (FR) and ELDA (FR). See http://www.infile.org/

Cross-Language Video Retrieval (VideoCLEF)

VideoCLEF offers classification and retrieval tasks on a video collection containing episodes of dual-language television programming. The collection will extend the Dutch/English corpus used for the 2008 VideoCLEF pilot track. Task participants will be provided with speech recognition transcripts, metadata and shot-level keyframes for the video data. Two classification tasks will be evaluated: "Subject Classification", which involves automatically tagging videos with subject labels, and "Affect and Appeal", which involves classifying videos according to characteristics beyond their semantic content. A semantic keyframe extraction task and an exercise on identifying related English-language resources to support viewer comprehension of Dutch-language video are also planned. The track is coordinated by Dublin City University (IE) and Delft University of Technology (NL). See http://www.cdvp.dcu.ie/VideoCLEF/

Intellectual Property (CLEF-IP) – New this Year

The CLEF-IP track in 2009 will use a collection of more than 1M patent documents, mainly derived from EPO sources. The collection will cover English, French and German, with at least 100,000 documents in each language. Queries and relevance judgements will be produced by two methods. In the first, queries are produced by intellectual property experts and reviewed by them in a fairly conventional way. The second is an automatic method using patent citations from seed patents. Search results will be reviewed to ensure that the majority of test and training queries produce results in more than one language. We will primarily report results for retrieval across all three languages. In 2009 we will stick to the Cranfield evaluation model; in subsequent years we expect to offer refined retrieval process models and assessment tools.

The track is coordinated by Information Retrieval Facility & Matrixware (AT). See www.ir-facility.org/the_irf/current-projects/clef-ip09-track/

Log File Analysis (LogCLEF) – New this Year

LogCLEF deals with the analysis of queries as an expression of user behaviour. The goal is the analysis and classification of queries in order to improve search systems. LogCLEF has two tasks.

 

The coordinators are: U. Hildesheim (DE), U. Padua (IT), Mitre Corp. (US). See http://www.uni-hildesheim.de/logclef/

 

CLEF 2009 also offers a new pilot task

Grid Experiments (Grid@CLEF)

Multilingual information access (MLIA) is increasingly part of many complex systems, such as digital libraries, enterprise portals and Web search engines. But do we really know how MLIA components (stemmers, IR models, relevance feedback, translation techniques, etc.) behave with respect to languages? Grid@CLEF is launching a cooperative effort in which a series of large-scale, systematic grid experiments will aim at improving our comprehension of MLIA systems and gaining an exhaustive picture of their behaviour with respect to languages. Participants will be asked to take part in a series of experiments that have been carefully designed to ensure that the tested MLIA components are really comparable across groups and that differences come only from the languages and tasks at hand. Task coordinators are U. Padua (IT) and NIST (US). Details available at http://ims.dei.unipd.it/gridclef/