There were seven main evaluation tracks in CLEF 2008, as well as two new tracks run as pilot tasks.

Multilingual Textual Document Retrieval (Ad Hoc)
This track tested mono- and cross-language text retrieval.
We offered a totally new main task for monolingual and cross-language search on library catalogue records. The task was organised in collaboration with The European Library (TEL) and searching was on collections
derived from the TEL archives in English, French, and German. We
also offered more traditional mono- and bilingual ad-hoc retrieval tasks on a very
exciting Persian newspaper corpus: the Hamshahri collection. The "robust" task was offered this year on a word sense disambiguated collection of English news documents, with both mono- and bilingual sub-tasks. Hard topics from previous years were chosen to encourage the development of advanced techniques for dealing with them. The track was coordinated by ISTI-CNR (IT), U. Padua (IT), U. Tehran (IR), U. Hildesheim (DE) and U. Basque Country (ES).
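To make the bilingual ad-hoc setting concrete, here is a minimal sketch (not any participant's system) of dictionary-based query translation followed by BM25 ranking; the toy English-to-French dictionary, the example documents and the parameter values are invented for illustration and are unrelated to the TEL or Hamshahri collections.

```python
# Minimal sketch of bilingual ad-hoc retrieval: a query in one language is
# translated term by term with a toy bilingual dictionary, then documents
# are ranked with BM25. Dictionary, documents and parameters are invented
# for illustration only.
import math
from collections import Counter

# Hypothetical English->French term dictionary (illustrative only).
DICTIONARY = {"library": ["bibliotheque"], "catalogue": ["catalogue"],
              "record": ["notice", "enregistrement"]}

def translate(query_terms):
    """Replace each source-language term by its dictionary translations."""
    translated = []
    for term in query_terms:
        translated.extend(DICTIONARY.get(term, [term]))  # keep untranslated terms
    return translated

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of terms) against the query with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

if __name__ == "__main__":
    docs = [["notice", "de", "bibliotheque"], ["roman", "historique"]]
    query = translate(["library", "record"])
    print(sorted(zip(bm25_scores(query, docs), range(len(docs))), reverse=True))
```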
See http://ixa2.si.ehu.es/clirwsd for more information.

Scientific Data Retrieval (Domain-Specific)

Mono- and cross-language domain-specific retrieval on structured bibliographic data for the social sciences was studied. The following corpora were provided: GIRT-4 for German/English, CSA Sociological Abstracts for English, and ISISS for Russian. Multilingual controlled vocabularies (English, German, Russian) and bi-directional mappings between terminologies were available. Topics were offered in English, German and Russian. The track was coordinated by GESIS-IZ Bonn. See http://www.gesis.org/en/research/information_technology/clef_ds.htm

Interactive Cross-Language Retrieval (iCLEF)

Interactive retrieval of images using the popular image sharing community Flickr (www.flickr.com) as a target database was studied. Flickr is a dynamic, rapidly changing database of images with textual comments provided by creators and viewers in a self-organizing ontology of tags (a so-called "folksonomy"). This labeling activity is naturally multilingual, reactive, and cooperative. In 2008, the focus was on measuring relevance, user confidence/satisfaction and user behaviour on a larger scale than in previous years; to serve this purpose, a single multilingual interface to Flickr was used by all participants. The track was coordinated by UNED, Madrid (ES). See http://nlp.uned.es/iCLEF for details.

Multiple Language Question Answering (QA@CLEF)

Both main tasks (QA) and additional exercises (AVE, QAST, QA-WSD) were proposed. The main task scenario was event-targeted QA on a heterogeneous document collection (news articles and Wikipedia). Many monolingual and cross-language sub-tasks were offered: Basque, Bulgarian, Dutch, English, French, German, Italian, Portuguese, Romanian and Spanish were proposed as both query and target languages. The track was organized by several institutions (one for each source language) and jointly coordinated by CELCT, Trento (IT) and UNED, Madrid (ES). See http://clef-qa.itc.it/ and http://nlp.uned.es/clef-qa/

Cross-Language Image Retrieval (ImageCLEF)

This track evaluated retrieval of images from multilingual collections; both text and visual retrieval techniques were exploitable. Four challenging tasks were offered: (i) multilingual ad-hoc retrieval (a collection with mixed English/German/Spanish annotations, queries in many languages), (ii) medical image retrieval (queries in several languages, with visual, semantic and mixed topics), (iii) hierarchical medical image annotation (a fully categorized collection), and (iv) topic detection (a non-annotated collection in which several concepts had to be detected). Visual image retrieval was not required for all tasks, and default visual and textual retrieval systems were made available to participants. The track coordinators were U. Sheffield (UK), U. of Geneva (CH), Oregon Health and Science U. (US), Victoria U. Melbourne (AU), RWTH Aachen (DE) and Vienna U. of Technology (AT). See http://www.imageclef.org/ for a full description of all tasks.

CLEF Web Track (WebCLEF)

In the past three years this track had focused on the evaluation of systems
providing multi- and cross-lingual access to web data. WebCLEF 2008 repeated
the task setup of the 2007 edition. In 2008, we evaluated a multilingual
information synthesis task, where, for a given topic, participating systems were
asked to extract important snippets from web pages (fetched from the live web
and provided by the task organizers). The systems had to focus on
extracting, summarizing, filtering and presenting information relevant to the
topic, rather than on large scale web search and retrieval per se. In the 2008
edition of the task, we focused on refining the assessment procedure and
evaluation measures. WebCLEF 2008 had much in common with (topic-oriented) multi-document summarization and with answering complex questions. An important difference was that at WebCLEF, topics could come with extensive descriptions and with many thousands of documents from which important facts had to be mined. In addition, WebCLEF worked with web documents, which could be very noisy and redundant. For WebCLEF 2008 participants, the test collection, topics, assessments, and
source code of the best performing 2007 system were available, together with
evaluation scripts, so as to create a shared baseline. WebCLEF 2008 was organized by the University of Amsterdam and backed by the following advisory board:
Julio Gonzalo, Universidad Nacional de Educacion a Distancia
Valentin Jijkoun, University of Amsterdam
Jimmy Lin, University of Maryland
Constantin Orasan, University of Wolverhampton
Maarten de Rijke, University of Amsterdam
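As a rough illustration of the snippet-oriented synthesis task described above, and not of any system that took part, the sketch below scores sentences from a pool of documents by word overlap with the topic description and drops near-duplicate snippets before returning the best ones; the Jaccard scoring and the duplicate threshold are invented choices for the example.

```python
# Toy sketch of topic-oriented snippet extraction: score sentences by word
# overlap with the topic description, skip near-duplicates, return the best.
# The overlap measure and the 0.8 duplicate threshold are illustrative choices.
import re

def sentences(text):
    """Very rough sentence splitter."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def extract_snippets(topic_description, documents, max_snippets=10, dup_threshold=0.8):
    """Return up to max_snippets sentences most similar to the topic description."""
    topic_words = words(topic_description)
    candidates = []
    for doc in documents:
        for sent in sentences(doc):
            score = jaccard(words(sent), topic_words)
            if score > 0:
                candidates.append((score, sent))
    candidates.sort(key=lambda x: x[0], reverse=True)

    selected = []
    for score, sent in candidates:
        # Web documents are noisy and redundant, so drop near-duplicate snippets.
        if any(jaccard(words(sent), words(s)) >= dup_threshold for s in selected):
            continue
        selected.append(sent)
        if len(selected) == max_snippets:
            break
    return selected
```

In the actual task, the documents would be the web pages fetched from the live web or provided by the task organizers for the topic.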
For more details, tools, resources, and to sign up, please visit http://ilps.science.uva.nl/WebCLEF/WebCLEF2008/

Cross-Language Geographical Information Retrieval (GeoCLEF)

Multilingual document retrieval with an emphasis on geographic search (GIR) was evaluated. The 2008 GeoCLEF track consisted of several parts. The first was a modification of the existing GeoCLEF search task on newspaper collections: some topics simulated the situation of a user who posed a query while looking at a map on the screen; for these topics, the system received the content part of the topic together with a rectangular shape defining the geographic context. The second was a query parsing task, run by Microsoft Research Asia, who supplied a log file containing 800,000 queries; participants were required to identify the geographic aspects of each query. Finally, two pilot tasks offered new search challenges. For further information (including details on the track coordinators), see http://www.uni-hildesheim.de/geoclef/

CLEF 2008 also offered two new tracks as pilot tasks.

Cross-Language Video Retrieval (VideoCLEF)

The Vid2RSS task was a classification task performed on a video corpus containing episodes of a dual-language television program. Task participants were provided with speech recognition transcripts, metadata and keyframes for the video data. The languages of the television program were Dutch and English. It is important to note that the two languages occurred side by side in the program, both contributing to its spoken content; they were not translations of the same content. The task was to group the videos into topic categories and to generate an RSS feed for each category. The videos were classified (i.e. assigned to the topic categories) using the speech recognition transcripts for both languages spoken in the program. Keyframes and metadata supported the generation of the RSS feeds, but could also be used to support classification if participants so chose (a toy sketch of such a pipeline is given at the end of this section). See http://ilps.science.uva.nl/Vid2RSS/.

Multilingual Information Filtering (INFILE@CLEF)

INFILE (information filtering evaluation) extended the last filtering track of TREC 2002 in several ways.
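The following is a toy sketch of a Vid2RSS-style pipeline of the kind described under VideoCLEF above, not the track's actual tooling: each episode is assigned to topic categories by matching keywords in its speech transcript, and one RSS 2.0 feed is written per category. The category keywords, episode titles and URLs are invented for the example.

```python
# Toy sketch of a Vid2RSS-style pipeline: classify each episode into topic
# categories using its speech transcript, then emit one RSS 2.0 feed per
# category. Keywords, titles and URLs below are invented for illustration.
import xml.etree.ElementTree as ET

# Hypothetical keyword lists per topic category (both Dutch and English terms,
# since both languages occur in the spoken content).
CATEGORY_KEYWORDS = {
    "music": {"music", "muziek", "concert"},
    "history": {"history", "geschiedenis", "archief"},
}

def classify(transcript):
    """Return every category whose keywords appear in the transcript."""
    tokens = set(transcript.lower().split())
    return [cat for cat, kws in CATEGORY_KEYWORDS.items() if tokens & kws]

def build_feed(category, episodes):
    """Build an RSS 2.0 feed for one topic category."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Episodes about {category}"
    ET.SubElement(channel, "description").text = f"Automatically grouped items: {category}"
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "link").text = ep["link"]
    return ET.ElementTree(rss)

if __name__ == "__main__":
    episodes = [
        {"title": "Episode 1", "link": "http://example.org/ep1",
         "transcript": "een concert en veel muziek tonight"},
    ]
    for ep in episodes:
        for category in classify(ep["transcript"]):
            build_feed(category, [ep]).write(f"{category}.xml",
                                             encoding="utf-8", xml_declaration=True)
```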