CLEF 2006
Agenda
CLEF 2006 offered a series of evaluation tracks to test different aspects of information retrieval system development. The aim was to promote research into the design of user-friendly, multilingual, multimodal retrieval systems. Information on the test collections available for each track can be found in the instructions on How to Participate.
The ad-hoc track tested mono- and cross-language textual document retrieval. As in 2005, the 2006 track offered mono- and bilingual tasks on target collections in French, Portuguese, Bulgarian and Hungarian (possibly also Polish). Topics (i.e. statements of information needs from which queries are derived) were prepared in a wide range of European languages. We also offered a bilingual task aimed at encouraging system testing with non-European languages against an English target collection; topics were supplied in a variety of languages including Amharic, Oromo, Hindi, Telugu and Indonesian.
In addition, a new “robust” task was offered; it emphasized stable performance across languages rather than high average performance in mono-, cross-language and multilingual IR. The robust task was essentially an ad-hoc task that reused test collections previously developed at CLEF. The evaluation methodology considered the geometric average as well as the mean average precision over all topics; the geometric average has proven to be a stable measure of robustness. Document collections were provided in six languages, with a topic set of some 150 topics. In the long term, we are interested in topic difficulty and in failure analysis for hard topics. The track was coordinated jointly by ISTI-CNR and U. Padua (Italy) and U. Hildesheim (Germany). For further details, see the Ad-Hoc website.
Domain-specific retrieval was studied using the GIRT-4 German/English social science and economics database (a pseudo-parallel corpus with identical documents in two languages), together with Russian sociology data from the Russian Social Science Corpus (RSSC) and the ISSIC database. Multilingual controlled vocabularies (German-English, German-Russian, English-Russian) were available. Mono- and cross-language tasks were offered, with topics in English, German and Russian. Participants could use the indexing terms in the documents and/or the social science thesaurus provided, not only for translation but also to tune the relevance decisions of their systems. The track was coordinated by IZ Bonn (Germany). See the Domain-Specific website for more information.
For CLEF 2006, the interactive track joined forces with the image track on a new type of interactive image retrieval task, designed to better capture the interplay between images and the multilingual reality of the internet for the public at large. The task was based on the popular photo-sharing community Flickr (www.flickr.com), a dynamic and rapidly changing database of images with textual comments, captions and titles in many languages, annotated cooperatively by image creators and viewers in a self-organizing ontology of tags (a so-called “folksonomy”). The track was coordinated by UNED (Spain), U. Sheffield (UK) and SICS (Sweden). See the iCLEF website.
Multiple Language Question Answering
This track, which has received increasing interest at CLEF since 2003, evaluated both monolingual (non-English) and cross-language QA systems. Questions were posed in a source language and answers searched for in a document collection in a target language. The languages were Bulgarian, Dutch, English, French, German, Italian, Portuguese and Spanish, and all combinations between them could be explored. The main task evaluated open-domain QA systems that find exact answers to factoid and definition questions; in addition, a pilot task evaluated cross-language QA systems in a real, user-oriented scenario. There was also a pilot task assessing question answering over Wikipedia, the online encyclopedia, as well as an Answer Validation Exercise. The track was organised by several institutions (one for each language) and coordinated by ITC-irst and CELCT, Trento (Italy). Information for participants was available at the QA@CLEF website.
This track evaluated the retrieval of images described by text captions, based on queries in a different language; both text and image matching techniques were potentially exploitable. Five tasks were offered in 2006:
bilingual ad hoc retrieval (collection in English, queries in a range of languages)
interactive cross-language image retrieval task
medical image retrieval (collection with case notes in English, French and German; queries derived from short text plus image(s), covering visual, mixed and semantic queries)
an automatic image annotation task for medical images (fully categorized collection, categories available in English and German)
an annotation task for non-medical images (new this year)
The tasks offered different and challenging retrieval problems for cross-language image retrieval. The first task was also envisaged as an entry-level task for newcomers to CLEF and to CLIR. Image analysis was not required for all tasks; a default visual image retrieval system was made available to participants, as well as results from a basic text retrieval system. Four test collections were made available. The track was coordinated by the University of Sheffield (UK) and the University and University Hospitals of Geneva (Switzerland); Oregon Health and Science University (USA), Victoria University, Melbourne (Australia), RWTH Aachen University (Germany) and Vienna University of Technology (Austria) collaborated in the task organisation. For more information see the ImageCLEF2006 flyer and the ImageCLEF website.
In 2005, the CL-SR track built a reusable test collection for searching spontaneous conversational English speech, using queries in five languages (Czech, English, French, German and Spanish); the collection included speech recognition output for spoken words, manually and automatically assigned controlled-vocabulary descriptors for concepts, dates and locations, manually assigned person names, and hand-written segment summaries. The 2006 CL-SR track extended that collection with additional English speech (about 900 hours), additional resources (word lattices and more accurate speech recognition), and a no-boundary evaluation condition. A second test collection containing at least 500 hours of Czech speech was also created. Multilingual topic sets with 25 topics were created for each language in 2006. The track was coordinated by the University of Maryland (USA) and Dublin City University (Ireland). See the CL-SR website for more information.