{"title":"On the science of search: statistical approaches, evaluation, optimisation","authors":"S. Robertson","doi":"10.1145/1364742.1364745","DOIUrl":"https://doi.org/10.1145/1364742.1364745","url":null,"abstract":"This paper, based on a talk, presents an overview of evaluation experiments in information retrieval, and also of statistical approaches to search. A strong connection exists between them: the notion that the objective of search can be expressed in terms of the measures used for evaluation informs the statistical theory in several ways. The latest manifestation of this connection is the work on optimization of ranking algorithms, using machine learning techniques.","PeriodicalId":287514,"journal":{"name":"International Workshop On Research Issues in Digital Libraries","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115845661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How to compose a complex document recognition system","authors":"H. Fujisawa","doi":"10.1145/1364742.1364759","DOIUrl":"https://doi.org/10.1145/1364742.1364759","url":null,"abstract":"The technical challenges in document analysis and recognition have been to solve the problems of uncertainty and variability. From our experiences in developing OCRs, business form readers, and postal address recognition engines, we would like to present design principles to cope with these problems of uncertainty and variability. When the targets of document recognition are complex and diversified, the recognition engine needs to solve many different kinds of pattern recognition problems, which are a reflection of uncertainty and variability. Inevitably, the engine becomes complex, raising a question of how to combine its subcomponents, which are not perfect in their accuracies. The design principles will be explained with examples in postal address recognition.","PeriodicalId":287514,"journal":{"name":"International Workshop On Research Issues in Digital Libraries","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132289633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding an answer to a question","authors":"Brigitte Grau","doi":"10.1145/1364742.1364751","DOIUrl":"https://doi.org/10.1145/1364742.1364751","url":null,"abstract":"The huge quantity of available electronic information leads to a growing need for users to have tools able to be precise and selective. These kinds of tools have to provide answers to requests quite rapidly without requiring the user to explore each document, to reformulate her request or to search for the answer inside documents. From that viewpoint, finding an answer consists not only in finding relevant documents but also in extracting relevant parts. This leads us to express the question-answering problem in terms of an information retrieval problem that can be solved using natural language processing (NLP) approaches. In my talk, I will focus on defining what a \"good\" answer is, and how a system can find it.\nA good answer has to give the required piece of information. However, it is not sufficient; it also has both to be presented within its context of interpretation and to be justified in order to give a user means to evaluate if the answer fits her needs and is appropriate.\nOne can view searching for an answer to a question as a reformulation problem: according to what is asked, find one of the different linguistic expressions of the answer in all candidate sentences. Within this framework, interlingual question-answering can also be seen as another kind of linguistic variation. The answer phrasing can be considered as an affirmative reformulation of the question, partly or totally, which entails the definition of models that match with sentences containing the answer. According to the different approaches, the kinds of model and the matching criteria greatly differ. It can consist in building a structured representation that makes explicit the semantic relations between the concepts of the question and that is compared to a similar representation of sentences.\nAs this approach requires a syntactic parser and a semantic knowledge base, which are not available in all languages, systems often apply a less formal approach based on a similarity measure between a passage and the question, and answers are extracted from the highest-scoring passages. Similarity involves different criteria: question terms and their linguistic variations in passages, syntactic proximity, answer type. We will see that, in such an approach, justifications can be envisioned by using the texts themselves, considered as depositories of semantic knowledge. I will focus on the approach the LIR group of LIMSI has taken for its monolingual and bilingual systems.","PeriodicalId":287514,"journal":{"name":"International Workshop On Research Issues in Digital Libraries","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124011340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information retrieval and digital libraries: lessons of research","authors":"Karen Spärck Jones","doi":"10.1145/1364742.1364743","DOIUrl":"https://doi.org/10.1145/1364742.1364743","url":null,"abstract":"This paper reviews lessons from the history of information retrieval research, with particular emphasis on recent developments. These have demonstrated the value of statistical techniques for retrieval, and have also shown that they have an important, though not exclusive, part to play in other information processing tasks, like question answering and summarising. The heterogeneous materials that digital libraries are expected to cover, their scale, and their changing composition, imply that statistical methods, which are general-purpose and very flexible, have significant potential value for the digital libraries of the future.","PeriodicalId":287514,"journal":{"name":"International Workshop On Research Issues in Digital Libraries","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129187307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open source search and research","authors":"M. Beigbeder, Wray L. Buntine, Wai Gen Yee","doi":"10.1145/1364742.1364748","DOIUrl":"https://doi.org/10.1145/1364742.1364748","url":null,"abstract":"In this paper, we present a review of criteria for the evaluation of open source information retrieval tools and provide an overview of some of those that are more popular. The question of interaction between research and availability of open source search tools is addressed.","PeriodicalId":287514,"journal":{"name":"International Workshop On Research Issues in Digital Libraries","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129845071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital audiovisual repositories: an introduction","authors":"Richard Wright","doi":"10.1145/1364742.1364753","DOIUrl":"https://doi.org/10.1145/1364742.1364753","url":null,"abstract":"This paper briefly describes the essential aspects of the digital world that audiovisual archives are entering - or being swallowed-up in. The crucial issue is whether archives will sink or swim in this all-digital environment. The core issue is defining - and meeting - the requirements for a secure, sustainable digital repository.","PeriodicalId":287514,"journal":{"name":"International Workshop On Research Issues in Digital Libraries","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122269774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From CLIR to CLIE: some lessons in NTCIR evaluation","authors":"Hsin-Hsi Chen","doi":"10.1145/1364742.1364762","DOIUrl":"https://doi.org/10.1145/1364742.1364762","url":null,"abstract":"Cross-language information retrieval (CLIR) facilitates the use of one language to access documents in other languages. Cross-language information extraction (CLIE) extracts relevant information in finer granularity from multilingual documents for some specific applications like summarization, question answering, opinion extraction, etc. This paper reviews CLIR, CLQA, and opinion analysis tasks in NTCIR evaluation. The design methodologies and some key technologies are reported.","PeriodicalId":287514,"journal":{"name":"International Workshop On Research Issues in Digital Libraries","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127034446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shallow syntax analysis in Sanskrit guided by semantic nets constraints","authors":"G. Huet","doi":"10.1145/1364742.1364750","DOIUrl":"https://doi.org/10.1145/1364742.1364750","url":null,"abstract":"We present the state of the art of a computational platform for the analysis of classical Sanskrit. The platform comprises modules for phonology, morphology, segmentation and shallow syntax analysis, organized around a structured lexical database. It relies on the Zen toolkit for finite state automata and transducers, which provides data structures and algorithms for the modular construction and execution of finite state machines, in a functional framework.\nSome of the layers proceed in bottom-up synthesis mode - for instance, noun and verb morphological modules generate all inflected forms from stems and roots listed in the lexicon. Morphemes are assembled through internal sandhi, and the inflected forms are stored with morphological tags in dictionaries usable for lemmatizing. These dictionaries are then compiled into transducers, implementing the analysis of external sandhi, the phonological process which merges words together by euphony. This provides a tagging segmenter, which analyses a sentence presented as a stream of phonemes and produces a stream of tagged lexical entries, hyperlinked to the lexicon.\nThe next layer is a syntax analyser, guided by semantic nets constraints expressing dependencies between the word forms. Finite verb forms demand semantic roles, according to valency patterns depending on the voice (active, passive) of the form and the governance (transitive, etc) of the root. Conversely, noun/adjective forms provide actors which may fill those roles, provided agreement constraints are satisfied. Tool words are mapped to transducers operating on tagged streams, allowing the modeling of linguistic phenomena such as coordination by abstract interpretation of actor streams.\nThe parser ranks the various interpretations (matching actors with roles) with penalties, and returns to the user the minimum penalty analyses, for final validation of ambiguities. The whole platform is organized as a Web service, allowing the piecewise tagging of a Sanskrit text.","PeriodicalId":287514,"journal":{"name":"International Workshop On Research Issues in Digital Libraries","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131164032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward a common semantics between media and languages","authors":"C. Fluhr, G. Grefenstette, Adrian Daniel Popescu","doi":"10.1145/1364742.1364755","DOIUrl":"https://doi.org/10.1145/1364742.1364755","url":null,"abstract":"For a computer to recognize objects, persons, situations or actions in multimedia, it needs to have learned models of each thing beforehand. For the moment, no large, general collection of training examples exists for the wide variety of things that we would want to automatically recognize in multimedia, video and still images. We believe that the WWW and current technology can allow us to automatically build such a resource. This paper describes a methodology for the construction of a grounded, general purpose, multimedia ontology that is instantiated through web processing. In this hierarchically organized ontology, concepts corresponding to concrete objects, persons, situations and actions are linked with still images, videos and sounds that represent exemplars of each concept. These examples are necessary resources for computing discriminating signatures for the recognition of the concepts in still images or videos. Since images retrieved using existing image search engines contain much noise and are not always representative, we also present here our methodology for finding good representatives for each concept.","PeriodicalId":287514,"journal":{"name":"International Workshop On Research Issues in Digital Libraries","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125048554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multilingual information access: the contribution of evaluation","authors":"C. Peters","doi":"10.1145/1364742.1364761","DOIUrl":"https://doi.org/10.1145/1364742.1364761","url":null,"abstract":"Since evaluation of cross-language information retrieval systems began at TREC in 1997 and NTCIR in 1998 and, in particular, with the launch of the Cross-Language Evaluation Forum (CLEF) in 2000, considerable progress has been made in this particular sector of IR. Advances can be considered in two stages. The first stage regarded in particular the development of text retrieval systems from simple so-called \"bilingual\" systems in which a query in one language is used to search a document collection in another to truly \"multilingual\" retrieval systems where a query in one language can find relevant results from a collection of documents in multiple languages. In the second stage, the focus was no longer just on multilingual document retrieval but was diversified to include different kinds of text retrieval across languages (e.g. multilingual question answering) and retrieval on different kinds of media (e.g. collections containing images or speech). However, although the results from the research perspective have been interesting, there has been little real take-up by the applications communities.\nIn the paper we describe the results achieved by CLEF over the years and propose a third stage for multilingual system evaluation which gives far more attention to questions regarding usability and user satisfaction but also provides ways for the results achieved to be transferred to the operational context.","PeriodicalId":287514,"journal":{"name":"International Workshop On Research Issues in Digital Libraries","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128263844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}