{"title":"Methods for Evaluating Interactive Information Retrieval Systems with Users","authors":"D. Kelly","doi":"10.1561/1500000012","DOIUrl":"https://doi.org/10.1561/1500000012","url":null,"abstract":"This paper provides overview and instruction regarding the evaluation of interactive information retrieval systems with users. The primary goal of this article is to catalog and compile material related to this topic into a single source. This article (1) provides historical background on the development of user-centered approaches to the evaluation of interactive information retrieval systems; (2) describes the major components of interactive information retrieval system evaluation; (3) describes different experimental designs and sampling strategies; (4) presents core instruments and data collection techniques and measures; (5) explains basic data analysis techniques; and (6) reviews and discusses previous studies. This article also discusses validity and reliability issues with respect to both measures and methods, presents background information on research ethics and discusses some ethical issues which are specific to studies of interactive information retrieval (IIR). Finally, this article concludes with a discussion of outstanding challenges and future research directions.","PeriodicalId":48829,"journal":{"name":"Foundations and Trends in Information Retrieval","volume":"517 1","pages":"1-224"},"PeriodicalIF":10.4,"publicationDate":"2009-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77147271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Probabilistic Relevance Framework: BM25 and Beyond","authors":"S. Robertson, H. Zaragoza","doi":"10.1561/1500000019","DOIUrl":"https://doi.org/10.1561/1500000019","url":null,"abstract":"The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970s-1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters.","PeriodicalId":48829,"journal":{"name":"Foundations and Trends in Information Retrieval","volume":"35 1","pages":"333-389"},"PeriodicalIF":10.4,"publicationDate":"2009-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86220246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Opinion Mining and Sentiment Analysis","authors":"B. Pang, Lillian Lee","doi":"10.1561/1500000011","DOIUrl":"https://doi.org/10.1561/1500000011","url":null,"abstract":"An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object.\n\nThis survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.","PeriodicalId":48829,"journal":{"name":"Foundations and Trends in Information Retrieval","volume":"2017 18","pages":"1-135"},"PeriodicalIF":10.4,"publicationDate":"2008-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1561/1500000011","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72400159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Email Spam Filtering: A Systematic Review","authors":"G. Cormack","doi":"10.1561/1500000006","DOIUrl":"https://doi.org/10.1561/1500000006","url":null,"abstract":"Spam is information crafted to be delivered to a large number of recipients, in spite of their wishes. A spam filter is an automated tool to recognize spam so as to prevent its delivery. The purposes of spam and spam filters are diametrically opposed: spam is effective if it evades filters, while a filter is effective if it recognizes spam. The circular nature of these definitions, along with their appeal to the intent of sender and recipient make them difficult to formalize. A typical email user has a working definition no more formal than \"I know it when I see it.\" Yet, current spam filters are remarkably effective, more effective than might be expected given the level of uncertainty and debate over a formal definition of spam, more effective than might be expected given the state-of-the-art information retrieval and machine learning methods for seemingly similar problems. But are they effective enough? Which are better? How might they be improved? Will their effectiveness be compromised by more cleverly crafted spam?\n\nWe survey current and proposed spam filtering techniques with particular emphasis on how well they work. Our primary focus is spam filtering in email; similarities and differences with spam filtering in other communication and storage media, such as instant messaging and the Web, are addressed peripherally. In doing so we examine the definition of spam, the user's information requirements and the role of the spam filter as one component of a large and complex information universe. Well-known methods are detailed sufficiently to make the exposition self-contained; however, the focus is on considerations unique to spam. Comparisons, wherever possible, use common evaluation measures, and control for differences in experimental setup. Such comparisons are not easy, as benchmarks, measures, and methods for evaluating spam filters are still evolving. We survey these efforts, their results and their limitations. In spite of recent advances in evaluation methodology, many uncertainties (including widely held but unsubstantiated beliefs) remain as to the effectiveness of spam filtering techniques and as to the validity of spam filter evaluation methods. We outline several uncertainties and propose experimental methods to address them.","PeriodicalId":48829,"journal":{"name":"Foundations and Trends in Information Retrieval","volume":"5 1","pages":"335-455"},"PeriodicalIF":10.4,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76027829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Authorship Attribution","authors":"P. Juola","doi":"10.1561/1500000005","DOIUrl":"https://doi.org/10.1561/1500000005","url":null,"abstract":"Authorship attribution, the science of inferring characteristics of the author from the characteristics of documents written by that author, is a problem with a long history and a wide range of application. Recent work in \"non-traditional\" authorship attribution demonstrates the practicality of automatically analyzing documents based on authorial style, but the state of the art is confusing. Analyses are difficult to apply, little is known about type or rate of errors, and few \"best practices\" are available. In part because of this confusion, the field has perhaps had less uptake and general acceptance than is its due.\n\nThis review surveys the history and present state of the discipline, presenting some comparative results when available. It shows, first, that the discipline is quite successful, even in difficult cases involving small documents in unfamiliar and less studied languages; it further analyzes the types of analysis and features used and tries to determine characteristics of well-performing systems, finally formulating these in a set of recommendations for best practices.","PeriodicalId":48829,"journal":{"name":"Foundations and Trends in Information Retrieval","volume":"23 1","pages":"233-334"},"PeriodicalIF":10.4,"publicationDate":"2008-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79341045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Language Models for Information Retrieval: A Critical Review","authors":"ChengXiang Zhai","doi":"10.1561/1500000008","DOIUrl":"https://doi.org/10.1561/1500000008","url":null,"abstract":"Statistical language models have recently been successfully applied to many information retrieval problems. A great deal of recent work has shown that statistical language models not only lead to superior empirical performance, but also facilitate parameter tuning and open up possibilities for modeling nontraditional retrieval problems. In general, statistical language models provide a principled way of modeling various kinds of retrieval problems. The purpose of this survey is to systematically and critically review the existing work in applying statistical language models to information retrieval, summarize their contributions, and point out outstanding challenges.","PeriodicalId":48829,"journal":{"name":"Foundations and Trends in Information Retrieval","volume":"79 1","pages":"137-213"},"PeriodicalIF":10.4,"publicationDate":"2008-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83359588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open-Domain Question-Answering","authors":"J. Prager","doi":"10.1561/1500000001","DOIUrl":"https://doi.org/10.1561/1500000001","url":null,"abstract":"Open-Domain Question Answering is an introduction to the field of Question Answering (QA). It covers the basic principles of QA along with a selection of systems that have exhibited interesting and significant techniques, so it serves more as a tutorial than as an exhaustive survey of the field. Starting with a brief history of the field, it goes on to describe the architecture of a QA system before analysing in detail some of the specific approaches that have been successfully deployed by academia and industry designing and building such systems. Open-Domain Question Answering is both a guide for beginners who are embarking on research in this area, and a useful reference for established researchers and practitioners in this field.","PeriodicalId":48829,"journal":{"name":"Foundations and Trends in Information Retrieval","volume":"3232 1","pages":"91-231"},"PeriodicalIF":10.4,"publicationDate":"2007-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86591447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Music Retrieval: A Tutorial and Review","authors":"N. Orio","doi":"10.1561/1500000002","DOIUrl":"https://doi.org/10.1561/1500000002","url":null,"abstract":"The increasing availability of music in digital format needs to be matched by the development of tools for music accessing, filtering, classification, and retrieval. The research area of Music Information Retrieval (MIR) covers many of these aspects. The aim of this paper is to present an overview of this vast and new field. A number of issues, which are peculiar to the music language, are described--including forms, formats, and dimensions of music--together with the typologies of users and their information needs. To fulfil these needs a number of approaches are discussed, from direct search to information filtering and clustering of music documents. An overview of the techniques for music processing, which are commonly exploited in many approaches, is also presented. Evaluation and comparisons of the approaches on a common benchmark are other important issues. To this end, a description of the initial efforts and evaluation campaigns for MIR is provided.","PeriodicalId":48829,"journal":{"name":"Foundations and Trends in Information Retrieval","volume":"22 1","pages":"1-90"},"PeriodicalIF":10.4,"publicationDate":"2006-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88032602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}