Foundations and Trends in Information Retrieval: Latest Articles, Page 6

IF 10.4 | Zone 2, Computer Science

Foundations and Trends in Information Retrieval Pub Date : 2011-01-09 DOI: 10.1561/1500000021

C. Castillo, Brian D. Davison

Title: Adversarial Web Search

Abstract: Web search engines have become indispensable tools for finding content. As the popularity of the Web has increased, the efforts to exploit the Web for commercial, social, or political advantage have grown, making it harder for search engines to discriminate between truthful signals of content quality and deceptive attempts to game search engines' rankings. This problem is further complicated by the open nature of the Web, which allows anyone to write and publish anything, and by the fact that search engines must analyze ever-growing numbers of Web pages. Moreover, increasing expectations of users, who over time rely on Web search for information needs related to more aspects of their lives, further deepen the need for search engines to develop effective counter-measures against deception.

In this monograph, we consider the effects of the adversarial relationship between search systems and those who wish to manipulate them, a field known as "Adversarial Information Retrieval". We show that search engine spammers create false content and misleading links to lure unsuspecting visitors to pages filled with advertisements or malware. We also examine work over the past decade or so that aims to discover such spamming activities to get spam pages removed or their effect on the quality of the results reduced.

Research in Adversarial Information Retrieval has been evolving over time, and currently continues both in traditional areas (e.g., link spam) and newer areas, such as click fraud and spam in social media, demonstrating that this conflict is far from over.

Pages: 377-486

Citations: 116

Automatic Summarization

IF 10.4 | Zone 2, Computer Science

Foundations and Trends in Information Retrieval Pub Date : 2011-01-01 DOI: 10.1561/1500000015

A. Nenkova, S. Maskey, Yang Liu

Abstract: It has now been 50 years since the publication of Luhn’s seminal paper on automatic summarization. During these years the practical need for automatic summarization has become increasingly urgent and numerous papers have been published on the topic. As a result, it has become harder to find a single reference that gives an overview of past efforts or a complete view of summarization tasks and necessary system components. This article attempts to fill this void by providing a comprehensive overview of research in summarization, including the more traditional efforts in sentence extraction as well as the most novel recent approaches for determining important content, for domain and genre specific summarization and for evaluation of summarization. We also discuss the challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field.

We would like to thank the anonymous reviewers, our students and Noemie Elhadad, Hongyan Jing, Julia Hirschberg, Annie Louis, Smaranda Muresan and Dragomir Radev for their helpful feedback. This paper was supported in part by the U.S. National Science Foundation (NSF) under IIS-05-34871 and CAREER 09-53445. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

Full text available at: http://dx.doi.org/10.1561/1500000015

Pages: 103-233

Citations: 427

Test Collection Based Evaluation of Information Retrieval Systems

IF 10.4 | Zone 2, Computer Science

Foundations and Trends in Information Retrieval Pub Date : 2010-06-03 DOI: 10.1561/1500000009

M. Sanderson

Citations: 399

Web Crawling

IF 10.4 | Zone 2, Computer Science

Foundations and Trends in Information Retrieval Pub Date : 2010-03-01 DOI: 10.1561/1500000017

Christopher Olston, Marc Najork

Citations: 2

Mining Query Logs: Turning Search Usage Data into Knowledge

IF 10.4 | Zone 2, Computer Science

Foundations and Trends in Information Retrieval Pub Date : 2010-01-01 DOI: 10.1561/1500000013

F. Silvestri

Citations: 200

Concept-Based Video Retrieval

IF 10.4 | Zone 2, Computer Science

Foundations and Trends in Information Retrieval Pub Date : 2009-05-26 DOI: 10.1561/1500000014

Cees G. M. Snoek, M. Worring

Abstract: In this paper, we review 300 references on video retrieval, indicating when text-only solutions are unsatisfactory and showing the promising alternatives which are in majority concept-based. Therefore, central to our discussion is the notion of a semantic concept: an objective linguistic description of an observable entity. Specifically, we present our view on how its automated detection, selection under uncertainty, and interactive usage might solve the major scientific problem for video retrieval: the semantic gap. To bridge the gap, we lay down the anatomy of a concept-based video search engine. We present a component-wise decomposition of such an interdisciplinary multimedia system, covering influences from information retrieval, computer vision, machine learning, and human–computer interaction. For each of the components we review state-of-the-art solutions in the literature, each having different characteristics and merits. Because of these differences, we cannot understand the progress in video retrieval without serious evaluation efforts such as carried out in the NIST TRECVID benchmark. We discuss its data, tasks, results, and the many derived community initiatives in creating annotations and baselines for repeatable experiments.

We conclude with our perspective on future challenges and opportunities.

Pages: 215-322

Citations: 429

Methods for Evaluating Interactive Information Retrieval Systems with Users

IF 10.4 | Zone 2, Computer Science

Foundations and Trends in Information Retrieval Pub Date : 2009-04-28 DOI: 10.1561/1500000012

D. Kelly

Citations: 621

The Probabilistic Relevance Framework: BM25 and Beyond

IF 10.4 | Zone 2, Computer Science

Foundations and Trends in Information Retrieval Pub Date : 2009-04-01 DOI: 10.1561/1500000019

S. Robertson, H. Zaragoza

Citations: 2328

Opinion Mining and Sentiment Analysis

IF 10.4 | Zone 2, Computer Science

Foundations and Trends in Information Retrieval Pub Date : 2008-07-08 DOI: 10.1561/1500000011

B. Pang, Lillian Lee

Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object.

This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to.

To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

Pages: 1-135

Citations: 4579

Email Spam Filtering: A Systematic Review

IF 10.4 | Zone 2, Computer Science

Foundations and Trends in Information Retrieval Pub Date : 2008-06-23 DOI: 10.1561/1500000006

G. Cormack

Abstract: Spam is information crafted to be delivered to a large number of recipients, in spite of their wishes. A spam filter is an automated tool to recognize spam so as to prevent its delivery. The purposes of spam and spam filters are diametrically opposed: spam is effective if it evades filters, while a filter is effective if it recognizes spam. The circular nature of these definitions, along with their appeal to the intent of sender and recipient, makes them difficult to formalize. A typical email user has a working definition no more formal than "I know it when I see it." Yet, current spam filters are remarkably effective, more effective than might be expected given the level of uncertainty and debate over a formal definition of spam, more effective than might be expected given the state-of-the-art information retrieval and machine learning methods for seemingly similar problems. But are they effective enough? Which are better? How might they be improved? Will their effectiveness be compromised by more cleverly crafted spam?

We survey current and proposed spam filtering techniques with particular emphasis on how well they work. Our primary focus is spam filtering in email; similarities and differences with spam filtering in other communication and storage media, such as instant messaging and the Web, are addressed peripherally. In doing so we examine the definition of spam, the user's information requirements and the role of the spam filter as one component of a large and complex information universe. Well-known methods are detailed sufficiently to make the exposition self-contained; however, the focus is on considerations unique to spam. Comparisons, wherever possible, use common evaluation measures, and control for differences in experimental setup.

Such comparisons are not easy, as benchmarks, measures, and methods for evaluating spam filters are still evolving. We survey these efforts, their results and their limitations. In spite of recent advances in evaluation methodology, many uncertainties (including widely held but unsubstantiated beliefs) remain as to the effectiveness of spam filtering techniques and as to the validity of spam filter evaluation methods. We outline several uncertainties and propose experimental methods to address them.

Pages: 335-455

Citations: 296