信息检索中的可信度

IF 12.9 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval Pub Date : 2015-11-18 DOI:10.1561/1500000046

A. Gînsca, Adrian Daniel Popescu, M. Lupu

{"title":"信息检索中的可信度","authors":"A. Gînsca, Adrian Daniel Popescu, M. Lupu","doi":"10.1561/1500000046","DOIUrl":null,"url":null,"abstract":"Credibility, as the general concept covering trustworthiness and expertise, but also quality and reliability, is strongly debated in philosophy, psychology, and sociology, and its adoption in computer science is therefore fraught with difficulties. Yet its importance has grown in the information access community because of two complementing factors: on one hand, it is relatively difficult to precisely point to the source of a piece of information, and on the other hand, complex algorithms, statistical machine learning, artificial intelligence, make decisions on behalf of the users, with little oversight from the users themselves.This survey presents a detailed analysis of existing credibility models from different information seeking research areas, with focus on the Web and its pervasive social component. It shows that there is a very rich body of work pertaining to different aspects and interpretations of credibility, particularly for different types of textual content e.g., Web sites, blogs, tweets, but also to other modalities videos, images, audio and topics e.g., health care. After an introduction placing credibility in the context of other sciences and relating it to trust, we argue for a quartic decomposition of credibility: expertise and trustworthiness, well documented in the literature and predominantly related to information source, and quality and reliability, raised to the status of equal partners because the source is often impossible to detect, and predominantly related to the content.The second half of the survey provides the reader with access points to the literature, grouped by research interests. Section 3 reviews general research directions: the factors that contribute to credibility assessment in human consumers of information; the models used to combine these factors; the methods to predict credibility. A smaller section is dedicated to informing users about the credibility learned from the data. Sections 4, 5, and 6 go further into details, with domain-specific credibility, social media credibility, and multimedia credibility, respectively. While each of them is best understood in the context of Sections 1 and 2, they can be read independently of each other.The last section of this survey addresses a topic not commonly considered under \"credibility\": the credibility of the system itself, independent of the data creators. This is a topic of particular importance in domains where the user is professionally motivated and where there are no concerns about the credibility of the data e.g. e-discovery and patent search. While there is little explicit work in this direction, we argue that this is an open research direction that is worthy of future exploration.Finally, as an additional help to the reader, an appendix lists the existing test collections that cater specifically to some aspect of credibility.Overall, this review will provide the reader with an organised and comprehensive reference guide to the state of the art and the problems at hand, rather than a final answer to the question of what credibility is for computer science. Even within the relatively limited scope of an exact science, such an answer is not possible for a concept that is itself widely debated in philosophy and social sciences.","PeriodicalId":48829,"journal":{"name":"Foundations and Trends in Information Retrieval","volume":"62 1","pages":"355-475"},"PeriodicalIF":12.9000,"publicationDate":"2015-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"39","resultStr":"{\"title\":\"Credibility in Information Retrieval\",\"authors\":\"A. Gînsca, Adrian Daniel Popescu, M. Lupu\",\"doi\":\"10.1561/1500000046\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Credibility, as the general concept covering trustworthiness and expertise, but also quality and reliability, is strongly debated in philosophy, psychology, and sociology, and its adoption in computer science is therefore fraught with difficulties. Yet its importance has grown in the information access community because of two complementing factors: on one hand, it is relatively difficult to precisely point to the source of a piece of information, and on the other hand, complex algorithms, statistical machine learning, artificial intelligence, make decisions on behalf of the users, with little oversight from the users themselves.This survey presents a detailed analysis of existing credibility models from different information seeking research areas, with focus on the Web and its pervasive social component. It shows that there is a very rich body of work pertaining to different aspects and interpretations of credibility, particularly for different types of textual content e.g., Web sites, blogs, tweets, but also to other modalities videos, images, audio and topics e.g., health care. After an introduction placing credibility in the context of other sciences and relating it to trust, we argue for a quartic decomposition of credibility: expertise and trustworthiness, well documented in the literature and predominantly related to information source, and quality and reliability, raised to the status of equal partners because the source is often impossible to detect, and predominantly related to the content.The second half of the survey provides the reader with access points to the literature, grouped by research interests. Section 3 reviews general research directions: the factors that contribute to credibility assessment in human consumers of information; the models used to combine these factors; the methods to predict credibility. A smaller section is dedicated to informing users about the credibility learned from the data. Sections 4, 5, and 6 go further into details, with domain-specific credibility, social media credibility, and multimedia credibility, respectively. While each of them is best understood in the context of Sections 1 and 2, they can be read independently of each other.The last section of this survey addresses a topic not commonly considered under \\\"credibility\\\": the credibility of the system itself, independent of the data creators. This is a topic of particular importance in domains where the user is professionally motivated and where there are no concerns about the credibility of the data e.g. e-discovery and patent search. While there is little explicit work in this direction, we argue that this is an open research direction that is worthy of future exploration.Finally, as an additional help to the reader, an appendix lists the existing test collections that cater specifically to some aspect of credibility.Overall, this review will provide the reader with an organised and comprehensive reference guide to the state of the art and the problems at hand, rather than a final answer to the question of what credibility is for computer science. Even within the relatively limited scope of an exact science, such an answer is not possible for a concept that is itself widely debated in philosophy and social sciences.\",\"PeriodicalId\":48829,\"journal\":{\"name\":\"Foundations and Trends in Information Retrieval\",\"volume\":\"62 1\",\"pages\":\"355-475\"},\"PeriodicalIF\":12.9000,\"publicationDate\":\"2015-11-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"39\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Foundations and Trends in Information Retrieval\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1561/1500000046\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Foundations and Trends in Information Retrieval","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1561/1500000046","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 39

摘要

可信性，作为涵盖可信性和专业知识，也包括质量和可靠性的一般概念，在哲学、心理学和社会学中都有激烈的争论，因此在计算机科学中采用它充满了困难。然而，由于两个互补的因素，它在信息获取社区的重要性越来越大:一方面，精确地指出一条信息的来源相对困难，另一方面，复杂的算法，统计机器学习，人工智能，代表用户做出决策，几乎没有用户自己的监督。本调查对来自不同信息寻求研究领域的现有可信度模型进行了详细分析，重点关注网络及其无处不在的社会成分。它表明，有非常丰富的工作涉及可信度的不同方面和解释，特别是不同类型的文本内容，如网站、博客、推文，但也涉及其他形式的视频、图像、音频和主题，如保健。在介绍了将可信度置于其他科学的背景下并将其与信任联系起来之后，我们主张可信度的四次分解:专业知识和可信度，在文献中有充分记录，主要与信息来源有关，质量和可靠性，提升到平等伙伴的地位，因为来源通常不可能检测到，主要与内容有关。调查的后半部分为读者提供了文献的访问点，按研究兴趣分组。第3节综述了一般研究方向:影响信息消费者可信度评估的因素;用于组合这些因素的模型;预测可信度的方法。一个较小的部分专门用于告知用户从数据中获得的可信度。第4、5和6节进一步详细介绍了特定领域的可信度、社交媒体可信度和多媒体可信度。虽然在第1节和第2节的上下文中可以最好地理解它们，但它们可以相互独立地阅读。本调查的最后一部分涉及一个通常不被认为是“可信度”的主题:独立于数据创建者的系统本身的可信度。这是一个特别重要的主题，在用户有专业动机和不关心数据可信度的领域，如电子发现和专利检索。虽然在这个方向上很少有明确的工作，但我们认为这是一个值得未来探索的开放研究方向。最后，作为对读者的额外帮助，附录列出了专门针对可信度某些方面的现有测试集合。总的来说，这篇综述将为读者提供一个有组织和全面的参考指南，以了解当前的技术状况和手头的问题，而不是对计算机科学的可信度是什么这个问题的最终答案。即使在相对有限的精确科学范围内，对于一个本身在哲学和社会科学中广泛争论的概念，这样的答案也是不可能的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Credibility in Information Retrieval

Credibility, as the general concept covering trustworthiness and expertise, but also quality and reliability, is strongly debated in philosophy, psychology, and sociology, and its adoption in computer science is therefore fraught with difficulties. Yet its importance has grown in the information access community because of two complementing factors: on one hand, it is relatively difficult to precisely point to the source of a piece of information, and on the other hand, complex algorithms, statistical machine learning, artificial intelligence, make decisions on behalf of the users, with little oversight from the users themselves.This survey presents a detailed analysis of existing credibility models from different information seeking research areas, with focus on the Web and its pervasive social component. It shows that there is a very rich body of work pertaining to different aspects and interpretations of credibility, particularly for different types of textual content e.g., Web sites, blogs, tweets, but also to other modalities videos, images, audio and topics e.g., health care. After an introduction placing credibility in the context of other sciences and relating it to trust, we argue for a quartic decomposition of credibility: expertise and trustworthiness, well documented in the literature and predominantly related to information source, and quality and reliability, raised to the status of equal partners because the source is often impossible to detect, and predominantly related to the content.The second half of the survey provides the reader with access points to the literature, grouped by research interests. Section 3 reviews general research directions: the factors that contribute to credibility assessment in human consumers of information; the models used to combine these factors; the methods to predict credibility. A smaller section is dedicated to informing users about the credibility learned from the data. Sections 4, 5, and 6 go further into details, with domain-specific credibility, social media credibility, and multimedia credibility, respectively. While each of them is best understood in the context of Sections 1 and 2, they can be read independently of each other.The last section of this survey addresses a topic not commonly considered under "credibility": the credibility of the system itself, independent of the data creators. This is a topic of particular importance in domains where the user is professionally motivated and where there are no concerns about the credibility of the data e.g. e-discovery and patent search. While there is little explicit work in this direction, we argue that this is an open research direction that is worthy of future exploration.Finally, as an additional help to the reader, an appendix lists the existing test collections that cater specifically to some aspect of credibility.Overall, this review will provide the reader with an organised and comprehensive reference guide to the state of the art and the problems at hand, rather than a final answer to the question of what credibility is for computer science. Even within the relatively limited scope of an exact science, such an answer is not possible for a concept that is itself widely debated in philosophy and social sciences.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Foundations and Trends in Information Retrieval COMPUTER SCIENCE, INFORMATION SYSTEMS-

CiteScore

39.10

自引率

0.00%

发文量

期刊介绍： The surge in research across all domains in the past decade has resulted in a plethora of new publications, causing an exponential growth in published research. Navigating through this extensive literature and staying current has become a time-consuming challenge. While electronic publishing provides instant access to more articles than ever, discerning the essential ones for a comprehensive understanding of any topic remains an issue. To tackle this, Foundations and Trends® in Information Retrieval - FnTIR - addresses the problem by publishing high-quality survey and tutorial monographs in the field. Each issue of Foundations and Trends® in Information Retrieval - FnT IR features a 50-100 page monograph authored by research leaders, covering tutorial subjects, research retrospectives, and survey papers that provide state-of-the-art reviews within the scope of the journal.