Understanding Information Credibility on Twitter

Sujoy Sikdar, Byungkyu Kang, J. O'Donovan, Tobias Höllerer, Sibel Adali
Published in: 2013 International Conference on Social Computing
Publication date: 2013-09-08
DOI: 10.1109/SocialCom.2013.9 (https://doi.org/10.1109/SocialCom.2013.9)
Citations: 51

Abstract

The increased popularity of microblogs in recent years brings about a need for better mechanisms to extract credible or otherwise useful information from large, noisy data. While a great number of studies introduce methods to find credible data, there is no accepted credibility benchmark. As a result, it is hard to compare different studies and generalize from their findings. In this paper, we argue for a methodology for making such studies more useful to the research community. First, the underlying ground truth values of credibility must be reliable, and the specific constructs used to define credibility must be carefully specified. Second, the underlying network context must be quantified and documented. To illustrate these two points, we conduct a unique credibility study of two different data sets on the same topic, but with different network characteristics. We also conduct two different user surveys, and construct two additional indicators of credibility based on retweet behavior. Through a detailed statistical study, we first show that survey-based methods can be extremely noisy and that results may vary greatly from survey to survey. However, by combining such methods with retweet behavior, we can incorporate two signals that are noisy but uncorrelated, resulting in ground truth measures that can be predicted with high accuracy and are stable across different data sets and survey methods. Newsworthiness of tweets can be a useful frame for specific applications, but it is not necessary for achieving reliable credibility ground truth measurements.
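
The claim that two noisy but uncorrelated signals can be combined into a more stable measure can be illustrated with a small simulation. The sketch below is not the paper's method or data: the latent credibility score, the Gaussian noise model, the z-score averaging, and every name in it (simulate, survey, retweet, combined) are assumptions made purely for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def zscores(xs):
    """Standardize a sequence to zero mean and unit variance."""
    mean, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]


def simulate(n=5000, noise_sd=1.0, seed=0):
    """Compare single indicators with their average (hypothetical setup)."""
    rng = random.Random(seed)
    # Assumed latent "true" credibility of each tweet.
    latent = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Two indicators with independent (uncorrelated) noise.
    survey = [c + rng.gauss(0.0, noise_sd) for c in latent]
    retweet = [c + rng.gauss(0.0, noise_sd) for c in latent]
    # Combine by averaging the standardized indicators.
    combined = [(a + b) / 2 for a, b in zip(zscores(survey), zscores(retweet))]
    return {
        "survey": pearson(survey, latent),
        "retweet": pearson(retweet, latent),
        "combined": pearson(combined, latent),
    }


if __name__ == "__main__":
    for name, corr in simulate().items():
        print(f"{name}: correlation with latent score = {corr:.3f}")
```

Under these assumptions the independent noise terms partially cancel when averaged, so the combined indicator tracks the latent score more closely than either indicator alone. Real survey and retweet signals need not satisfy the independence or equal-noise assumptions used here.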