A Sociolinguistic Route to the Characterization and Detection of the Credibility of Events on Twitter

Proceedings of the 31st ACM Conference on Hypertext and Social Media. Pub Date: 2020-07-13. DOI: 10.1145/3372923.3404795

Jasabanta Patro, Pushpendra Kumar Singh Rathore

Citations: 2

Abstract

Although Twitter is one of the primary sources of real-time news, with users acting as sensors that update content from across the globe, the spread of rumours via Twitter is an increasingly alarming issue and is known to have already caused significant damage. We propose a credibility analysis approach based on the linguistic structure of tweets. We not only characterize Twitter events but also predict their perceived credibility using a novel deep learning architecture. We conduct our experiments on the large CREDBANK dataset. Among our most interesting findings, standard LIWC categories such as 'negate', 'discrep', 'cogmech' and 'swear', and Empath categories such as 'hate', 'poor', 'government', 'worship' and 'swearing-terms', correlate negatively with the credibility of events. While some of our results resonate with earlier literature, others offer novel insights into fake and legitimate Twitter events. Using these observations and the proposed deep learning architecture, we predict the credibility of an event (a four-class classification problem in our setting) with an accuracy of 0.54, a relative improvement of ~26% over the best-known state of the art (accuracy 0.43). Notably, looking at only the first few tweets of an event yields predictions almost as accurate as observing the entire volume of tweets.
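The correlation finding above can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper:

- Each event is a list of tweet strings paired with a numeric credibility score.
- A tiny hypothetical word list (`NEGATE`) stands in for the proprietary LIWC 'negate' dictionary.
- Spearman rank correlation is used. The abstract does not state which correlation measure the authors used.

The names `category_rate`, `spearman` and `relative_improvement`, as well as the toy events, are illustrative and are not the authors' code or data. The hypothetical `first_k` parameter mimics looking at only an event's first few tweets. The final line checks the arithmetic behind the reported ~26% gain: (0.54 - 0.43) / 0.43 ≈ 0.256.

```python
import re
from statistics import mean

# Hypothetical stand-in for a LIWC-style 'negate' category; the real LIWC
# dictionaries are proprietary and far larger.
NEGATE = {"no", "not", "never", "none", "nobody", "nothing", "neither", "nor"}
TOKEN_RE = re.compile(r"[a-z']+")


def category_rate(tweets, lexicon, first_k=None):
    """Share of tokens in an event's tweets that fall in `lexicon`.

    `first_k` (hypothetical) keeps only the event's earliest k tweets.
    """
    if first_k is not None:
        tweets = tweets[:first_k]
    tokens = [tok for tw in tweets for tok in TOKEN_RE.findall(tw.lower())]
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def relative_improvement(new, old):
    return (new - old) / old


# Toy events (illustrative only): (tweets, credibility score in [0, 1]).
events = [
    (["Confirmed: the bridge is closed", "Officials confirm the closure"], 0.9),
    (["This is not true", "No, never happened", "not confirmed at all"], 0.2),
    (["Reports say it is not clear yet", "Police on scene"], 0.6),
]
rates = [category_rate(tweets, NEGATE) for tweets, _ in events]
scores = [score for _, score in events]

print(spearman(rates, scores))  # → -1.0
print(f"{relative_improvement(0.54, 0.43):.1%}")  # → 25.6%
```

A real analysis would repeat this over every CREDBANK event and every LIWC or Empath category. The three toy events only show the mechanics of how a negative correlation would surface.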
