Finding common features in multilingual fake news: a quantitative clustering approach

IF 1.1 Zone 3 (Literature) 0 HUMANITIES, MULTIDISCIPLINARY

Digital Scholarship in the Humanities Pub Date : 2024-04-03 DOI:10.1093/llc/fqae016

Wei Yuan, Haitao Liu


Citations: 0

Abstract

Since the Internet is a breeding ground for unconfirmed fake news, the automatic detection and clustering of such news have become crucial. Most current studies focus on English texts, and the common features of multilingual fake news remain understudied. Therefore, this article uses English, Russian, and Chinese as examples and identifies the common quantitative features of fake news across languages at the word, sentence, readability, and sentiment levels. These features are then used in principal component analysis, K-means clustering, hierarchical clustering, and two-step clustering experiments, which achieved satisfactory results. The common features we propose play a greater role in achieving automatic cross-lingual clustering than the features proposed in previous studies. We also discovered a trend toward linguistic simplification and economy in fake news. Furthermore, fake news is easier to understand and uses negative emotional expressions in ways that real news does not. Our research provides new reference features for fake news detection tasks and facilitates research into the linguistic characteristics of fake news.
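The pipeline summarized in the abstract (quantitative text features fed into clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the three features chosen here (mean word length, mean sentence length, type-token ratio), the function names, and the toy texts are all assumptions made for illustration, and the readability, sentiment, principal component analysis, hierarchical, and two-step stages are omitted. The regex tokenizer only suits whitespace-delimited languages; Chinese text would need a word segmenter first.

```python
import re
from statistics import mean


def text_features(text):
    """Return (mean word length, mean sentence length in words, type-token ratio).

    These three features are illustrative assumptions, not the paper's feature set.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word")
    return (
        mean(len(w) for w in words),   # word level
        len(words) / len(sentences),   # sentence level
        len(set(words)) / len(words),  # lexical diversity
    )


def standardize(rows):
    """Z-score each column so that no single feature dominates the distances."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [
        (sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
        for c, m in zip(cols, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(r, means, stds)) for r in rows]


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm, seeded with the first k points (deterministic)."""
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(mean(dim) for dim in zip(*members))
    return labels


# Toy texts invented for this sketch; they are not data from the study.
toy_texts = [
    "Shock news. You must see this now! It is bad.",
    "The committee published a comprehensive assessment of regional infrastructure spending.",
    "They lied to you. Wake up! Do not trust them.",
    "Researchers reported preliminary findings concerning long-term atmospheric measurements.",
]
labels = kmeans(standardize([text_features(t) for t in toy_texts]), k=2)
```

Standardizing before clustering matters because sentence length and type-token ratio live on very different scales; without it, Euclidean distance would be driven almost entirely by the larger-valued feature.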




Source journal

Digital Scholarship in the Humanities Multiple-

CiteScore

1.80

Self-citation rate

25.00%


About the journal: DSH, or Digital Scholarship in the Humanities, is an international, peer-reviewed journal that publishes original contributions on all aspects of digital scholarship in the Humanities including, but not limited to, the field currently called the Digital Humanities. Long and short papers report on theoretical, methodological, experimental, and applied research and include results of research projects, descriptions and evaluations of tools, techniques, and methodologies, and reports on work in progress. DSH also publishes reviews of books and resources. Digital Scholarship in the Humanities was previously known as Literary and Linguistic Computing.