Comparing Results of Multiple Machine Learning Algorithms on a Bilingual Dataset for the Detection of Fraudulent News

Amogh Jalan, Aniket Gupta, P. Meel
Published in: 2023 12th Mediterranean Conference on Embedded Computing (MECO), 2023-06-06
DOI: 10.1109/MECO58584.2023.10154918 (https://doi.org/10.1109/MECO58584.2023.10154918)
Citations: 0

Abstract

In today's world, it is vital to spot fake information as soon as it appears, particularly given how widely and quickly news spreads on the Internet. Equally important is the ability to determine whether a news article is accurate or false from its headline alone. In this paper, we create a bilingual dataset and compare several algorithms on it. The results are contrasted with identification based on the entire text. The aim is to put forward a technique for predicting fake news that balances the quantity and quality of the data analyzed. Many studies on automatic fake news identification rely solely on English-language data; only a few evaluate other languages or contrast features across languages. Because the spread of false information is a global problem, this research examines textual characteristics that are not restricted to a specific language. To investigate text complexity, stylometric, and psychological aspects, we examined the vocabulary of news articles published in English (American) and Hindi. The extracted features help distinguish real news from fraudulent news. To build the detection model, we analyzed the performance of four ML algorithms: Multinomial Naive Bayes, Logistic Regression, Bernoulli Naive Bayes, and Bidirectional LSTM. Logistic Regression and Bernoulli Naive Bayes achieved an average accuracy of 86%, which demonstrates that the proposed language-independent features are effective in classifying fake and real news across two separate languages.
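The abstract does not list the exact features used, so the following is only a minimal sketch of what language-independent text-complexity and stylometric features might look like. The function name `language_independent_features`, the helper `strip_punct`, and the four features chosen are assumptions made for illustration, not the authors' feature set. The sketch relies only on whitespace tokenization and Unicode character categories, so the same code applies to English and Hindi text.

```python
import unicodedata


def is_punct(ch: str) -> bool:
    """True if the character is in any Unicode punctuation category (P*)."""
    return unicodedata.category(ch).startswith("P")


def strip_punct(token: str) -> str:
    """Remove leading and trailing punctuation from one token (hypothetical helper)."""
    start, end = 0, len(token)
    while start < end and is_punct(token[start]):
        start += 1
    while end > start and is_punct(token[end - 1]):
        end -= 1
    return token[start:end]


def language_independent_features(text: str) -> dict:
    """Compute a few surface features that need no language-specific resources.

    Illustrative feature set only (an assumption, not taken from the paper).
    Word lengths are counted in Unicode code points, so Devanagari vowel
    signs count as separate characters.
    """
    # Whitespace tokenization works for both English and Hindi.
    words = [w for w in (strip_punct(t) for t in text.split()) if w]
    n_words = len(words)
    n_chars = len(text)
    n_punct = sum(1 for ch in text if is_punct(ch))
    return {
        "word_count": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
        "punctuation_ratio": n_punct / n_chars if n_chars else 0.0,
    }


if __name__ == "__main__":
    # Toy headlines invented for this example; not drawn from the paper's dataset.
    for headline in ["Breaking: shocking news, shocking claims!", "यह खबर झूठी है।"]:
        print(headline, language_independent_features(headline))
```

Feature dictionaries like these could then be turned into vectors and passed to any of the classifiers named in the abstract. How the authors actually encoded their features is not stated here.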