Fake News Detection in Low Resource Languages using SetFit Framework

Impact factor: 3.7; JCR quartile: Q2; Category: Computer Science, Artificial Intelligence

Inteligencia Artificial-Iberoamerican Journal of Artificial Intelligence

Pub Date: 2023-09-20

DOI: 10.4114/intartif.vol26iss72pp178-201

Amin Abdedaiem, Abdelhalim Hafedh Dahou, Mohamed Amine Cheragui


Citations: 0

Abstract



Social media has become an integral part of people's lives, resulting in a constant flow of information. However, a concerning trend has emerged: fake news spreads rapidly because verification mechanisms are lacking. Fake news has far-reaching consequences, influencing public opinion, disrupting democracy, fueling social tensions, and affecting domains such as health, the environment, and the economy. To identify fake news when data is scarce, especially in low-resource languages such as Arabic and its dialects, we propose a few-shot fake news detection model based on sentence transformer fine-tuning, which requires no handcrafted prompts and uses a language model with few parameters. The experimental results show that the proposed method can achieve higher performance with fewer news samples. The approach achieved an F1 score of 71% on the Algerian dialect fake news dataset and 70% on the Modern Standard Arabic (MSA) version of the same dataset, which indicates that it works on both standard Arabic and its dialects. The proposed model can therefore identify fake news in several domains concerning the Algerian community, such as politics, COVID-19, tourism, e-commerce, sport, accidents, and car prices.
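The abstract does not spell out the training procedure, but SetFit-style few-shot learning generally begins by turning a handful of labeled texts into sentence pairs for contrastive fine-tuning of the sentence transformer, so no prompt is needed. The sketch below illustrates only that pair-generation idea in plain Python. The function name `make_contrastive_pairs`, the placeholder texts, and the choice to enumerate every pair (rather than sample them) are assumptions for illustration, not details taken from the paper.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, similarity) triples from a few labeled examples.

    Pairs with the same label get a target similarity of 1.0 and pairs with
    different labels get 0.0. This exhaustive enumeration is a simplification
    chosen for illustration.
    """
    pairs = []
    examples = list(zip(texts, labels))
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        similarity = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, similarity))
    return pairs


# Placeholder examples (hypothetical; not drawn from the paper's dataset).
texts = ["claim A", "claim B", "report C", "report D"]
labels = ["fake", "fake", "real", "real"]

pairs = make_contrastive_pairs(texts, labels)
for text_a, text_b, similarity in pairs:
    print(f"{text_a!r} / {text_b!r} -> {similarity}")
```

With only four labeled texts this already yields six training pairs, which hints at why pair-based fine-tuning can make use of very small labeled sets. In a full pipeline, such pairs would be used to fine-tune a sentence embedding model, and a lightweight classifier would then be trained on the resulting embeddings.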


Source journal

Inteligencia Artificial-Iberoamerican Journal of Artificial Intelligence (Computer Science, Artificial Intelligence)

CiteScore: 2.00

Self-citation rate: 0.00%

Articles published: (not listed)

Review time: 8 weeks

About the journal: Inteligencia Artificial is a quarterly journal promoted and sponsored by the Spanish Association for Artificial Intelligence. The journal publishes high-quality original research papers reporting theoretical or applied advances in all branches of Artificial Intelligence. In particular, the journal welcomes: new approaches, techniques, or methods to solve AI problems, which should include reproducible demonstrations of effectiveness or improvement over existing methods; integration of different technologies or approaches to solve broad problems or problems belonging to different areas; and AI applications, which should describe in detail the problem or scenario and the proposed solution, emphasizing its novelty and presenting an evaluation of the AI techniques applied. In addition to rapid publication and dissemination of unsolicited contributions, the journal is also committed to producing monographs, surveys, or special issues on topics, methods, or techniques of special relevance to the AI community. Inteligencia Artificial welcomes submissions written in English, Spanish, or Portuguese. However, each contribution should include at least a title, summary, and keywords in English.