A Study on the Detection of Fake News in Spanish

IF 0.3, JCR Q4 (MATHEMATICS, APPLIED)

International Journal of Combinatorial Optimization Problems and Informatics. Pub Date: 2024-06-12. DOI: 10.61467/2007.1558.2024.v15i2.467

Alba Maribel Sánchez Gálvez, Francisco Javier Albores, Ricardo Álvarez González, Said González Conde, Sully Sánchez Gálvez


Citations: 0

Abstract

False information published with the intention of misleading social media users is known as fake news. It is crafted to appear credible and genuine, and it can manipulate opinions and be disseminated for political or financial purposes (Kaliyar et al., 2021). Fake news spreads especially readily on Twitter (now X) because of the platform's high capacity for user interaction and its retweet and comment features, which allow information to be disseminated more widely.

This study proposes a model for detecting fake news in Spanish, a task that faces challenges such as linguistic diversity and the limited resources available for preprocessing. A model based on Natural Language Processing, Machine Learning, Deep Learning, and transformer models was developed using a database of approximately 40,000 news items, published from 2019 to 2024 and extracted from two well-known news accounts in Mexico on Twitter, “Reforma” and “El Deforma”. The model distinguishes whether the headline of a Spanish-language news article published on Twitter is true or fake.

The algorithms used include Logistic Regression, Naïve Bayes, Support Vector Machines, LSTM, Bidirectional LSTM, mBERT, and BETO. When their results were compared, BETO obtained the best accuracy, 0.98. Transformer-based models therefore outperformed the other approaches used in the study in terms of accuracy. The study also identified the words most frequently used in the fake news corpus, concluding that fake news often uses expressions with exaggerated adjectives and words expressing certainty or amazement in social, political, and entertainment contexts.
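
The word-frequency analysis mentioned in the abstract can be illustrated with a rough sketch. The abstract does not describe the authors' actual preprocessing, so everything below is an assumption made for illustration: the toy headlines (invented, not taken from the study's corpus), the tiny stopword list, and the helper names `tokenize` and `top_words`.

```python
import re
from collections import Counter

# Hypothetical toy headlines, invented for illustration only.
# They are NOT taken from the study's corpus.
FAKE_HEADLINES = [
    "Increíble: científicos confirman que el café es totalmente mágico",
    "Confirman que el lunes será oficialmente eliminado",
    "Increíble hallazgo: perro aprende a pagar impuestos",
]

# Tiny illustrative stopword list (an assumption); a real pipeline
# would use a much fuller Spanish stopword list.
STOPWORDS = {"a", "de", "el", "la", "que", "es", "será", "los", "las", "en", "y"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens,
    keeping Spanish accented letters and ñ."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def top_words(headlines, n=5):
    """Return the n most frequent non-stopword tokens as (word, count) pairs."""
    counts = Counter(
        token
        for headline in headlines
        for token in tokenize(headline)
        if token not in STOPWORDS
    )
    return counts.most_common(n)


if __name__ == "__main__":
    for word, count in top_words(FAKE_HEADLINES, n=2):
        print(word, count)
```

On a real corpus, the same counting step would be run separately on the fake and the true headlines so that the two vocabularies can be compared.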

