A study on the Detection of Fake news in Spanish
Alba Maribel Sánchez Gálvez, Francisco Javier Albores, Ricardo Álvarez González, Said González Conde, Sully Sánchez Gálvez
International Journal of Combinatorial Optimization Problems and Informatics, published 2024-06-12
DOI: 10.61467/2007.1558.2024.v15i2.467 (https://doi.org/10.61467/2007.1558.2024.v15i2.467)
Abstract
False information published with the intention of misleading social media users is known as fake news. It is crafted to appear credible and genuine, can manipulate opinions, and may be disseminated for political or financial purposes (Kaliyar et al., 2021). Fake news spreads especially readily on Twitter (now X) because of the platform's high capacity for user interaction and because retweeting and commenting allow information to be disseminated more widely.
This study proposes a model for detecting fake news in Spanish, a task that faces challenges such as linguistic diversity and the limited resources available for preprocessing. Using a dataset of approximately 40,000 news items published between 2019 and 2024 and extracted from two well-known Mexican news accounts on Twitter, “Reforma” and “El Deforma”, models based on Natural Language Processing, Machine Learning, Deep Learning, and transformers were developed. The resulting model determines whether the headline of a Spanish-language news article published on Twitter is true or fake.
The algorithms used include Logistic Regression, Naïve Bayes, Support Vector Machines, LSTM, Bidirectional LSTM, mBERT, and BETO. When their results were compared, BETO achieved the best accuracy, 0.98; the transformer-based models thus outperformed the other approaches used in the study in terms of accuracy. The study also identified the words most frequently used in the fake news corpus, concluding that fake news often uses exaggerated adjectives and words expressing certainty or amazement in social, political, and entertainment contexts.
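
To make the headline-classification task concrete, the following is a minimal sketch of one of the classical baselines named above, a multinomial Naïve Bayes classifier with add-one smoothing, written with the Python standard library only. It is not the authors' implementation: the class `NaiveBayes`, the helpers `tokenize` and `accuracy`, and the six training headlines are hypothetical and invented purely for illustration (they are not taken from the study's corpus), and the labels "real" and "fake" are an assumed encoding.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, including Spanish accents."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is the union of the words seen in any class.
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_class, best_score = None, float("-inf")
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[c][tok] + 1
                score += math.log(count / (self.totals[c] + len(self.vocab)))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def accuracy(model, texts, labels):
    """Fraction of headlines whose predicted label matches the given label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Invented toy headlines (assumption: NOT from the study's dataset).
TRAIN_TEXTS = [
    "El senado aprueba la reforma fiscal",
    "Banco central mantiene la tasa de interés",
    "Gobierno anuncia nuevo presupuesto federal",
    "Increíble: perro gana elección municipal",
    "Científicos confirman que el lunes es absolutamente terrible",
    "Hombre asombroso vive sin pagar impuestos",
]
TRAIN_LABELS = ["real", "real", "real", "fake", "fake", "fake"]

if __name__ == "__main__":
    model = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
    print(model.predict("El senado aprueba nuevo presupuesto"))
    print(model.predict("Increíble perro vive sin impuestos"))
```

A toy set of six headlines says nothing about real performance. The sketch only shows the shape of the pipeline (tokenize, count, score, compare by accuracy) that the study applies at a much larger scale and with stronger models.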