Comparing Results of Multiple Machine Learning Algorithms on a Bilingual Dataset for the Detection of Fraudulent News
Amogh Jalan, Aniket Gupta, P. Meel
2023 12th Mediterranean Conference on Embedded Computing (MECO), published 2023-06-06
DOI: 10.1109/MECO58584.2023.10154918 (https://doi.org/10.1109/MECO58584.2023.10154918)
Citation count: 0
Abstract
In today's world, it is vital to spot fake information as soon as it appears, particularly given how widely and quickly news spreads on the Internet. Equally important is the ability to determine whether a news article is accurate or false from its headline alone. In this paper, we create a multilingual dataset and compare several algorithms on it, contrasting headline-based identification with identification based on the entire text. The aim is to put forward a technique for predicting fake news that balances the quantity and quality of the data analyzed. Many studies on automatic fake news identification rely solely on English-language data; only a few evaluate other languages or compare features across languages. Because the spread of false information is a global problem, this research examines textual characteristics that are not restricted to a specific language. To investigate text complexity, stylometric, and psychological aspects, we examined the vocabulary of news articles published in English (American) and Hindi. The extracted features help distinguish real news from fraudulent news. To build the detection model, we analyzed the performance of four ML algorithms: Multinomial Naive Bayes, Logistic Regression, Bernoulli Naive Bayes, and Bidirectional LSTM. Logistic Regression and Bernoulli Naive Bayes achieved an average accuracy of 86%. The results demonstrate that the proposed language-independent features are effective in classifying fake and real news across two separate languages.
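The abstract does not list the exact features used, so the following is only a minimal sketch of what language-independent text-complexity and stylometric features might look like. The function `stylometric_features` and the four feature names are hypothetical choices made for illustration, not the authors' feature set. The sketch relies on Unicode character categories rather than language-specific rules, so the same code applies to English and Hindi text.

```python
import unicodedata


def _is_punct(ch: str) -> bool:
    # Unicode general categories starting with "P" cover punctuation in any
    # script, e.g. "!" in English and the danda "।" in Hindi.
    return unicodedata.category(ch).startswith("P")


def _strip_punct(token: str) -> str:
    # Remove punctuation from both ends of a whitespace-delimited token.
    start, end = 0, len(token)
    while start < end and _is_punct(token[start]):
        start += 1
    while end > start and _is_punct(token[end - 1]):
        end -= 1
    return token[start:end]


def stylometric_features(text: str) -> dict:
    """Hypothetical feature set (an assumption, not the paper's features)."""
    words = [w for w in map(_strip_punct, text.split()) if w]
    chars = [ch for ch in text if not ch.isspace()]
    n_words = len(words)
    n_chars = len(chars)
    return {
        "word_count": n_words,
        # Length is counted in Unicode code points, so Devanagari vowel
        # signs count as separate characters.
        "avg_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
        # Share of distinct (case-folded) words among all words.
        "type_token_ratio": (
            len({w.casefold() for w in words}) / n_words if n_words else 0.0
        ),
        # Share of punctuation among all non-whitespace characters.
        "punct_ratio": sum(map(_is_punct, chars)) / n_chars if n_chars else 0.0,
    }


# Usage on one English and one Hindi headline (made-up example strings).
print(stylometric_features("Breaking news! Breaking story."))
print(stylometric_features("यह खबर झूठी है।"))
```

Feature dictionaries of this kind could then be fed to any of the classifiers named in the abstract. Psychological features would typically need language-specific lexicons and are left out of this sketch.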