WELFAKE – WORD EMBEDDING OVER LINGUISTIC FEATURES FOR FAKE NEWS DETECTION

International Journal of Engineering Technology and Management Sciences. Pub Date: 2022-11-28. DOI: 10.46647/ijetms.2022.v06i06.080

Barath M, Sangeethkumar C, Naveen N, Karthickram S, Partha Sarathi P

Abstract

News is a primary source of information that helps the public follow what is happening around the world every day. As news reading has moved online, a great deal of "fake news" is being circulated. Fake news is false or misleading information presented as news. It often aims to damage the reputation of a person or entity, or to make money through advertising revenue. People often accept fake news as genuine without any analysis or verification. Because a machine cannot interpret raw text directly, we train a machine learning (ML) model on our dataset. Our project is a two-phase benchmark model named WELFake, based on word embedding: every word is converted into a numerical value, which is then processed and classified by machine learning according to a matching property. The first phase preprocesses the dataset and validates the veracity of news content using linguistic features. The second phase merges the linguistic feature sets with word embedding (WE) and applies voting classification. The classification is based on matching words and meaning, and the matching percentage must exceed a threshold value that we fix. In this paper, we discuss how to choose the best algorithm based on our needs and on accuracy in order to complete the task successfully.
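The two phases described above can be illustrated with a minimal Python sketch. The abstract does not specify the linguistic features, the embedding, the voters, or how the threshold is applied, so everything below is an assumption made for illustration: four hand-picked surface features, plain word-count vectors standing in for a learned word embedding, three nearest-centroid voters (linguistic view, word view, merged view), and a threshold applied to the share of "fake" votes. The toy headlines and all function names are hypothetical.

```python
import math


def linguistic_features(text):
    """Phase 1 (assumed features): simple surface statistics of the text."""
    words = text.split()
    n_words = max(len(words), 1)
    return [
        float(len(words)),                          # word count
        sum(len(w) for w in words) / n_words,       # mean token length
        float(text.count("!")),                     # exclamation marks
        sum(w.isupper() for w in words) / n_words,  # share of ALL-CAPS tokens
    ]


def tokenize(text):
    return [w.lower().strip(".,!?") for w in text.split()]


def build_vocab(texts):
    vocab = sorted({tok for t in texts for tok in tokenize(t)})
    return {tok: i for i, tok in enumerate(vocab)}


def word_vector(text, vocab):
    """Word-count vector: a stand-in for a learned word embedding."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


class NearestCentroid:
    """Tiny classifier: predict the label whose mean vector is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))


def train_voting_sketch(texts, labels):
    """Phase 2 (assumed design): one voter per feature view, then a vote."""
    vocab = build_vocab(texts)
    views = {
        "linguistic": lambda t: linguistic_features(t),
        "words": lambda t: word_vector(t, vocab),
        "merged": lambda t: linguistic_features(t) + word_vector(t, vocab),
    }
    models = {
        name: NearestCentroid().fit([view(t) for t in texts], labels)
        for name, view in views.items()
    }

    def predict(text, threshold=0.5):
        votes = [models[name].predict(views[name](text)) for name in views]
        fake_share = sum(votes) / len(votes)  # fraction of voters saying "fake"
        return 1 if fake_share > threshold else 0

    return predict


# Toy training data (hypothetical): label 1 = fake, 0 = real.
texts = [
    "SHOCKING!!! You WON'T believe this miracle cure!",
    "BREAKING!!! Aliens CONTROL the government!",
    "The city council approved the annual budget on Monday.",
    "Researchers published the study results in a journal.",
]
labels = [1, 1, 0, 0]

predict = train_voting_sketch(texts, labels)
fake_guess = predict("SHOCKING!!! Miracle cure found!")
real_guess = predict("The council published the budget.")
```

The features are left unscaled to keep the sketch short; a real pipeline would normalize them and would likely replace the nearest-centroid voters with stronger classifiers and the count vectors with a trained embedding.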
