WELFAKE – WORD EMBEDDING OVER LINGUISTIC FEATURES FOR FAKE NEWS DETECTION

Barath M, Sangeethkumar C, Naveen N, Karthickram S, Partha Sarathi P
International Journal of Engineering Technology and Management Sciences, published 2022-11-28
DOI: 10.46647/ijetms.2022.v06i06.080 (https://doi.org/10.46647/ijetms.2022.v06i06.080)

Abstract

News is a primary source of information that keeps the public informed about daily events around the world. As news reading has moved online, a great deal of fake news is now in circulation. Fake news is false or misleading information presented as news, often with the aim of damaging the reputation of a person or entity or of making money through advertising revenue. People often accept fake news as genuine without any analysis or verification. Because a machine cannot interpret raw text directly, we train a machine learning (ML) model on our dataset. Our project is a two-phase benchmark model named WELFake, based on word embedding: every word is converted into a numerical value, and these values are then processed and classified by machine learning according to a matching property. The first phase preprocesses the dataset and validates the veracity of news content using linguistic features. The second phase merges the linguistic feature sets with word embedding (WE) and applies voting classification. Classification is based on matching words and their meanings, and the matching percentage must exceed a threshold value that we fix. In this paper we discuss how to choose the best algorithm for our needs and accuracy requirements in order to complete the task successfully.
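
The two phases described above can be illustrated with a minimal Python sketch. It is not the WELFake implementation. The abstract does not list the linguistic features, the embedding method, or the classifiers, so everything named here is an assumption made for illustration: the four features in `linguistic_features`, the count vector in `bag_of_words` (a simple stand-in for a learned word embedding), the rule-based `EXAMPLE_VOTERS` with their cutoffs, and the label convention (1 = fake, 0 = real).

```python
import string
from collections import Counter


def linguistic_features(text):
    """Phase 1 (illustrative): a few hand-picked linguistic features.

    The feature set is hypothetical; the abstract does not list the
    features WELFake actually uses.
    """
    words = text.split()
    n_words = len(words)
    if n_words == 0:
        return [0.0, 0.0, 0.0, 0.0]
    n_chars = sum(len(w) for w in words)
    n_shouted = sum(1 for w in words if len(w) > 1 and w.isupper())
    return [
        float(n_words),            # word count
        n_chars / n_words,         # mean word length
        float(text.count("!")),    # exclamation marks
        n_shouted / n_words,       # share of all-caps words
    ]


def bag_of_words(text, vocabulary):
    """Count vector over a fixed vocabulary.

    This stands in for a word embedding; it is NOT a learned embedding.
    """
    tokens = [w.strip(string.punctuation).lower() for w in text.split()]
    counts = Counter(tokens)
    return [float(counts[term]) for term in vocabulary]


def merge_features(text, vocabulary):
    """Phase 2a: concatenate linguistic features with the word vector."""
    return linguistic_features(text) + bag_of_words(text, vocabulary)


def majority_vote(predictions):
    """Phase 2b: hard voting, i.e. the label chosen by most voters."""
    return Counter(predictions).most_common(1)[0][0]


def classify(text, vocabulary, voters):
    """Run every voter on the merged feature vector and take the majority."""
    features = merge_features(text, vocabulary)
    return majority_vote([vote(features) for vote in voters])


# Hypothetical vocabulary and rule-based voters (1 = fake, 0 = real).
# A real system would use trained classifiers instead of fixed rules.
VOCABULARY = ["shocking", "believe"]
N_LINGUISTIC = 4  # index of the first word-vector entry

EXAMPLE_VOTERS = [
    lambda f: 1 if f[2] >= 2 else 0,             # many exclamation marks
    lambda f: 1 if f[3] > 0.2 else 0,            # many all-caps words
    lambda f: 1 if f[N_LINGUISTIC] > 0 else 0,   # contains "shocking"
]

if __name__ == "__main__":
    for headline in (
        "SHOCKING!!! You WON'T believe this",
        "The council approved the budget on Monday.",
    ):
        print(headline, "->", classify(headline, VOCABULARY, EXAMPLE_VOTERS))
```

Using an odd number of voters avoids ties in the majority vote. In practice the fixed rules would be replaced by classifiers trained on a labelled dataset, with the voting step left unchanged.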