Linguistic features based framework for automatic fake news detection

IF 6.7 | Zone 1, Engineering Technology | JCR Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Sonal Garg, Dilip Kumar Sharma
Journal: Computers & Industrial Engineering
Publication date: 2022-10-01
Publication type: Journal Article
DOI: 10.1016/j.cie.2022.108432
URL: https://www.sciencedirect.com/science/article/pii/S0360835222004697
Citations: 13

Abstract

Social media platforms are now a primary channel for news consumption. Political groups use these platforms to attract users and sway their votes. Given the large volume of data on social media, verifying the authenticity of content is essential. Combating misinformation requires artificial intelligence techniques, including the development of embeddings and the deployment of machine learning algorithms. This paper focuses on several categories of linguistic features for fake news identification: complexity features, readability indices, psycholinguistic features, and stylometric features. The linguistic model computes language-driven features by learning the properties of news content. In this work, we selected twenty-six significant features and applied various machine learning models. For feature extraction, three techniques are applied: term frequency-inverse document frequency (tf-idf), count vectorizer (CV), and hash vectorizer (HV). We then tested the models with different training dataset sizes and compared their accuracy. We used four existing datasets in the experiments. The proposed framework achieved 90.8% accuracy on the Reuter dataset and a highest accuracy of 90% on the Buzzfeed dataset. The Random Political and Mc_Intire datasets achieved accuracies of 93.8% and 86.9%, respectively.
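
To make the three feature extraction techniques named in the abstract concrete, the following is a minimal from-scratch sketch in Python using only the standard library. It is not the authors' implementation, and the abstract does not say which library or settings they used. The function names, the whitespace tokenizer, the smoothed IDF variant, the CRC32 hash, and the bucket count of 16 are all assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    return text.lower().split()


def count_vectors(docs):
    """Count vectorizer: one column per vocabulary term, raw term counts."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(tokenize(doc)).items():
            row[index[tok]] = n
        rows.append(row)
    return vocab, rows


def tfidf_vectors(docs):
    """tf-idf: term counts scaled by a smoothed inverse document frequency.

    Assumption: idf = ln((1 + N) / (1 + df)) + 1, one common smoothed variant.
    No length normalization is applied in this sketch.
    """
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    rows = [[c * w for c, w in zip(row, idf)] for row in counts]
    return vocab, rows


def hashed_vectors(docs, n_buckets=16):
    """Hash vectorizer: no vocabulary; each token is hashed into a fixed bucket.

    Assumption: CRC32 is a stand-in hash chosen because it ships with Python.
    Distinct tokens may collide in the same bucket.
    """
    rows = []
    for doc in docs:
        row = [0] * n_buckets
        for tok in tokenize(doc):
            row[zlib.crc32(tok.encode("utf-8")) % n_buckets] += 1
        rows.append(row)
    return rows


# Hypothetical two-document corpus, for illustration only.
docs = ["breaking news shocking claim", "officials confirm news report"]
vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
hashed = hashed_vectors(docs)
```

In this toy corpus the term "news" appears in both documents, so its tf-idf weight is lower than that of terms unique to one document, while the count vectorizer treats all terms equally. The hashed representation has a fixed width regardless of vocabulary size, which trades some collisions for bounded memory.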

Source journal
Computers & Industrial Engineering
Category: Engineering Technology - Engineering: Industrial
CiteScore: 12.70
Self-citation rate: 12.70%
Articles published: 794
Review time: 10.6 months
Journal description: Computers & Industrial Engineering (CAIE) is dedicated to researchers, educators, and practitioners in industrial engineering and related fields. Pioneering the integration of computers in research, education, and practice, industrial engineering has evolved to make computers and electronic communication integral to its domain. CAIE publishes original contributions focusing on the development of novel computerized methodologies to address industrial engineering problems. It also highlights the applications of these methodologies to issues within the broader industrial engineering and associated communities. The journal actively encourages submissions that push the boundaries of fundamental theories and concepts in industrial engineering techniques.