利用文本信息进行假新闻检测。

IF 6.6 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
International Journal of Neural Systems Pub Date : 2022-12-01 Epub Date: 2022-11-04 DOI:10.1142/S0129065722500587
Dimitrios Panagiotis Kasseropoulos, Paraskevas Koukaras, Christos Tjortjis
{"title":"利用文本信息进行假新闻检测。","authors":"Dimitrios Panagiotis Kasseropoulos,&nbsp;Paraskevas Koukaras,&nbsp;Christos Tjortjis","doi":"10.1142/S0129065722500587","DOIUrl":null,"url":null,"abstract":"<p><p>\"Fake news\" refers to the deliberate dissemination of news with the purpose to deceive and mislead the public. This paper assesses the accuracy of several Machine Learning (ML) algorithms, using a style-based technique that relies on textual information extracted from news, such as part of speech counts. To expand the already proposed styled-based techniques, a new method of enhancing a linguistic feature set is proposed. It combines Named Entity Recognition (NER) with the Frequent Pattern (FP) Growth association rule mining algorithm, aiming to provide better insight into the papers' sentence level structure. Recursive feature elimination was used to identify a subset of the highest performing linguistic characteristics, which turned out to align with the literature. Using pre-trained word embeddings, document embeddings and weighted document embeddings were constructed using each word's TF-IDF value as the weight factor. The document embeddings were mixed with the linguistic features providing a variety of training/test feature sets. For each model, the best performing feature set was identified and fine-tuned regarding its hyper parameters to improve accuracy. ML algorithms' results were compared with two Neural Networks: Convolutional Neural Network (CNN) and Long-Short-Term Memory (LSTM). The results indicate that CNN outperformed all other methods in terms of accuracy, when companied with pre-trained word embeddings, yet SVM performs almost the same with a wider variety of input feature sets. Although style-based technique scores lower accuracy, it provides explainable results about the author's writing style decisions. Our work points out how new technologies and combinations of existing techniques can enhance the style-based approach capturing more information.</p>","PeriodicalId":50305,"journal":{"name":"International Journal of Neural Systems","volume":null,"pages":null},"PeriodicalIF":6.6000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Exploiting Textual Information for Fake News Detection.\",\"authors\":\"Dimitrios Panagiotis Kasseropoulos,&nbsp;Paraskevas Koukaras,&nbsp;Christos Tjortjis\",\"doi\":\"10.1142/S0129065722500587\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>\\\"Fake news\\\" refers to the deliberate dissemination of news with the purpose to deceive and mislead the public. This paper assesses the accuracy of several Machine Learning (ML) algorithms, using a style-based technique that relies on textual information extracted from news, such as part of speech counts. To expand the already proposed styled-based techniques, a new method of enhancing a linguistic feature set is proposed. It combines Named Entity Recognition (NER) with the Frequent Pattern (FP) Growth association rule mining algorithm, aiming to provide better insight into the papers' sentence level structure. Recursive feature elimination was used to identify a subset of the highest performing linguistic characteristics, which turned out to align with the literature. Using pre-trained word embeddings, document embeddings and weighted document embeddings were constructed using each word's TF-IDF value as the weight factor. The document embeddings were mixed with the linguistic features providing a variety of training/test feature sets. For each model, the best performing feature set was identified and fine-tuned regarding its hyper parameters to improve accuracy. ML algorithms' results were compared with two Neural Networks: Convolutional Neural Network (CNN) and Long-Short-Term Memory (LSTM). The results indicate that CNN outperformed all other methods in terms of accuracy, when companied with pre-trained word embeddings, yet SVM performs almost the same with a wider variety of input feature sets. Although style-based technique scores lower accuracy, it provides explainable results about the author's writing style decisions. Our work points out how new technologies and combinations of existing techniques can enhance the style-based approach capturing more information.</p>\",\"PeriodicalId\":50305,\"journal\":{\"name\":\"International Journal of Neural Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":6.6000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Neural Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1142/S0129065722500587\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2022/11/4 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Neural Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1142/S0129065722500587","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/11/4 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

“假新闻”是指故意传播新闻,以欺骗和误导公众。本文使用基于风格的技术评估了几种机器学习(ML)算法的准确性,该技术依赖于从新闻中提取的文本信息,例如词性计数。为了扩展已经提出的基于样式的技术,提出了一种新的增强语言特征集的方法。它结合了命名实体识别(NER)和频繁模式(FP)增长关联规则挖掘算法,旨在更好地洞察论文的句子层次结构。递归特征消除被用来识别表现最好的语言特征子集,结果与文献一致。使用预训练的词嵌入,以每个词的TF-IDF值作为权重因子构建文档嵌入和加权文档嵌入。文档嵌入与语言特征混合在一起,提供了各种训练/测试特征集。对于每个模型,识别出表现最好的特征集,并对其超参数进行微调,以提高准确性。将ML算法的结果与卷积神经网络(CNN)和长短期记忆(LSTM)两种神经网络进行比较。结果表明,当与预训练的词嵌入相结合时,CNN在准确性方面优于所有其他方法,而SVM在更广泛的输入特征集上的表现几乎相同。尽管基于风格的技巧得分较低,但它提供了关于作者写作风格决定的可解释结果。我们的工作指出了新技术和现有技术的组合如何增强基于风格的方法来获取更多的信息。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Exploiting Textual Information for Fake News Detection.

"Fake news" refers to the deliberate dissemination of news with the purpose to deceive and mislead the public. This paper assesses the accuracy of several Machine Learning (ML) algorithms, using a style-based technique that relies on textual information extracted from news, such as part of speech counts. To expand the already proposed styled-based techniques, a new method of enhancing a linguistic feature set is proposed. It combines Named Entity Recognition (NER) with the Frequent Pattern (FP) Growth association rule mining algorithm, aiming to provide better insight into the papers' sentence level structure. Recursive feature elimination was used to identify a subset of the highest performing linguistic characteristics, which turned out to align with the literature. Using pre-trained word embeddings, document embeddings and weighted document embeddings were constructed using each word's TF-IDF value as the weight factor. The document embeddings were mixed with the linguistic features providing a variety of training/test feature sets. For each model, the best performing feature set was identified and fine-tuned regarding its hyper parameters to improve accuracy. ML algorithms' results were compared with two Neural Networks: Convolutional Neural Network (CNN) and Long-Short-Term Memory (LSTM). The results indicate that CNN outperformed all other methods in terms of accuracy, when companied with pre-trained word embeddings, yet SVM performs almost the same with a wider variety of input feature sets. Although style-based technique scores lower accuracy, it provides explainable results about the author's writing style decisions. Our work points out how new technologies and combinations of existing techniques can enhance the style-based approach capturing more information.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
International Journal of Neural Systems
International Journal of Neural Systems 工程技术-计算机:人工智能
CiteScore
11.30
自引率
28.80%
发文量
116
审稿时长
24 months
期刊介绍: The International Journal of Neural Systems is a monthly, rigorously peer-reviewed transdisciplinary journal focusing on information processing in both natural and artificial neural systems. Special interests include machine learning, computational neuroscience and neurology. The journal prioritizes innovative, high-impact articles spanning multiple fields, including neurosciences and computer science and engineering. It adopts an open-minded approach to this multidisciplinary field, serving as a platform for novel ideas and enhanced understanding of collective and cooperative phenomena in computationally capable systems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信