Exploiting Textual Information for Fake News Detection.

IF 5.5, Zone 2 (Materials Science), Q2 MATERIALS SCIENCE, MULTIDISCIPLINARY

ACS Applied Nano Materials Pub Date : 2022-12-01 Epub Date: 2022-11-04 DOI:10.1142/S0129065722500587

Dimitrios Panagiotis Kasseropoulos, Paraskevas Koukaras, Christos Tjortjis


Citations: 0

Abstract


Exploiting Textual Information for Fake News Detection.

"Fake news" refers to the deliberate dissemination of news with the purpose of deceiving and misleading the public. This paper assesses the accuracy of several Machine Learning (ML) algorithms, using a style-based technique that relies on textual information extracted from news, such as part-of-speech counts. To extend previously proposed style-based techniques, a new method of enhancing a linguistic feature set is proposed. It combines Named Entity Recognition (NER) with the Frequent Pattern (FP) Growth association rule mining algorithm, aiming to provide better insight into the papers' sentence-level structure. Recursive feature elimination was used to identify a subset of the highest-performing linguistic characteristics, which turned out to align with the literature. Using pre-trained word embeddings, document embeddings and weighted document embeddings were constructed, the latter using each word's TF-IDF value as the weight factor. The document embeddings were combined with the linguistic features, providing a variety of training/test feature sets. For each model, the best-performing feature set was identified, and hyperparameters were fine-tuned to improve accuracy. The ML algorithms' results were compared with two Neural Networks: a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. The results indicate that the CNN outperformed all other methods in terms of accuracy when accompanied by pre-trained word embeddings, yet the SVM performed almost as well with a wider variety of input feature sets. Although the style-based technique achieves lower accuracy, it provides explainable results about the author's writing style decisions. Our work points out how new technologies and combinations of existing techniques can enhance the style-based approach so that it captures more information.
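The abstract mentions document embeddings built from pre-trained word embeddings, with a variant weighted by each word's TF-IDF value. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the toy two-dimensional vectors, and the smoothed idf formula are all assumptions chosen for illustration, and real experiments would load actual pre-trained embeddings instead.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per tokenized document.

    Assumption: a smoothed idf, log((1 + N) / (1 + df)) + 1, so that words
    occurring in every document still keep a non-zero weight.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            word: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[word])) + 1.0)
            for word, count in tf.items()
        })
    return weights


def document_embedding(doc, vectors, weights=None):
    """Average the word vectors of `doc`.

    With weights=None each word is weighted by its raw count (plain mean);
    otherwise each distinct word is weighted by its tf-idf value.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, count in Counter(doc).items():
        if word not in vectors:
            continue  # skip out-of-vocabulary words
        w = float(count) if weights is None else weights.get(word, 0.0)
        for i, x in enumerate(vectors[word]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known words: return the zero vector
    return [x / weight_sum for x in total]


# Hypothetical toy "pre-trained" vectors and a two-document corpus.
toy_vectors = {"shocking": [1.0, 0.0], "report": [0.0, 1.0], "the": [0.5, 0.5]}
docs = [["the", "shocking", "report"], ["the", "report"]]

weights = tfidf_weights(docs)
plain = document_embedding(docs[0], toy_vectors)
weighted = document_embedding(docs[0], toy_vectors, weights[0])
print("plain:", plain)
print("weighted:", weighted)
```

In this toy corpus "shocking" appears in only one document, so its TF-IDF weight is larger and the weighted embedding moves toward that word's vector relative to the plain mean. Either vector could then be concatenated with linguistic features before being passed to a classifier.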


Source journal

ACS Applied Nano Materials Multiple-

CiteScore: 8.30

Self-citation rate: 3.40%

Articles published: 1601

Journal description: ACS Applied Nano Materials is an interdisciplinary journal publishing original research covering all aspects of engineering, chemistry, physics and biology relevant to applications of nanomaterials. The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrate knowledge in the areas of materials, engineering, physics, bioscience, and chemistry into important applications of nanomaterials.