COVID-19信息流行病-使用机器学习方法了解检测假新闻的内容特征

IF 1.2 4区计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Malaysian Journal of Computer Science Pub Date : 2023-01-31 DOI:10.22452/mjcs.vol36no1.1

Vimala Balakrishnan (Corresponding Author), Hii Lee Zing, Eric Guy Claude Laporte

{"title":"COVID-19信息流行病-使用机器学习方法了解检测假新闻的内容特征","authors":"Vimala Balakrishnan (Corresponding Author), Hii Lee Zing, Eric Guy Claude Laporte","doi":"10.22452/mjcs.vol36no1.1","DOIUrl":null,"url":null,"abstract":"The use of content features, particularly textual and linguistic for fake news detection is under-researched, despite empirical evidence showing the features could contribute to differentiating real and fake news. To this end, this study investigates a selection of content features such as word bigrams, part of speech distribution etc. to improve fake news detection. We performed a series of experiments on a new dataset gathered during the COVID-19 pandemic using Decision Tree, K-Nearest Neighbor, Logistic Regression, Support Vector Machine and Random Forest. Random Forest yielded the best results, followed closely by Support Vector Machine, across all setups. In general, both the textual and linguistic features were found to improve fake news detection when used separately, however, combining them into a single model did not improve the detection significantly. Differences were also noted between the use of bigrams and part of speech tags. The study shows that textual and linguistic features can be used successfully in detecting fake news using the traditional machine learning approach as opposed to deep learning.","PeriodicalId":49894,"journal":{"name":"Malaysian Journal of Computer Science","volume":" ","pages":""},"PeriodicalIF":1.2000,"publicationDate":"2023-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"COVID-19 INFODEMIC – UNDERSTANDING CONTENT FEATURES IN DETECTING FAKE NEWS USING A MACHINE LEARNING APPROACH\",\"authors\":\"Vimala Balakrishnan (Corresponding Author), Hii Lee Zing, Eric Guy Claude Laporte\",\"doi\":\"10.22452/mjcs.vol36no1.1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The use of content features, particularly textual and linguistic for fake news detection is under-researched, despite empirical evidence showing the features could contribute to differentiating real and fake news. To this end, this study investigates a selection of content features such as word bigrams, part of speech distribution etc. to improve fake news detection. We performed a series of experiments on a new dataset gathered during the COVID-19 pandemic using Decision Tree, K-Nearest Neighbor, Logistic Regression, Support Vector Machine and Random Forest. Random Forest yielded the best results, followed closely by Support Vector Machine, across all setups. In general, both the textual and linguistic features were found to improve fake news detection when used separately, however, combining them into a single model did not improve the detection significantly. Differences were also noted between the use of bigrams and part of speech tags. The study shows that textual and linguistic features can be used successfully in detecting fake news using the traditional machine learning approach as opposed to deep learning.\",\"PeriodicalId\":49894,\"journal\":{\"name\":\"Malaysian Journal of Computer Science\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2023-01-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Malaysian Journal of Computer Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.22452/mjcs.vol36no1.1\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Malaysian Journal of Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.22452/mjcs.vol36no1.1","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 1

摘要

使用内容特征，特别是文本和语言特征来检测假新闻的研究不足，尽管经验证据表明这些特征有助于区分真新闻和假新闻。为此，本研究调查了一些内容特征，如单词bigram、词性分布等，以提高假新闻的检测能力。我们使用决策树、K最近邻、逻辑回归、支持向量机和随机森林，在新冠肺炎大流行期间收集的新数据集上进行了一系列实验。随机森林在所有设置中产生了最好的结果，紧随其后的是支持向量机。一般来说，当单独使用时，文本和语言特征都被发现可以提高假新闻的检测能力，然而，将它们组合到一个模型中并不能显著提高检测能力。还注意到二元词和词性标签的使用之间的差异。研究表明，与深度学习相比，使用传统的机器学习方法可以成功地将文本和语言特征用于检测假新闻。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

COVID-19 INFODEMIC – UNDERSTANDING CONTENT FEATURES IN DETECTING FAKE NEWS USING A MACHINE LEARNING APPROACH

The use of content features, particularly textual and linguistic for fake news detection is under-researched, despite empirical evidence showing the features could contribute to differentiating real and fake news. To this end, this study investigates a selection of content features such as word bigrams, part of speech distribution etc. to improve fake news detection. We performed a series of experiments on a new dataset gathered during the COVID-19 pandemic using Decision Tree, K-Nearest Neighbor, Logistic Regression, Support Vector Machine and Random Forest. Random Forest yielded the best results, followed closely by Support Vector Machine, across all setups. In general, both the textual and linguistic features were found to improve fake news detection when used separately, however, combining them into a single model did not improve the detection significantly. Differences were also noted between the use of bigrams and part of speech tags. The study shows that textual and linguistic features can be used successfully in detecting fake news using the traditional machine learning approach as opposed to deep learning.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Malaysian Journal of Computer Science COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, THEORY & METHODS

CiteScore

2.20

自引率

33.30%

发文量

审稿时长

7.5 months

期刊介绍： The Malaysian Journal of Computer Science (ISSN 0127-9084) is published four times a year in January, April, July and October by the Faculty of Computer Science and Information Technology, University of Malaya, since 1985. Over the years, the journal has gained popularity and the number of paper submissions has increased steadily. The rigorous reviews from the referees have helped in ensuring that the high standard of the journal is maintained. The objectives are to promote exchange of information and knowledge in research work, new inventions/developments of Computer Science and on the use of Information Technology towards the structuring of an information-rich society and to assist the academic staff from local and foreign universities, business and industrial sectors, government departments and academic institutions on publishing research results and studies in Computer Science and Information Technology through a scholarly publication. The journal is being indexed and abstracted by Clarivate Analytics'' Web of Science and Elsevier''s Scopus