Detection Model for Fake News on COVID-19 in Indonesia

Q4 Earth and Planetary Sciences

ASEAN Engineering Journal Pub Date : 2023-10-24 DOI:10.11113/aej.v13.19648

Achmad Pratama Rifai, Y. P. Mulyani, Rian Febrianto, H. Arini, Titis Wijayanto, Nurul Lathifah, Xiao Liu, Jianxin Li, Hui Yin, Yutao Wu, Rami Mohawesh


Citations: 0

Abstract


DETECTION MODEL FOR FAKE NEWS ON COVID-19 IN INDONESIA

Today, fake information has become a significant problem, exacerbated by ever-faster access to information. The spread of fake information is dangerous, especially for global health issues such as COVID-19. People can obtain information from many sources, including online sites and social media. One way to control the spread of false information is to detect hoaxes. Many hoax detection methods have been developed, but most previous studies relied on data from a single source in English. The present study detects fake news in the Indonesian language using multiple data sources, including traditional and social media, in the context of COVID-19. The study uses Long Short-Term Memory (LSTM) and the Robustly Optimised Bidirectional Encoder Representations from Transformers Pre-Training Approach (RoBERTa). The LSTM approach is used to develop four architectures that vary in: (1) the use of text only versus both title and text; (2) the number of LSTM and dense layers; and (3) the activation function. The LSTM model with text-only data, a single LSTM layer, and two dense layers outperformed the other LSTM architectures, achieving the highest accuracy of 92.17%. The LSTM models require a relatively short training time of 23–27 minutes for 3,847 articles and have a detection time of 3.8–4.1 ms per article. The RoBERTa classifiers outperformed all LSTM models, with an accuracy of over 97% and a significantly better training time (by a margin of more than 50% compared to the LSTM classifiers), although they had a slightly longer test time. Both the LSTM and RoBERTa models outperformed the Naïve Bayes and SVM benchmark methods in terms of accuracy, precision, and recall. The study therefore shows that both LSTM and RoBERTa are reliable and can reasonably be implemented for real-time fake news detection.
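The abstract compares models by accuracy, precision, and recall, and reports a per-article detection time. As a minimal sketch of how those quantities are usually computed (not the authors' evaluation code), the Python below derives the three metrics from binary labels and converts a per-article latency into sequential throughput. The function names, the label convention (1 = fake, 0 = real), and the toy label lists are assumptions made for illustration; only the 3.8–4.1 ms range comes from the abstract.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for binary labels.

    Assumed convention (not stated in the source): 1 = fake, 0 = real.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fake, flagged as fake
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # real, passed as real
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # real, flagged as fake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fake, missed

    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def articles_per_second(ms_per_article: float) -> float:
    """Sequential throughput implied by a per-article detection time."""
    if ms_per_article <= 0:
        raise ValueError("ms_per_article must be positive")
    return 1000.0 / ms_per_article


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))

    # Detection times reported in the abstract for the LSTM models.
    for ms in (3.8, 4.1):
        print(f"{ms} ms/article -> {articles_per_second(ms):.0f} articles/s")
```

Under these assumptions, a detection time of 3.8–4.1 ms per article corresponds to roughly 240–260 articles per second when articles are processed one at a time, which is the kind of arithmetic behind the abstract's real-time claim.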

