A Novel Named Entity Recognition approach of Indonesian fake news using part of speech and BERT model on presidential election

Puji Winar Cahyo, Ulfi Saidata Aesyi, Widodo Agus Setianto, Tatang Sulaiman
Journal: International Journal of Information Management Data Insights, volume 5, issue 2, Article 100354
Published: 2025-07-14
DOI: 10.1016/j.jjimei.2025.100354
URL: https://www.sciencedirect.com/science/article/pii/S2667096825000369

Abstract

Fake news often spreads rapidly and can mislead readers, so such information should be approached with caution. For text-based information, content extraction can help determine the meaning and intent of a message. This research therefore develops a novel approach to entity detection in Indonesian-language fake news texts using BiLSTM-CRF, BiGRU, and BERT models. The novelty of the study lies in applying Part-of-Speech (PoS) tagging before words are processed for entity detection. Words tagged as Noun (NN) or Proper Noun (NNP) are transformed into entity labels such as ORG for organizations, PER for people, and LOC for locations, while words tagged as Verb (VB) are converted into the ACT entity to represent actions. When PoS tagging was integrated with entity detection, the BiLSTM-CRF model achieved an F1-score of 81.26%, the BiGRU-based model achieved 79.46%, and the BERT-based model achieved the highest F1-score, 87.38%. These results indicate that the BERT model combined with PoS tagging performs best and can be used effectively to detect entities in fake news. The entity detection process was then applied to fake news from the 2024 Indonesian presidential and vice-presidential election period. Counting the mentions of each candidate and running mate labeled as PER entities showed that the Prabowo Subianto–Gibran Rakabuming Raka pair appeared in 49 fake news articles, followed by the Ganjar Pranowo–Mahfud MD pair with 14 articles and the Anies Baswedan–Muhaimin Iskandar pair with 13 articles. All identified data were filtered to retain only unique entries.
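The PoS-gated labeling and the per-pair article count described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how a noun is assigned to ORG, PER, or LOC, so the sketch assumes a model supplies a predicted label and the PoS tag only restricts which labels are allowed. The names `POS_TO_ALLOWED_LABELS`, `PAIRS`, `gate_label`, and `count_articles_per_pair`, the single-name matching, and the toy articles are all hypothetical.

```python
from collections import Counter

# Assumption: the PoS tag restricts which entity labels a token may receive.
# NN/NNP -> ORG, PER, or LOC; VB -> ACT (as stated in the abstract).
POS_TO_ALLOWED_LABELS = {
    "NN": {"ORG", "PER", "LOC"},
    "NNP": {"ORG", "PER", "LOC"},
    "VB": {"ACT"},
}

# Hypothetical lookup: one lowercase name per candidate, grouped by pair.
PAIRS = {
    "Prabowo-Gibran": {"prabowo", "gibran"},
    "Ganjar-Mahfud": {"ganjar", "mahfud"},
    "Anies-Muhaimin": {"anies", "muhaimin"},
}


def gate_label(pos_tag, predicted_label):
    """Keep a predicted label only if the PoS tag allows it, else return 'O'."""
    allowed = POS_TO_ALLOWED_LABELS.get(pos_tag, set())
    return predicted_label if predicted_label in allowed else "O"


def count_articles_per_pair(articles):
    """Count unique articles in which a pair member appears as a PER entity.

    articles: list of articles, each a list of (token, pos_tag, predicted_label).
    """
    counts = Counter()
    seen = set()
    for tokens in articles:
        # Deduplicate articles by their lowercased token sequence.
        key = tuple(tok.lower() for tok, _, _ in tokens)
        if key in seen:
            continue
        seen.add(key)
        per_tokens = {
            tok.lower()
            for tok, pos, label in tokens
            if gate_label(pos, label) == "PER"
        }
        for pair, names in PAIRS.items():
            if per_tokens & names:
                counts[pair] += 1
    return counts


# Toy data for illustration only; not taken from the paper's dataset.
toy_articles = [
    [("Prabowo", "NNP", "PER"), ("mengunjungi", "VB", "ACT"), ("Jakarta", "NNP", "LOC")],
    [("Prabowo", "NNP", "PER"), ("mengunjungi", "VB", "ACT"), ("Jakarta", "NNP", "LOC")],  # duplicate
    [("Ganjar", "NNP", "PER"), ("dan", "CC", "O"), ("Gibran", "NNP", "PER")],
    [("Anies", "VB", "PER")],  # PER prediction on a VB token is gated out
]
toy_counts = count_articles_per_pair(toy_articles)
print(dict(toy_counts))
```

In this sketch an article that mentions members of two pairs counts once for each pair, and duplicate articles are skipped before counting, which mirrors the abstract's note that only unique entries were retained.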