Application of Named Entity Recognition via Twitter on SpaCy in Indonesian (Case Study : Power Failure in the Special Region of Yogyakarta)

Rizka Maulida Yanti, Ibnu Santoso, Lya Hulliyyatus Suadaa
Indonesian Journal of Information Systems, Journal Article, published 2021-08-19
DOI: 10.24002/ijis.v4i1.4677
Citations: 1

Abstract

SpaCy is a tool that handles Natural Language Processing (NLP) tasks efficiently, one of which is Named Entity Recognition (NER). NER identifies and extracts named entities from text. However, SpaCy has not yet officially released a pretrained NER model for Indonesian. Meanwhile, according to the 2019 PLN statistical report, the Province of D.I. Yogyakarta frequently experiences power failures, and many public complaints about these failures appear on Twitter. Yet no research has extracted information related to these electrical disturbances, and research on NER using SpaCy in Indonesian is still rare. This study therefore extracts information related to power failures in the Province of D.I. Yogyakarta from Twitter using SpaCy for Indonesian. The resulting model performs well, with a precision of 95.52%, a recall of 93.27%, and an F1-score of 94.38%. The location entities found in tweets about electrical disturbances are then mapped. This mapping shows that the locations most frequently mentioned in tweets about power failures are in Sleman Regency, while the fewest are in Gunung Kidul Regency. In addition, the month with the most power failures was March 2020, and the month with the fewest was July 2020.
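The abstract reports precision, recall, and F1, and summarizes location and month counts, but does not say how entities were matched or how the counts were aggregated. The Python sketch below assumes exact span-and-label matching (a common convention for NER evaluation) with micro-averaging over all tweets. It also checks that the reported F1 follows from the reported precision and recall via F1 = 2PR / (P + R). The function names, the `REGENCY_OF` gazetteer, and the toy data are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter
from datetime import date

# An entity is a (start_char, end_char, label) tuple; the "LOC" label is illustrative.
Span = tuple[int, int, str]

# Hypothetical gazetteer mapping place names found by the NER model to regencies.
REGENCY_OF = {"Depok": "Sleman", "Godean": "Sleman", "Wonosari": "Gunung Kidul"}


def prf(gold: list[set[Span]], pred: list[set[Span]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 with exact span-and-label matching.

    gold[i] and pred[i] hold the entities of the same tweet i.
    """
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # spans that match exactly
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def count_by_regency(location_mentions: list[str]) -> Counter:
    """Count location mentions per regency, skipping names missing from the gazetteer."""
    return Counter(REGENCY_OF[m] for m in location_mentions if m in REGENCY_OF)


def count_by_month(tweet_dates: list[date]) -> Counter:
    """Count outage-related tweets per (year, month)."""
    return Counter((d.year, d.month) for d in tweet_dates)


if __name__ == "__main__":
    # Toy tweet: gold has two LOC spans; the model finds one of them plus a wrong span.
    gold = [{(10, 16, "LOC"), (20, 26, "LOC")}]
    pred = [{(10, 16, "LOC"), (30, 35, "LOC")}]
    print(prf(gold, pred))  # (0.5, 0.5, 0.5)
    print(round(f1_from(0.9552, 0.9327) * 100, 2))  # 94.38, matching the abstract
    print(count_by_regency(["Depok", "Wonosari", "Godean", "Malioboro"]))
    print(count_by_month([date(2020, 3, 1), date(2020, 3, 5), date(2020, 7, 2)]).most_common(1))
```

Exact matching is strict: a predicted span that overlaps a gold entity but has different boundaries counts as both a false positive and a false negative. A lenient, overlap-based scheme would generally yield higher scores, so the matching rule should be stated alongside any reported figures.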