Term Standardisation With LDA Model To Detect Service Disruption Events Using English And Manglish Tweets

Noraysha Yusuf, Maizatul Akmar Ismail, Tasnim M. A. Zayet, Kasturi Dewi Varathan, Rafidah MD Noor
Journal of Informatics and Web Engineering, published 2024-02-14. DOI: 10.33093/jiwe.2024.3.1.1

Abstract

Rapid transit is one of Malaysia's most important modes of public transportation, and any disruption to the service affects commuters' daily routines. Detecting such disruptions is therefore essential. In this study, disruptions to Malaysia's rapid transit service were assessed with Latent Dirichlet Allocation (LDA) using English and Manglish (a mix of English and Malay) tweets. The collected tweets were classified into event and non-event tweets, and LDA was applied to the event tweets. Manglish event tweets were pre-processed using the proposed term standardisation technique. LDA proved effective for topic detection in both English and Manglish tweets and performed better on Manglish tweets: the LDA_English model reached its best event detection rate at a likelihood of 80%, while the LDA_Manglish model reached its best detection rate at a likelihood of 60%.
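The abstract describes a pipeline of term standardisation for Manglish tweets, LDA over event tweets, and a detection rate measured at a likelihood level. The following is a minimal Python sketch of that flow under loudly stated assumptions; it is not the authors' implementation. The `STANDARD_TERMS` table, all function names, and the example tweets are hypothetical. LDA is fitted with a plain collapsed Gibbs sampler rather than whatever tool the paper used. "Likelihood" is assumed to mean a tweet's probability for the disruption topic. The event/non-event classification step is not shown, so the input tweets are assumed to be event tweets already.

```python
import random
import re

# HYPOTHETICAL variant -> canonical-term table for Manglish tweets.
# Illustrative only; it is not the paper's term standardisation dictionary.
STANDARD_TERMS = {
    "x": "tak",
    "tk": "tak",
    "lmbt": "lambat",
    "lmbat": "lambat",
    "rsk": "rosak",
    "sgt": "sangat",
}
TOKEN_RE = re.compile(r"[a-z0-9_]+")


def standardise_terms(tweet, table=STANDARD_TERMS):
    """Lower-case and tokenise a tweet, replacing known variants with a canonical term."""
    return [table.get(tok, tok) for tok in TOKEN_RE.findall(tweet.lower())]


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (theta, nkw, vocab): per-document topic proportions,
    topic-word counts, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w2i = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    ndk = [[0] * n_topics for _ in docs]  # document-topic counts
    nkw = [[0] * n_words for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics  # total tokens assigned to each topic
    z = []  # topic assignment of every token

    def assign(d, wi, k, delta):
        ndk[d][k] += delta
        nkw[k][wi] += delta
        nk[k] += delta

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            assign(d, w2i[w], k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = w2i[w]
                assign(d, wi, z[d][i], -1)  # remove the token's current assignment
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                assign(d, wi, z[d][i], +1)

    theta = [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, nkw, vocab


def topic_for_word(nkw, vocab, word):
    """Pick the topic holding the most occurrences of a seed word (assumed heuristic)."""
    wi = vocab.index(word)
    return max(range(len(nkw)), key=lambda k: nkw[k][wi])


def detection_rate(theta, topic, likelihood):
    """Fraction of event tweets whose probability for `topic` is at least `likelihood`."""
    return sum(row[topic] >= likelihood for row in theta) / len(theta)


if __name__ == "__main__":
    # Made-up example tweets, assumed to be already classified as event tweets.
    tweets = ["LRT rsk lagi", "MRT lmbt sgt", "LRT rosak, x gerak", "tren lmbt, rsk lagi"]
    docs = [standardise_terms(t) for t in tweets]
    theta, nkw, vocab = lda_gibbs(docs, n_topics=2)
    k = topic_for_word(nkw, vocab, "rosak")
    for likelihood in (0.6, 0.8):
        print(f"likelihood {likelihood:.0%}: detection rate {detection_rate(theta, k, likelihood):.2f}")
```

Standardising first means "rsk" and "rosak" count as the same vocabulary item, so the sampler sees one disruption term instead of several sparse variants. Sweeping the likelihood level (here 60% and 80%) mirrors how the abstract reports the best detection rate at a particular likelihood. A real pipeline would more likely use an established LDA implementation such as gensim's `LdaModel` in place of the hand-written sampler.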