使用粒子群优化算法优化印度尼西亚文本数据的聚类:古兰经》翻译案例研究

M. D. R. Wahyudi, Agung Fatwanto
{"title":"使用粒子群优化算法优化印度尼西亚文本数据的聚类:古兰经》翻译案例研究","authors":"M. D. R. Wahyudi, Agung Fatwanto","doi":"10.35671/telematika.v17i1.2724","DOIUrl":null,"url":null,"abstract":"The Quran considered the holy book for Muslims, contains scientific and historical facts affirming Islam's truth, beauty, and influence on human life. Consequently, the Quran text and its translations are valuable sources for text mining research, particularly for studying the interrelationship of its verses. One approach to grouping objects using certain algorithms is clustering, with K-Means Clustering being a prominent example. However, clustering results are often suboptimal due to the random selection of centroids. To address this, the study proposes using the Particle Swarm Optimization (PSO) algorithm, which selects centroids based on PSO results. The hybrid PSO algorithm initiates a single iteration of the K-means algorithm. It concludes either upon reaching the maximum iteration limit or when the average shift in the center of the mass vector falls below 0.0001. Evaluation of the clustering results from the three models indicates that the K-Means algorithm produced the lowest Sum of Squared Error (SSE) value of 1032.19. Additionally, the hybrid PSO algorithm generated the highest Silhouette value of 0.258 and the lowest quantization value of 0.00947. Further evaluation using a confusion matrix showed that K-Means clustering had an accuracy rate of 81.7%, K-Means with PSO had 82.5%, and the combination of K-Means with hybrid PSO yielded the highest accuracy rate of 91.1% among the three grouping model.","PeriodicalId":31716,"journal":{"name":"Telematika","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Optimizing Clustering of Indonesian Text Data Using Particle Swarm Optimization Algorithm: A Case Study of the Quran Translation\",\"authors\":\"M. D. R. Wahyudi, Agung Fatwanto\",\"doi\":\"10.35671/telematika.v17i1.2724\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Quran considered the holy book for Muslims, contains scientific and historical facts affirming Islam's truth, beauty, and influence on human life. Consequently, the Quran text and its translations are valuable sources for text mining research, particularly for studying the interrelationship of its verses. One approach to grouping objects using certain algorithms is clustering, with K-Means Clustering being a prominent example. However, clustering results are often suboptimal due to the random selection of centroids. To address this, the study proposes using the Particle Swarm Optimization (PSO) algorithm, which selects centroids based on PSO results. The hybrid PSO algorithm initiates a single iteration of the K-means algorithm. It concludes either upon reaching the maximum iteration limit or when the average shift in the center of the mass vector falls below 0.0001. Evaluation of the clustering results from the three models indicates that the K-Means algorithm produced the lowest Sum of Squared Error (SSE) value of 1032.19. Additionally, the hybrid PSO algorithm generated the highest Silhouette value of 0.258 and the lowest quantization value of 0.00947. Further evaluation using a confusion matrix showed that K-Means clustering had an accuracy rate of 81.7%, K-Means with PSO had 82.5%, and the combination of K-Means with hybrid PSO yielded the highest accuracy rate of 91.1% among the three grouping model.\",\"PeriodicalId\":31716,\"journal\":{\"name\":\"Telematika\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Telematika\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.35671/telematika.v17i1.2724\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Telematika","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.35671/telematika.v17i1.2724","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

古兰经》被视为穆斯林的圣书,其中包含的科学和历史事实肯定了伊斯兰教的真善美以及对人类生活的影响。因此,《古兰经》文本及其译本是文本挖掘研究的宝贵资料,尤其是在研究其经文的相互关系方面。使用某些算法对对象进行分组的一种方法是聚类,K-Means 聚类就是一个突出的例子。然而,由于中心点的随机选择,聚类结果往往不够理想。为解决这一问题,研究建议使用粒子群优化(PSO)算法,该算法根据 PSO 结果选择中心点。混合 PSO 算法对 K-means 算法进行一次迭代。当达到最大迭代限制或质量向量中心的平均移动量低于 0.0001 时,迭代结束。对三种模型聚类结果的评估表明,K-Means 算法产生的平方误差总和(SSE)值最低,为 1032.19。此外,混合 PSO 算法产生的剪影值最高,为 0.258,量化值最低,为 0.00947。使用混淆矩阵进行的进一步评估显示,K-Means 聚类的准确率为 81.7%,K-Means 与 PSO 算法的准确率为 82.5%,而 K-Means 与混合 PSO 算法的组合在三种分组模型中准确率最高,达到 91.1%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Optimizing Clustering of Indonesian Text Data Using Particle Swarm Optimization Algorithm: A Case Study of the Quran Translation
The Quran considered the holy book for Muslims, contains scientific and historical facts affirming Islam's truth, beauty, and influence on human life. Consequently, the Quran text and its translations are valuable sources for text mining research, particularly for studying the interrelationship of its verses. One approach to grouping objects using certain algorithms is clustering, with K-Means Clustering being a prominent example. However, clustering results are often suboptimal due to the random selection of centroids. To address this, the study proposes using the Particle Swarm Optimization (PSO) algorithm, which selects centroids based on PSO results. The hybrid PSO algorithm initiates a single iteration of the K-means algorithm. It concludes either upon reaching the maximum iteration limit or when the average shift in the center of the mass vector falls below 0.0001. Evaluation of the clustering results from the three models indicates that the K-Means algorithm produced the lowest Sum of Squared Error (SSE) value of 1032.19. Additionally, the hybrid PSO algorithm generated the highest Silhouette value of 0.258 and the lowest quantization value of 0.00947. Further evaluation using a confusion matrix showed that K-Means clustering had an accuracy rate of 81.7%, K-Means with PSO had 82.5%, and the combination of K-Means with hybrid PSO yielded the highest accuracy rate of 91.1% among the three grouping model.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
7
审稿时长
24 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信