Boosting Stemmer Performance Using Cache Method

Muhammad Fadly Tanjung
Jurnal Matematika Dan Ilmu Pengetahuan Alam LLDikti Wilayah 1 (JUMPA), published 2021-03-30
DOI: 10.54076/jumpa.v1i1.34 (https://doi.org/10.54076/jumpa.v1i1.34)

Abstract

Stemming is the process of reducing a word to its base form by removing its affixes. It is important for supporting better information retrieval. Research on stemming algorithms includes the Nazief & Adriani algorithm, Confix Stripping, Enhanced Confix Stripping, and the Arifin and Porter algorithms. Stemming algorithms for Bahasa Indonesia fall into two groups: those that use a dictionary and those that do not. Some studies have shown that dictionary-based stemmers have high accuracy but low processing speed, while stemmers without a dictionary have lower accuracy but higher processing speed. This study compares two methods, a stemmer with a cache and a stemmer without one, to assess the processing speed of dictionary-based stemmers. The test data are texts obtained from a corpus website. The analysis measures processing speed, memory usage, and CPU usage for each method, and then compares the two. Tests on this data showed that the cache method improved stemmer performance.
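The abstract does not describe how the cache is built, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a toy root-word dictionary, a small illustrative subset of Indonesian affixes, and hypothetical function names (`stem_uncached`, `stem_cached`); the cache itself is Python's standard `functools.lru_cache`.

```python
from functools import lru_cache

# Hypothetical miniature root-word dictionary (assumption for illustration);
# a real dictionary-based stemmer would load many thousands of entries.
ROOT_WORDS = frozenset({"makan", "baca", "main", "ajar"})

# Illustrative subset of Indonesian affixes, not a complete rule set.
PREFIXES = ("mem", "me", "ber", "di", "ter", "pe")
SUFFIXES = ("kan", "an", "i", "nya", "lah")


def stem_uncached(word: str) -> str:
    """Strip affixes and accept a candidate only if it is in the dictionary."""
    word = word.lower()
    if word in ROOT_WORDS:
        return word

    # Candidate forms: the word itself plus each suffix-stripped variant.
    candidates = [word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            candidates.append(word[: -len(suffix)])

    # Try each candidate as-is, then with one prefix removed.
    for candidate in candidates:
        if candidate in ROOT_WORDS:
            return candidate
        for prefix in PREFIXES:
            if candidate.startswith(prefix):
                stripped = candidate[len(prefix):]
                if stripped in ROOT_WORDS:
                    return stripped

    # No dictionary match: return the word unchanged.
    return word


@lru_cache(maxsize=None)
def stem_cached(word: str) -> str:
    """Same stemmer, but repeated words are answered from the cache."""
    return stem_uncached(word)


if __name__ == "__main__":
    tokens = ["makanan", "membaca", "makanan", "bermain", "membaca"]
    print([stem_cached(token) for token in tokens])
    print(stem_cached.cache_info())
```

Natural-language text repeats words often, so a cache hit skips the affix stripping and dictionary lookups for every repeated token. The trade-off is extra memory for the stored results, which is one reason to measure memory usage alongside speed when comparing the two variants.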