Boosting Stemmer Performance Using Cache Method

Jurnal Matematika Dan Ilmu Pengetahuan Alam LLDikti Wilayah 1 (JUMPA). Pub Date: 2021-03-30. DOI: 10.54076/jumpa.v1i1.34

Muhammad Fadly Tanjung


Abstract

Stemming is the process of reducing a word to its base form by removing its affixes. It is important for supporting better information retrieval. Stemming algorithms studied in prior research include the Nazief & Adriani algorithm, Confix Stripping, Enhanced Confix Stripping, and the Arifin and Porter algorithms. Stemming algorithms for Bahasa Indonesia fall into two groups: those that use a dictionary and those that do not. Some studies have shown that dictionary-based stemmers achieve high accuracy but low processing speed, while stemmers that do not use a dictionary achieve lower accuracy but higher processing speed. This study compares two methods, a stemmer with a cache and a stemmer without a cache, to examine how caching affects the processing speed of a dictionary-based stemmer. The test data is text obtained from a corpus site. The analysis measures the speed, memory usage, and CPU usage of each method and then compares the two. Results on the test data showed that the cache method improved stemmer performance.
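To illustrate the idea of a cached dictionary-based stemmer, here is a minimal sketch in Python. It is not the implementation from the paper. The root-word dictionary, the affix lists, and the function names `stem_uncached` and `stem_cached` are all hypothetical and heavily simplified, and the cache is the standard library's `functools.lru_cache`, chosen here only for convenience.

```python
from functools import lru_cache

# Hypothetical toy dictionary of root words (assumption for illustration);
# a real dictionary-based stemmer would load a much larger word list.
ROOT_WORDS = frozenset({"makan", "baca", "main", "ajar"})

# Illustrative subset of Indonesian affixes, not a complete rule set.
PREFIXES = ("mem", "me", "ber", "di", "ter", "pe")
SUFFIXES = ("kan", "an", "i", "nya", "lah")


def stem_uncached(word: str) -> str:
    """Try prefix/suffix combinations until a dictionary root is found.

    Returns the lowercased word unchanged if no root matches.
    """
    word = word.lower()
    if word in ROOT_WORDS:
        return word
    # The empty string stands for "no prefix" or "no suffix".
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            candidate = rest[:len(rest) - len(suffix)] if suffix else rest
            if candidate in ROOT_WORDS:
                return candidate
    return word


@lru_cache(maxsize=None)
def stem_cached(word: str) -> str:
    """Same stemmer, but repeated words are answered from an in-memory cache."""
    return stem_uncached(word)


if __name__ == "__main__":
    tokens = ["makanan", "membaca", "makanan", "bermain", "makanan"]
    print([stem_cached(t) for t in tokens])
    print(stem_cached.cache_info())
```

In this sketch, only the first occurrence of each distinct word pays for the affix-stripping and dictionary lookups, and later occurrences are served from the cache. The cache grows with the number of distinct words seen, so the speed gain is bought with additional memory, which is why measuring memory usage alongside speed is a sensible comparison.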
