Extraction of Malay Root Word that Starts with Letter P in Malay e-Khutbah using Rule Based

International Journal of Software Engineering and Computer Systems. Pub Date: 2023-01-31. DOI: 10.15282/ijsecs.9.1.2023.4.0108

Nurhilyana Anuar, Zamri Abu Bakar, Normaly Kamal Ismail

Abstract

Stemming is an important step in text processing, especially in Natural Language Processing (NLP). It extracts the root word from affixed words in a text, which in turn helps in extracting useful information for research areas such as Information Retrieval. Several stemming algorithms have been discussed in previous studies. However, studies on Malay stemming are limited, both in number and in the amount of experimental data used. This study focuses on Malay stemming using a rule-based algorithm applied to a larger dataset of Malay-language text. The syntactic, linguistic rule-based method removes prefixes, suffixes, and prefix-suffix combinations. The training dataset used in this study consisted of 3233 sentences from e-khutbah text. The algorithm was evaluated by measuring precision, recall and F-measure. It showed promising results relative to the total amount of data used for each test: precision, recall and F-measure increased to 95%, 97% and 97% respectively. The enhanced stemming process has shown a significant impact on Malay text processing and, in general, improves the performance of NLP applications.
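As a rough illustration of the kind of rule-based stemming and evaluation the abstract describes, the Python sketch below strips prefixes, suffixes, or both, and accepts a candidate only if it appears in a root-word list. Everything in it is an assumption made for illustration and is not taken from the paper: the tiny `ROOTS` list, the `PREFIXES` and `SUFFIXES` lists, the function names, and the reading of F-measure as the balanced F1 score. The one Malay-specific rule shown is that the prefixes meN-/peN- drop the initial "p" of a root (memakai -> pakai), so the sketch tries restoring it.

```python
# Hypothetical root-word list; a real stemmer would use a full dictionary.
ROOTS = {"pakai", "pukul", "potong", "pilih", "puji"}

# Illustrative affix lists (assumed, not the paper's rule set).
PREFIXES = ["mem", "pem", "ber", "ter", "di", "me", "pe"]
SUFFIXES = ["kan", "an", "i"]


def candidates(word):
    """Yield candidate roots: unchanged, suffix removed, prefix removed, both."""
    stems = [word] + [word[:-len(s)] for s in SUFFIXES if word.endswith(s)]
    for stem_ in stems:
        yield stem_
        for prefix in PREFIXES:
            if stem_.startswith(prefix):
                rest = stem_[len(prefix):]
                yield rest
                # meN-/peN- absorb a root-initial 'p', so try restoring it.
                if prefix in ("mem", "pem"):
                    yield "p" + rest


def stem(word):
    """Return the first candidate found in ROOTS, else the word unchanged."""
    word = word.lower()
    for cand in candidates(word):
        if cand in ROOTS:
            return cand
    return word


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and balanced F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    for w in ["memakai", "dipilih", "pujian", "pemotongan", "masjid"]:
        print(w, "->", stem(w))
```

Checking candidates against a dictionary is one common way to limit over-stemming; whether the study's algorithm does this is not stated in the abstract.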

