Databases on the Indonesian Prefixes PE- and PEN-

Karlina Denistia
Journal of Language and Literature (Journal Article), published 2023-03-23
DOI: 10.24071/joll.v23i1.4967 (https://doi.org/10.24071/joll.v23i1.4967)
Citations: 0

Abstract

This paper provides the theoretical grounding for constructing databases of PE- and PEN-, two Indonesian nominalizing prefixes with various meanings (e.g., patient, agent, or instrument). The first database contains words prefixed with PE- and PEN-, whereas the second database provides the cosine similarity between pairs of words of interest. The primary source is a written Indonesian corpus (the Leipzig Corpora Collection). The databases record the prefix (PE- or PEN-), the allomorph of PEN-, the base word, the semantic role, the morphological variation, the cosine similarity, and the word frequency. This paper also explains the theoretical considerations behind how each type of information was obtained. In building the databases, an Indonesian morphological parser was used to analyze the morphological status of each word, and Word2Vec (Word to Vector) was used to represent the words in the corpus as vectors. In addition, the data were manually verified against the comprehensive Indonesian dictionary. The databases are freely available, so the data can be used as material for corpus-based analyses of Indonesian morphology. This research sheds light on a careful and thorough classification of the PE- and PEN- words in these open-access databases by allomorph, base word, semantic role, and morphological variation. The information provided in this article is intended to contribute to Indonesian morphology specifically and to other linguistic fields (e.g., corpus linguistics and quantitative linguistics) in general.
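The abstract lists the fields of the first database and the cosine similarity stored in the second. The Python sketch below makes those two pieces concrete, under explicit assumptions. The `PrefixedWord` record and its field names are hypothetical. `pen_allomorph` encodes a simplified textbook version of PEN- allomorphy keyed on the first sound of the base and a rough syllable count. It is not the morphological parser or the manual dictionary check described in the paper, and it ignores loanwords and lexical exceptions. The vectors in the usage block are made-up toy values, not Word2Vec output from the Leipzig corpus.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

VOWELS = set("aeiou")


@dataclass(frozen=True)
class PrefixedWord:
    """One row of the first database (hypothetical field names)."""
    word: str                       # derived word, e.g. "penulis"
    prefix: str                     # "PE-" or "PEN-"
    allomorph: Optional[str]        # allomorph of PEN-; None for PE- words
    base: str                       # base word, e.g. "tulis"
    semantic_role: str              # e.g. "agent", "patient", "instrument"
    morphological_variation: str    # free-text label (assumed format)
    frequency: int                  # token count in the corpus


def count_vowel_groups(word: str) -> int:
    """Rough syllable estimate: the number of maximal runs of vowel letters."""
    groups, in_vowel = 0, False
    for ch in word:
        if ch in VOWELS:
            if not in_vowel:
                groups += 1
            in_vowel = True
        else:
            in_vowel = False
    return groups


def pen_allomorph(base: str) -> str:
    """Predict the PEN- allomorph from the base's first sound (simplified rule).

    Hypothetical helper: it ignores loanwords, clusters, and lexical exceptions.
    """
    base = base.lower()
    if not base:
        raise ValueError("base must be a non-empty string")
    first = base[0]
    if count_vowel_groups(base) == 1:  # monosyllabic: cat -> pengecat
        return "penge-"
    if first in VOWELS or first in "ghk":  # ajar -> pengajar, kirim -> pengirim
        return "peng-"
    if first in "bpf":  # bantu -> pembantu, pukul -> pemukul
        return "pem-"
    if first == "s":  # sapu -> penyapu
        return "peny-"
    if first in "dtcj":  # dengar -> pendengar, tulis -> penulis
        return "pen-"
    return "pe-"  # l, r, m, n, w, y: lari -> pelari, rusak -> perusak


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Standard cosine similarity: (u . v) / (|u| * |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy vectors with made-up values; real ones would come from a Word2Vec model.
    word_a, word_b, word_c = [1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [0.0, 0.0, 3.0]
    print(cosine_similarity(word_a, word_b))  # parallel vectors: about 1.0
    print(cosine_similarity(word_a, word_c))  # orthogonal vectors: 0.0
    for base in ["tulis", "pukul", "kirim", "sapu", "lari", "cat"]:
        print(base, pen_allomorph(base))
```

With a trained Word2Vec model, each word's vector would be looked up and passed to `cosine_similarity`. In gensim, for example, `model.wv.similarity(w1, w2)` returns the same quantity. The abstract does not say which toolkit the author used.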