Embedding from Language Models (ELMos)- based Dependency Parser for Indonesian Language

Q3 Computer Science
DOI: 10.15849/ijasca.211128.01 (https://doi.org/10.15849/ijasca.211128.01)
Journal: International Journal of Advances in Soft Computing and its Applications
Published: 2021-11-28
Publication type: Journal Article
Citations: 0

Abstract

The goal of dependency parsing is to find the functional relationships among words in a sentence, for instance which word is the subject and which is the object. Parsing Indonesian requires information about word morphology, because Indonesian grammar relies heavily on affixation, which combines root words with affixes to form new words. Morphological information should therefore be incorporated. Fortunately, it can be encoded implicitly by the word representation. Embeddings from Language Models (ELMo) is a word representation that is able to capture morphological information. Unlike widely used word representations such as word2vec or Global Vectors (GloVe), ELMo applies a Convolutional Neural Network (CNN) over characters, so the affixation process can, ideally, be encoded in the word representation. We compared word2vec and ELMo using nearest-neighbor words and t-distributed Stochastic Neighbor Embedding (t-SNE) word visualization. Our results showed that the ELMo representation encodes morphological information more richly than its counterpart. We then trained our parser with word2vec and with ELMo. Unsurprisingly, the parser that uses ELMo achieves higher accuracy than the one that uses word2vec: we obtain an Unlabeled Attachment Score (UAS) of 83.08 for ELMo and 81.35 for word2vec. Hence, we confirmed that morphological information is necessary, especially in a morphologically rich language like Indonesian.

Keywords: ELMo, Dependency Parser, Natural Language Processing, word2vec
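The Unlabeled Attachment Score quoted in the abstract is, by its usual definition, the percentage of tokens whose predicted head matches the gold head, ignoring the dependency label. The sketch below shows that calculation in Python. The function name `unlabeled_attachment_score`, the head-index encoding (0 for the root, 1-based positions otherwise), and the toy data are assumptions for illustration; the abstract does not say whether punctuation was excluded or how the authors implemented the metric.

```python
from typing import Sequence


def unlabeled_attachment_score(
    gold_heads: Sequence[Sequence[int]],
    pred_heads: Sequence[Sequence[int]],
) -> float:
    """Return the percentage of tokens whose predicted head equals the gold head.

    Each inner sequence holds one head index per token of a sentence
    (assumed convention: 0 = root, otherwise the 1-based position of the head).
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted data must have the same number of sentences")

    correct = 0
    total = 0
    for gold, pred in zip(gold_heads, pred_heads):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sentences must have the same length")
        # Count tokens attached to the right head; labels are not considered.
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)

    if total == 0:
        raise ValueError("cannot compute UAS over zero tokens")
    return 100.0 * correct / total


# Toy data (not from the paper): two sentences, seven tokens, one wrong head.
toy_gold = [[2, 0, 2], [0, 1, 2, 3]]
toy_pred = [[2, 0, 1], [0, 1, 2, 3]]
toy_uas = unlabeled_attachment_score(toy_gold, toy_pred)
print(f"UAS = {toy_uas:.2f}")
```

On this scale, the reported gap between ELMo (83.08) and word2vec (81.35) is 1.73 points, meaning roughly 17 more correctly attached heads per 1,000 tokens.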
Source journal: International Journal of Advances in Soft Computing and its Applications (Computer Science: Computer Science Applications)
CiteScore: 3.30
Self-citation rate: 0.00%
Articles published: 31
About the journal: The aim of this journal is to provide a lively forum for the communication of original research papers and timely review articles on Advances in Soft Computing and Its Applications. IJASCA will publish only articles of the highest quality. Submissions will be evaluated on their originality and significance. IJASCA invites submissions in all areas of Soft Computing and Its Applications. The scope of the journal includes, but is not limited to:
√ Soft Computing Fundamental and Optimization
√ Soft Computing for Big Data Era
√ GPU Computing for Machine Learning
√ Soft Computing Modeling for Perception and Spiritual Intelligence
√ Soft Computing and Agents Technology
√ Soft Computing in Computer Graphics
√ Soft Computing and Pattern Recognition
√ Soft Computing in Biomimetic Pattern Recognition
√ Data Mining for Social Network Data
√ Spatial Data Mining & Information Retrieval
√ Intelligent Software Agent Systems and Architectures
√ Advanced Soft Computing and Multi-Objective Evolutionary Computation
√ Perception-Based Intelligent Decision Systems
√ Spiritual-Based Intelligent Systems
√ Soft Computing in Industry Applications
Other issues related to the advances of soft computing in various applications.