Better Pretrained Embedding with Convolutional Neural Networks for Morphological Stemming
Y. Oo, K. Soe
International Conference on Artificial Intelligence and Virtual Reality, published 2019-07-27
DOI: 10.1145/3348488.3348499 (https://doi.org/10.1145/3348488.3348499)
Citations: 1
Abstract
Words are often treated as independent entities, with no direct relationship among morphologically related words. As a result, rare words are poorly estimated, and unknown words are represented by only a few vectors. Stemming reduces the different forms of a word to a common morphological root. Word embeddings generalize well to unseen words and can capture general syntactic as well as semantic properties of words. Deep learning approaches have also become increasingly prominent in NLP, and pre-trained embedding layers have been applied to improve the performance of neural network architectures in many NLP applications. For the Myanmar language, as for most Asian languages, word segmentation is a vital task and a widely studied sequence labeling problem. Stemming is normally treated as a process separate from segmentation. The new approach in this paper marks segmentation boundaries while it performs stemming. The paper proposes several word representations at the character and syllable levels and incorporates them into a convolutional neural network (CNN-based model) that jointly learns stemming and segmentation boundaries in parallel. It also evaluates the performance of the CNN with different pre-trained embeddings. The experimental results show that the pre-trained embedding has a large effect on performance.
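The abstract does not specify the tag set, input units, or network configuration, so the following is only a minimal sketch of how joint stemming and segmentation can be cast as per-unit sequence labeling, under stated assumptions. It assumes a hypothetical four-tag scheme (boundary B/I combined with stem/affix S/A), assumes the units are syllables, and uses a single 1-D convolution over pre-trained embeddings to score the tags. The names `encode`, `decode`, `embed`, `conv1d_relu`, and `tag`, the embedding table, and all weights are illustrative toy values, not the authors' model or trained parameters; English syllables stand in for Myanmar text purely for readability.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical joint tag set (an assumption, not taken from the paper):
# boundary part (B = unit begins a word, I = unit continues it) combined with
# morphology part (S = unit belongs to the stem, A = unit belongs to an affix).
TAGS = ["B-S", "I-S", "B-A", "I-A"]


def encode(words: Sequence[Tuple[List[str], int]]) -> Tuple[List[str], List[str]]:
    """Flatten (units, stem_length) pairs into one unsegmented unit sequence
    plus one joint tag per unit; the first stem_length units form the stem."""
    units: List[str] = []
    tags: List[str] = []
    for word_units, stem_len in words:
        for i, unit in enumerate(word_units):
            units.append(unit)
            tags.append(("B" if i == 0 else "I") + "-" + ("S" if i < stem_len else "A"))
    return units, tags


def decode(units: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Recover (word, stem) pairs: a B tag starts a new word, S units form the stem."""
    pairs: List[Tuple[str, str]] = []
    for unit, tag_ in zip(units, tags):
        boundary, morph = tag_.split("-")
        if boundary == "B" or not pairs:
            pairs.append(("", ""))
        word, stem = pairs[-1]
        pairs[-1] = (word + unit, stem + unit if morph == "S" else stem)
    return pairs


def embed(units: Sequence[str], table: Dict[str, Vector], dim: int) -> List[Vector]:
    """Look up pre-trained vectors; units missing from the table get zeros."""
    return [table.get(u, [0.0] * dim) for u in units]


def conv1d_relu(seq: List[Vector], kernels: List[List[Vector]]) -> List[Vector]:
    """Same-length 1-D convolution with zero padding, followed by ReLU.
    Each kernel holds one weight row per window position (odd window width)."""
    if not seq:
        return []
    width = len(kernels[0])
    pad = width // 2
    zero = [0.0] * len(seq[0])
    padded = [zero] * pad + seq + [zero] * pad
    out: List[Vector] = []
    for t in range(len(seq)):
        window = padded[t:t + width]
        out.append([max(0.0, sum(w * x for wr, xr in zip(k, window) for w, x in zip(wr, xr)))
                    for k in kernels])
    return out


def tag(units: Sequence[str], table: Dict[str, Vector], dim: int,
        kernels: List[List[Vector]], proj: List[Vector]) -> List[str]:
    """Embed -> convolve -> project each position to one score per tag -> argmax."""
    predicted: List[str] = []
    for f in conv1d_relu(embed(units, table, dim), kernels):
        scores = [sum(w * x for w, x in zip(row, f)) for row in proj]
        predicted.append(TAGS[max(range(len(TAGS)), key=scores.__getitem__)])
    return predicted


units, gold = encode([(["walk", "ing"], 1), (["play", "ed"], 1)])
print(gold)                 # ['B-S', 'I-A', 'B-S', 'I-A']
print(decode(units, gold))  # [('walking', 'walk'), ('played', 'play')]

# Toy 2-d "pre-trained" vectors and untrained weights; "ed" is deliberately unknown.
table = {"walk": [1.0, 0.0], "ing": [0.0, 1.0], "play": [1.0, 0.5]}
kernels = [[[0.1, 0.2], [0.3, -0.1], [0.0, 0.2]], [[-0.2, 0.1], [0.5, 0.5], [0.1, 0.0]]]
proj = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 1.0]]
print(decode(units, tag(units, table, 2, kernels, proj)))
```

Because each unit carries both a boundary decision and a stem/affix decision, a single tagging pass yields segmentation and stemming together. Swapping `table` for vectors from a different pre-trained embedding, while keeping the rest fixed, mirrors the kind of comparison the abstract describes; with the untrained toy weights above, the predicted tags are meaningless and only the data flow is illustrated.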