{"title":"缅甸茎秆神经序列标记优化器与Dropout的比较","authors":"Oo Yadanar, K. Soe","doi":"10.1109/ICIAICT.2019.8784850","DOIUrl":null,"url":null,"abstract":"In Myanmar language, texts typically contain many different forms of a basic word. Morphological variants are generally the most common problem in mis-spellings, wrong translation and irrelevant retrieval query. The effectiveness of searching is obviously related to the stemming process. Moreover, there is no space separation in Myanmar language. Therefore, the tasks of segmenting the initial texts to words sequence is fully related to the stemming process. In present-day, deep learning approaches have become very good performance in variety of tasks, such as natural language processing, speech recognition, image recognizing. Among different types of neural networks, CNN networks have been most extensively used in text processing to extract morphological information (prefix and suffix of a word). This paper proposes the optimization process in Neural Architecture, how loss functions fit into the equation and finding the best optimizer. This paper also classified the efficiency of dropout under each optimizer to improve CNN-based model which jointly learns stemming and segmentation boundaries in parallel. It has obtained significant improvements on model performance after using dropout and the highest F-score is dropout probability 0.2. According to the experimental results, the SGD and Adam optimizer have a vast effect on the performance. And then, RMSProp optimizer performs better than other optimizers even though there is less dropout nodes.","PeriodicalId":277919,"journal":{"name":"2019 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Optimizer Comparison with Dropout for Neural Sequence Labeling in Myanmar Stemmer\",\"authors\":\"Oo Yadanar, K. Soe\",\"doi\":\"10.1109/ICIAICT.2019.8784850\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In Myanmar language, texts typically contain many different forms of a basic word. Morphological variants are generally the most common problem in mis-spellings, wrong translation and irrelevant retrieval query. The effectiveness of searching is obviously related to the stemming process. Moreover, there is no space separation in Myanmar language. Therefore, the tasks of segmenting the initial texts to words sequence is fully related to the stemming process. In present-day, deep learning approaches have become very good performance in variety of tasks, such as natural language processing, speech recognition, image recognizing. Among different types of neural networks, CNN networks have been most extensively used in text processing to extract morphological information (prefix and suffix of a word). This paper proposes the optimization process in Neural Architecture, how loss functions fit into the equation and finding the best optimizer. This paper also classified the efficiency of dropout under each optimizer to improve CNN-based model which jointly learns stemming and segmentation boundaries in parallel. It has obtained significant improvements on model performance after using dropout and the highest F-score is dropout probability 0.2. 
According to the experimental results, the SGD and Adam optimizer have a vast effect on the performance. And then, RMSProp optimizer performs better than other optimizers even though there is less dropout nodes.\",\"PeriodicalId\":277919,\"journal\":{\"name\":\"2019 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT)\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIAICT.2019.8784850\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIAICT.2019.8784850","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract
In the Myanmar language, texts typically contain many different forms of a basic word. Morphological variants are among the most common causes of misspellings, wrong translations, and irrelevant retrieval queries, so the effectiveness of searching is closely tied to the stemming process. Moreover, Myanmar text has no space separation between words, which means that segmenting the input text into a word sequence is tightly coupled with stemming. Deep learning approaches now achieve very good performance on a variety of tasks such as natural language processing, speech recognition, and image recognition. Among the different types of neural networks, CNNs have been used most extensively in text processing to extract morphological information (the prefix and suffix of a word). This paper examines the optimization process in the neural architecture, how loss functions fit into the equation, and how to find the best optimizer. It also evaluates the efficiency of dropout under each optimizer to improve a CNN-based model that jointly learns stemming and segmentation boundaries in parallel. Model performance improves significantly after applying dropout, with the highest F-score obtained at a dropout probability of 0.2. According to the experimental results, the SGD and Adam optimizers have a large effect on performance, while the RMSProp optimizer performs better than the other optimizers even when fewer nodes are dropped out.
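
As a concrete illustration of the setup the abstract describes, the following is a minimal PyTorch sketch (not the authors' implementation) of a CNN-based sequence labeler with configurable dropout, trained under one of the three optimizers compared in the paper (SGD, Adam, RMSProp). The vocabulary size, tagset, layer sizes, and the toy batch are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class CNNSequenceLabeler(nn.Module):
    """Syllable-level CNN that predicts a joint stem/segmentation tag per position (illustrative)."""

    def __init__(self, vocab_size=2000, embed_dim=64, num_filters=128,
                 kernel_size=3, num_tags=4, dropout_p=0.2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # A 1-D convolution over the sequence captures local morphology
        # (prefix/suffix patterns around each position).
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size,
                              padding=kernel_size // 2)
        self.dropout = nn.Dropout(dropout_p)  # p = 0.2 gave the best F-score in the paper
        self.out = nn.Linear(num_filters, num_tags)

    def forward(self, token_ids):                      # token_ids: (batch, seq_len)
        x = self.embed(token_ids)                      # (batch, seq_len, embed_dim)
        x = torch.relu(self.conv(x.transpose(1, 2)))   # (batch, num_filters, seq_len)
        x = self.dropout(x.transpose(1, 2))            # (batch, seq_len, num_filters)
        return self.out(x)                             # per-position tag scores


def make_optimizer(name, model, lr=1e-3):
    """Return one of the three optimizers compared in the paper."""
    builders = {
        "sgd": lambda: torch.optim.SGD(model.parameters(), lr=lr),
        "adam": lambda: torch.optim.Adam(model.parameters(), lr=lr),
        "rmsprop": lambda: torch.optim.RMSprop(model.parameters(), lr=lr),
    }
    return builders[name]()


# Toy training step with cross-entropy loss over the per-position tags.
model = CNNSequenceLabeler()
optimizer = make_optimizer("rmsprop", model)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 2000, (8, 40))    # fake batch: 8 sequences of 40 symbols
gold_tags = torch.randint(0, 4, (8, 40))    # fake gold tag per position

optimizer.zero_grad()
logits = model(tokens)                      # (8, 40, num_tags)
loss = loss_fn(logits.reshape(-1, 4), gold_tags.reshape(-1))
loss.backward()
optimizer.step()
```

Swapping the name passed to make_optimizer and the dropout probability is all that is needed to run the kind of optimizer-by-dropout comparison the abstract reports; the specific hyperparameter grid used in the paper is not given here.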