Resolve out of Vocabulary with Long Short-Term Memory Networks for Morphology

Yun Tang, Chuanxiang Tang, Caixin Zhu
Published in: 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA)
Publication date: 2020-06-01
DOI: 10.1109/ICAICA50127.2020.9182586 (https://doi.org/10.1109/ICAICA50127.2020.9182586)
Citations: 2

Abstract

An out-of-vocabulary (OOV) word is a word that does not exist in a predefined vocabulary. Handling OOV words is an important research topic in natural language processing (NLP), because their presence directly affects the performance of many NLP systems. In common scenarios such as machine translation, sentiment analysis, and intelligent question answering, for example, OOV words can greatly affect the key performance of the system. In recent years, with the advent of word2vec, a word-vector algorithm based on the principle of word morphology, the word-embedding stage of NLP systems has improved significantly. We combine LSTM with NLM, taking the morphemes of words as the basic processing unit while also taking global context information into account. The results are better than those of existing OOV-handling strategies, and the performance of commonly used NLP systems generally improves. Experiments show that our model generally outperforms existing models on the OOV (unregistered word) problem.
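
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: a word missing from the vocabulary is split into morphemes, and an LSTM composes the morpheme vectors into a single word vector. Everything here is an assumption made for illustration and is not the authors' model: the morpheme inventory `MORPHEMES`, the toy vocabulary `VOCAB`, the greedy longest-match segmenter, the hash-seeded embeddings, and the untrained random LSTM weights. The NLM and global-context components mentioned in the abstract are not modeled.

```python
import math
import random

DIM = 4  # embedding and hidden size (arbitrary choice for this sketch)

# Hypothetical morpheme inventory and in-vocabulary embedding table.
MORPHEMES = {"un", "break", "able", "happi", "ness"}
VOCAB = {"break": [0.1, 0.2, 0.3, 0.4]}


def segment(word, morphemes):
    """Greedy longest-match split; unknown spans fall back to single characters."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in morphemes:
                pieces.append(word[i:j])
                i = j
                break
        else:  # no known morpheme starts at position i
            pieces.append(word[i])
            i += 1
    return pieces


def unit_embedding(unit, dim=DIM):
    """Deterministic pseudo-random vector per unit (stands in for a learned embedding)."""
    rng = random.Random(unit)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class TinyLSTM:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    GATES = ("input", "forget", "output", "cand")

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # One weight matrix per gate, mapping the concatenation [x; h] to dim values.
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
            for gate in self.GATES
        }

    def step(self, x, h, c):
        xh = x + h  # list concatenation: [x; h]
        pre = {
            gate: [sum(w * v for w, v in zip(row, xh)) for row in self.W[gate]]
            for gate in self.GATES
        }
        i_gate = [sigmoid(v) for v in pre["input"]]
        f_gate = [sigmoid(v) for v in pre["forget"]]
        o_gate = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["cand"]]
        c_new = [f * ck + i * g for f, ck, i, g in zip(f_gate, c, i_gate, cand)]
        h_new = [o * math.tanh(ck) for o, ck in zip(o_gate, c_new)]
        return h_new, c_new

    def encode(self, vectors):
        """Run the sequence through the cell and return the final hidden state."""
        h, c = [0.0] * self.dim, [0.0] * self.dim
        for x in vectors:
            h, c = self.step(x, h, c)
        return h


def word_vector(word, vocab, morphemes, lstm):
    """Look up in-vocabulary words; compose OOV words from their morphemes."""
    if word in vocab:
        return vocab[word]
    units = segment(word, morphemes)
    return lstm.encode([unit_embedding(u, lstm.dim) for u in units])


lstm = TinyLSTM(DIM)
print(segment("unbreakable", MORPHEMES))
print(word_vector("unbreakable", VOCAB, MORPHEMES, lstm))
```

Because the weights are random, the resulting vector carries no linguistic meaning; the sketch only shows the data flow from an OOV word to morphemes to a fixed-size vector. In a trained system the embeddings and LSTM parameters would be learned, typically with a framework rather than hand-written cells.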