Resolve out of Vocabulary with Long Short-Term Memory Networks for Morphology
Authors: Yun Tang, Chuanxiang Tang, Caixin Zhu
Published in: 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), 2020-06-01
DOI: https://doi.org/10.1109/ICAICA50127.2020.9182586
An out-of-vocabulary (OOV) word is a word that does not exist in a predefined vocabulary. How to handle OOV words is an important research topic in natural language processing (NLP), because their presence directly affects the performance of many NLP systems. In common scenarios such as machine translation, sentiment analysis, and intelligent question answering, OOV words can greatly degrade key system performance. In recent years, with the advent of the word vector algorithm word2vec, based on the principle of word morphology, word embedding in NLP systems has improved significantly. We combine an LSTM with an NLM, taking the morphemes of words as the basic processing unit while also taking global context information into account. The results obtained are better than those of existing OOV processing strategies, and the performance of commonly used NLP systems generally improves. Finally, experiments show that our model generally outperforms existing models at handling unregistered (OOV) words.
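To make the idea of morphemes as the basic processing unit concrete, the following is a minimal sketch of how a word missing from the vocabulary can still be broken into known sub-word units. It is only an illustration: the vocabulary, the morpheme inventory, and the greedy longest-match segmenter (`VOCAB`, `MORPHEMES`, `is_oov`, `segment`) are hypothetical and are not taken from the paper, and the LSTM that would consume the resulting units is omitted.

```python
# Toy illustration only: the vocabulary, morpheme inventory, and greedy
# segmenter below are hypothetical and are not the paper's method.

VOCAB = {"happy", "kind", "play", "player"}
MORPHEMES = {"un", "happi", "happy", "kind", "ness", "play", "er", "s"}


def is_oov(word, vocab=VOCAB):
    """Return True when the word is missing from the predefined vocabulary."""
    return word not in vocab


def segment(word, morphemes=MORPHEMES):
    """Greedy longest-match split of a word into known morphemes.

    Characters that match no morpheme are emitted as single-character
    units, so every word receives some sequence of units.
    """
    units = []
    i = 0
    while i < len(word):
        # Try the longest candidate substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in morphemes:
                units.append(word[i:j])
                i = j
                break
        else:
            # No morpheme starts here: fall back to a single character.
            units.append(word[i])
            i += 1
    return units


if __name__ == "__main__":
    for word in ["unkindness", "players", "kind"]:
        print(word, is_oov(word), segment(word))
```

In this sketch "unkindness" is OOV at the word level, yet every unit it splits into is known, which is the property a morpheme-level model can exploit: a sequence model such as an LSTM could read the units in order and compose a vector for the unseen word.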