Named entity recognition for Chinese telecommunications field based on Char2Vec and Bi-LSTMs

2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE). Pub Date: 2017-11-01. DOI: 10.1109/ISKE.2017.8258773

Yu Wang, Bin Xia, Zheng Liu, Yun Li, Tao Li


Citations: 5

Abstract

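Named Entity Recognition (NER) is a basic task in Natural Language Processing (NLP) that extracts meaningful named entities from text. Chinese NER is more challenging than English NER because the Chinese language has no tense. Moreover, omissions and Internet catchwords in Chinese corpora make the task more difficult. Traditional machine learning methods (e.g., CRFs) cannot address Chinese NER effectively because they struggle to learn the complicated context of the Chinese language. To overcome this problem, we propose a deep learning model, Char2Vec+Bi-LSTMs, for Chinese NER. We use the Chinese character rather than the Chinese word as the embedding unit, and the Bi-LSTMs learn the complicated semantic dependencies. To evaluate the proposed model, we construct a corpus from China TELECOM FAQs. Experimental results show that our model outperforms the baseline methods and that character embeddings are more appropriate than word embeddings for the Chinese language.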
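To make the idea of the character as the embedding unit concrete, here is a minimal sketch of the input side of such a tagger: a sentence is split into single characters with no word segmentation, each character is mapped to an integer id that an embedding table could index, and entity spans are turned into per-character labels. The function names, the example sentence, the `PRODUCT` label, and the BIO labelling scheme are all assumptions made for illustration. The abstract does not specify the tag set or preprocessing the authors used, and the Bi-LSTM itself is not shown.

```python
from typing import Dict, List, Tuple


def char_tokens(sentence: str) -> List[str]:
    """Split a sentence into single characters, dropping whitespace.

    No word segmenter is involved: each Chinese character is one token.
    """
    return [ch for ch in sentence if not ch.isspace()]


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Assign an integer id to every character seen in the corpus."""
    vocab = {"<pad>": 0, "<unk>": 1}  # reserved ids (illustrative convention)
    for sentence in sentences:
        for ch in char_tokens(sentence):
            vocab.setdefault(ch, len(vocab))
    return vocab


def encode(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Map characters to ids; unseen characters fall back to <unk>."""
    return [vocab.get(ch, vocab["<unk>"]) for ch in char_tokens(sentence)]


def bio_tags(n_chars: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end, label) spans into per-character BIO tags.

    Spans are half-open and index into the whitespace-free character
    sequence returned by char_tokens. BIO is an assumed scheme here.
    """
    tags = ["O"] * n_chars
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


if __name__ == "__main__":
    # Hypothetical FAQ-style sentence: "How do I sign up for Tianyi broadband"
    sentence = "如何办理天翼宽带"
    vocab = build_vocab([sentence])
    ids = encode(sentence, vocab)
    # Hypothetical annotation: characters 4..7 form one PRODUCT entity.
    tags = bio_tags(len(ids), [(4, 8, "PRODUCT")])
    for ch, idx, tag in zip(char_tokens(sentence), ids, tags):
        print(ch, idx, tag)
```

In a full model, the id sequence would index a character embedding table whose vectors feed a bidirectional LSTM, and the tag sequence would serve as the training target. Working at the character level avoids any dependence on a word segmenter, which can mis-split domain terms and catchwords.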


