How to Enhance Chinese Word Segmentation Using Knowledge Graphs

Kunhui Lin, Wenyuan Du, Xiaoli Wang, Meihong Wang, Zixiang Yang
{"title":"如何利用知识图谱增强汉语分词","authors":"Kunhui Lin, Wenyuan Du, Xiaoli Wang, Meihong Wang, Zixiang Yang","doi":"10.1109/ICCSE.2018.8468759","DOIUrl":null,"url":null,"abstract":"Chinese word segmentation is a very important problem for Chinese information processing. Chinese word segmentation results are the basis for computers to understand natural language. However, unlike most Western languages, Chinese words do not have fixed symbols like white space as word segmentation marks. Moreover, Chinese has a very complex grammar, and the word segmentation criteria are varied with the contexts. Therefore, Chinese word segmentation is a very difficult task. Many existing works have proposed many algorithms to solve this problem. However, to our best knowledge, none of them could outperform all the other methods. In this paper, we develop a novel algorithm based on semantics and contexts. We propose a semantic-based word similarity measure using the concept hierarchy in knowledge graphs, and use this measure to prune the different results which are generated by several state-of-the-art Chinese word segmentation methods. The idea is to respectively compute the concept similarity of these words to other words in the text, and choose the word with the highest concept similarity score. To evaluate the effectiveness of the proposed approach, we conduct a series of experiment on two real datasets. The results show that our method outperforms all the state-of-the-art algorithms by filtering out wrong results and retaining correct ones.","PeriodicalId":228760,"journal":{"name":"2018 13th International Conference on Computer Science & Education (ICCSE)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"How to Enhance Chinese Word Segmentation Using Knowledge Graphs\",\"authors\":\"Kunhui Lin, Wenyuan Du, Xiaoli Wang, Meihong Wang, Zixiang Yang\",\"doi\":\"10.1109/ICCSE.2018.8468759\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Chinese word segmentation is a very important problem for Chinese information processing. Chinese word segmentation results are the basis for computers to understand natural language. However, unlike most Western languages, Chinese words do not have fixed symbols like white space as word segmentation marks. Moreover, Chinese has a very complex grammar, and the word segmentation criteria are varied with the contexts. Therefore, Chinese word segmentation is a very difficult task. Many existing works have proposed many algorithms to solve this problem. However, to our best knowledge, none of them could outperform all the other methods. In this paper, we develop a novel algorithm based on semantics and contexts. We propose a semantic-based word similarity measure using the concept hierarchy in knowledge graphs, and use this measure to prune the different results which are generated by several state-of-the-art Chinese word segmentation methods. The idea is to respectively compute the concept similarity of these words to other words in the text, and choose the word with the highest concept similarity score. To evaluate the effectiveness of the proposed approach, we conduct a series of experiment on two real datasets. 
The results show that our method outperforms all the state-of-the-art algorithms by filtering out wrong results and retaining correct ones.\",\"PeriodicalId\":228760,\"journal\":{\"name\":\"2018 13th International Conference on Computer Science & Education (ICCSE)\",\"volume\":\"110 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 13th International Conference on Computer Science & Education (ICCSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCSE.2018.8468759\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 13th International Conference on Computer Science & Education (ICCSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCSE.2018.8468759","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

Chinese word segmentation is a fundamental problem in Chinese information processing, and its results are the basis for computers to understand natural language. However, unlike most Western languages, Chinese does not use fixed symbols such as white space to mark word boundaries. Moreover, Chinese grammar is complex, and segmentation criteria vary with context, which makes Chinese word segmentation a very difficult task. Many algorithms have been proposed to solve this problem, but to the best of our knowledge, none of them outperforms all the others. In this paper, we develop a novel algorithm based on semantics and context. We propose a semantic word similarity measure built on the concept hierarchy of a knowledge graph, and use it to prune the conflicting results produced by several state-of-the-art Chinese word segmentation methods. The idea is to compute the concept similarity of each candidate word to the other words in the text and choose the candidate with the highest similarity score. To evaluate the effectiveness of the proposed approach, we conduct a series of experiments on two real datasets. The results show that our method outperforms all the state-of-the-art algorithms by filtering out wrong results and retaining correct ones.
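To illustrate the pruning idea described in the abstract, here is a minimal sketch, not the authors' implementation: it assumes a toy concept hierarchy stored as a child-to-parent map (a real system would query a knowledge graph), a Wu-Palmer-style concept similarity, and a set of candidate words on which several segmenters disagree. All names (`PARENTS`, `similarity`, `pick_word`) are hypothetical.

```python
# A minimal sketch of knowledge-graph-based segmentation pruning.
# Assumption: concepts are organized in a hierarchy; the candidate whose
# concepts are most similar to the surrounding context words is kept.

# Hypothetical concept hierarchy (child -> parent); a real system would
# draw this from a knowledge graph rather than a hard-coded dict.
PARENTS = {
    "苹果": "水果", "香蕉": "水果", "水果": "食物",
    "苹果公司": "企业", "企业": "组织", "食物": "实体", "组织": "实体",
}

def ancestors(concept):
    """Return the path from a concept up to the hierarchy root."""
    path = [concept]
    while concept in PARENTS:
        concept = PARENTS[concept]
        path.append(concept)
    return path

def similarity(a, b):
    """Wu-Palmer-style similarity: 2*depth(lcs) / (depth(a) + depth(b)),
    where lcs is the deepest concept shared by both ancestor paths."""
    pa, pb = ancestors(a), ancestors(b)
    common = set(pa) & set(pb)
    if not common:
        return 0.0
    depth = lambda c: len(ancestors(c))  # deeper = more specific
    lcs = max(common, key=depth)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))

def pick_word(candidates, context_words):
    """Among disagreeing segmenter outputs, keep the candidate whose total
    concept similarity to the surrounding context words is highest."""
    def score(word):
        return sum(similarity(word, ctx) for ctx in context_words)
    return max(candidates, key=score)

# Two segmenters disagree on one token; the context mentions other fruit.
print(pick_word(["苹果", "苹果公司"], ["香蕉", "水果"]))  # -> 苹果
```

In the usage example, two segmenters disagree between 苹果 (apple, the fruit) and 苹果公司 (Apple Inc.); since the context words are fruit-related, the fruit sense accumulates the higher concept similarity score and is retained, mirroring the paper's pruning strategy under these toy assumptions.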