基于相似度的汉语同义词搭配提取

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2005-03-01 DOI:10.30019/IJCLCLP.200503.0006

Wanyin Li, Q. Lu, Ruifeng Xu

{"title":"基于相似度的汉语同义词搭配提取","authors":"Wanyin Li, Q. Lu, Ruifeng Xu","doi":"10.30019/IJCLCLP.200503.0006","DOIUrl":null,"url":null,"abstract":"Collocation extraction systems based on pure statistical methods suffer from two major problems. The first problem is their relatively low precision and recall rates. The second problem is their difficulty in dealing with sparse collocations. In order to improve performance, both statistical and lexicographic approaches should be considered. This paper presents a new method to extract synonymous collocations using semantic information. The semantic information is obtained by calculating similarities from HowNet. We have successfully extracted synonymous collocations which normally cannot be extracted using lexical statistics. Our evaluation conducted on a 60MB tagged corpus shows that we can extract synonymous collocations that occur with very low frequency and that the improvement in the recall rate is close to 100%. In addition, compared with a collocation extraction system based on the Xtract system for English, our algorithm can improve the precision rate by about 44%.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"67 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Similarity Based Chinese Synonym Collocation Extraction\",\"authors\":\"Wanyin Li, Q. Lu, Ruifeng Xu\",\"doi\":\"10.30019/IJCLCLP.200503.0006\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Collocation extraction systems based on pure statistical methods suffer from two major problems. The first problem is their relatively low precision and recall rates. The second problem is their difficulty in dealing with sparse collocations. In order to improve performance, both statistical and lexicographic approaches should be considered. This paper presents a new method to extract synonymous collocations using semantic information. The semantic information is obtained by calculating similarities from HowNet. We have successfully extracted synonymous collocations which normally cannot be extracted using lexical statistics. Our evaluation conducted on a 60MB tagged corpus shows that we can extract synonymous collocations that occur with very low frequency and that the improvement in the recall rate is close to 100%. In addition, compared with a collocation extraction system based on the Xtract system for English, our algorithm can improve the precision rate by about 44%.\",\"PeriodicalId\":436300,\"journal\":{\"name\":\"Int. J. Comput. Linguistics Chin. Lang. Process.\",\"volume\":\"67 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Comput. Linguistics Chin. Lang. Process.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.30019/IJCLCLP.200503.0006\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Comput. Linguistics Chin. Lang. Process.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.30019/IJCLCLP.200503.0006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 12

摘要

基于纯统计方法的搭配抽取系统存在两个主要问题。第一个问题是它们相对较低的准确率和召回率。第二个问题是它们难以处理稀疏搭配。为了提高性能，应该同时考虑统计和词典编纂方法。提出了一种利用语义信息提取同义搭配的新方法。通过计算HowNet上的相似度来获得语义信息。我们成功地提取了同义搭配，这通常是无法用词汇统计提取的。我们在一个60MB的标记语料库上进行的评估表明，我们可以提取频率很低的同义搭配，召回率的提高接近100%。此外，与基于Xtract系统的英语搭配提取系统相比，我们的算法可将准确率提高约44%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Similarity Based Chinese Synonym Collocation Extraction

Collocation extraction systems based on pure statistical methods suffer from two major problems. The first problem is their relatively low precision and recall rates. The second problem is their difficulty in dealing with sparse collocations. In order to improve performance, both statistical and lexicographic approaches should be considered. This paper presents a new method to extract synonymous collocations using semantic information. The semantic information is obtained by calculating similarities from HowNet. We have successfully extracted synonymous collocations which normally cannot be extracted using lexical statistics. Our evaluation conducted on a 60MB tagged corpus shows that we can extract synonymous collocations that occur with very low frequency and that the improvement in the recall rate is close to 100%. In addition, compared with a collocation extraction system based on the Xtract system for English, our algorithm can improve the precision rate by about 44%.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Int. J. Comput. Linguistics Chin. Lang. Process.

自引率

0.00%

发文量