Integrating Ngram Model and Case-based Learning for Chinese Word Segmentation

Workshop on Chinese Language Processing. Pub Date: 2003-07-11. DOI: 10.3115/1119250.1119274

C. Kit, Zhiming Xu, J. Webster


Citations: 5

Abstract

This paper presents our recent work for participation in the First International Chinese Word Segmentation Bake-off (ICWSB-1). It is based on a general-purpose ngram model for word segmentation and a case-based learning approach to disambiguation. This system excels in identifying in-vocabulary (IV) words, achieving a recall of around 96-98%. Here we present our strategies for language model training and disambiguation rule learning, analyze the system's performance, and discuss areas for further improvement, e.g., out-of-vocabulary (OOV) word discovery.
