A Unicode-based Adaptive Segmenter

Journal of Chinese Language and Computing. Pub Date: 2003-07-11. DOI: 10.3115/1119250.1119275

Q. Lu, Shiu-tong Chan, Baoli Li, Shiwen Yu


Citations: 13

Abstract


A Unicode-based Adaptive Segmenter

This paper presents a Unicode-based Chinese word segmenter. It can handle Chinese text in Simplified, Traditional, or mixed mode. The system uses a divide-and-conquer strategy in the preprocessing stage to recognize personal names, numbers, times, numerical values, and similar items. The segmenter then uses tagging information for disambiguation. The design is modular: each functional part is implemented as a separate module, and each module tackles one problem at a time, which provides more flexibility and extensibility. Results show that adding preprocessing and auxiliary modules increases the accuracy of the segmenter and that the system is easily adapted to different applications.
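The modular, divide-and-conquer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic detail, so the regular expressions, the toy lexicon, the tag names, and the use of forward maximum matching as the final dictionary step are all assumptions made for illustration. Tag-based disambiguation and personal-name recognition are not shown. Each module claims the spans it recognizes and leaves the rest for later modules, and because Python strings are Unicode, the same code handles Simplified and Traditional characters in one text.

```python
import re
from typing import Callable, List, Optional, Set, Tuple

# (text, tag); tag is None while no module has claimed the span yet.
Token = Tuple[str, Optional[str]]
Module = Callable[[List[Token]], List[Token]]


def regex_module(pattern: str, tag: str) -> Module:
    """Build a preprocessing module that tags every match of `pattern`."""
    compiled = re.compile(pattern)

    def run(tokens: List[Token]) -> List[Token]:
        out: List[Token] = []
        for text, old_tag in tokens:
            if old_tag is not None:  # claimed by an earlier module, keep as is
                out.append((text, old_tag))
                continue
            pos = 0
            for match in compiled.finditer(text):
                if match.start() > pos:
                    out.append((text[pos:match.start()], None))
                out.append((match.group(), tag))
                pos = match.end()
            if pos < len(text):
                out.append((text[pos:], None))
        return out

    return run


def lexicon_module(lexicon: Set[str], max_len: int = 4) -> Module:
    """Forward maximum matching over unclaimed spans (an assumed baseline)."""

    def run(tokens: List[Token]) -> List[Token]:
        out: List[Token] = []
        for text, old_tag in tokens:
            if old_tag is not None:
                out.append((text, old_tag))
                continue
            i = 0
            while i < len(text):
                # Try the longest candidate first; fall back to one character.
                for size in range(min(max_len, len(text) - i), 0, -1):
                    piece = text[i:i + size]
                    if size == 1 or piece in lexicon:
                        out.append((piece, "WORD"))
                        i += size
                        break
        return out

    return run


def segment(text: str, modules: List[Module]) -> List[Token]:
    """Run each module in order over the token list."""
    tokens: List[Token] = [(text, None)]
    for module in modules:
        tokens = module(tokens)
    return tokens


# Hypothetical configuration: time expressions first, then plain numbers,
# then a toy lexicon holding one Simplified and one Traditional entry.
MODULES: List[Module] = [
    regex_module(r"\d+[年月日]", "TIME"),
    regex_module(r"\d+(?:\.\d+)?", "NUM"),
    lexicon_module({"电脑", "電腦"}),
]
```

Calling `segment("2003年我买了3台电脑和電腦", MODULES)` runs the three modules in order. Adding or reordering entries in `MODULES` changes the behavior without touching the other modules, which is the kind of flexibility the abstract attributes to the modular design.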


Source journal

Journal of Chinese Language and Computing

Self-citation rate

0.00%

Publication count