Workshop on Chinese Language Processing: Latest Publications

A Two-stage Statistical Word Segmentation System for Chinese
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119273
G. Fu, K. Luke
Abstract: In this paper we present a two-stage statistical word segmentation system for Chinese based on word bigram and word-formation models. This system was evaluated on the Peking University corpora at the First International Chinese Word Segmentation Bakeoff. We also report and discuss the results of this evaluation.
Citations: 19
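
The entry above describes a first stage driven by a word bigram model. As a rough, hypothetical sketch of that idea only (not the authors' system, and without their word-formation model for unknown words), the following Python fragment picks the highest-scoring dictionary segmentation using a placeholder bigram score; the dictionary and scores are invented for illustration.

import math

# Toy dictionary and word-length limit; the real system derives its lexicon and
# bigram statistics from the Peking University corpora.
DICT = {"研究", "研究生", "生命", "命", "起源", "的"}
MAX_WORD_LEN = 3

def bigram_logprob(prev, word):
    # Placeholder for P(word | prev): a real model uses smoothed corpus counts;
    # here longer known words simply win.
    return math.log(len(word)) - 1.0

def segment(sentence):
    # Dynamic programming over character positions:
    # best[i] = (score, best segmentation of sentence[:i]).
    best = {0: (0.0, [])}
    for i in range(1, len(sentence) + 1):
        for j in range(max(0, i - MAX_WORD_LEN), i):
            word = sentence[j:i]
            if j in best and word in DICT:
                prev = best[j][1][-1] if best[j][1] else "<s>"
                score = best[j][0] + bigram_logprob(prev, word)
                if i not in best or score > best[i][0]:
                    best[i] = (score, best[j][1] + [word])
    return best.get(len(sentence), (0.0, list(sentence)))[1]

print(segment("研究生命的起源"))   # -> ['研究', '生命', '的', '起源']
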
Learning Verb-Noun Relations to Improve Parsing
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119267
Andi Wu
Abstract: The verb-noun sequence in Chinese often creates ambiguities in parsing. These ambiguities can usually be resolved if we know in advance whether the verb and the noun tend to be in the verb-object relation or the modifier-head relation. In this paper, we describe a learning procedure whereby such knowledge can be automatically acquired. Using an existing (imperfect) parser with a chart filter and a tree filter, a large corpus, and the log-likelihood-ratio (LLR) algorithm, we were able to acquire verb-noun pairs which typically occur either in verb-object relations or modifier-head relations. The learned pairs are then used in the parsing process for disambiguation. Evaluation shows that the accuracy of the original parser improves significantly with the use of the automatically acquired knowledge.
Citations: 10
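
The abstract names the log-likelihood-ratio (LLR) algorithm as the association measure used to acquire verb-noun pairs. Below is a minimal sketch of that statistic (Dunning's G²) computed from a 2x2 co-occurrence table; the counts in the example are invented, and the paper's chart and tree filters are not reproduced.

import math

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of (verb, noun)
    co-occurrence counts: the higher the score, the stronger the association."""
    def term(k, row, col, n):
        return k * math.log(k * n / (row * col)) if k > 0 else 0.0
    n = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22
    c1, c2 = k11 + k21, k12 + k22
    return 2.0 * (term(k11, r1, c1, n) + term(k12, r1, c2, n)
                  + term(k21, r2, c1, n) + term(k22, r2, c2, n))

# Invented counts: the verb-noun pair co-occurred 120 times in a corpus of
# 1,000,000 verb-noun observations.
print(round(llr(120, 3880, 9880, 986120), 2))
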
Two-Character Chinese Word Extraction Based on Hybrid of Internal and Contextual Measures
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119254
Shengfen Luo, Maosong Sun
Abstract: Word extraction is one of the important tasks in text information processing. There are mainly two kinds of statistics-based measures for word extraction: internal measures and contextual measures. This paper discusses both kinds of measures for Chinese word extraction. First, nine widely adopted internal measures are tested and compared on an individual basis. Then various schemes of combining these measures are tried in order to improve performance. Finally, the left/right entropy is integrated to assess the effect of contextual measures. A genetic algorithm is used to automatically adjust the combination weights and thresholds. Experiments focusing on two-character Chinese word extraction show a promising result: the F-measure of mutual information, the most powerful internal measure, is 57.82%, whereas the best combination scheme of internal measures achieves an F-measure of 59.87%. With the integration of the contextual measure, word extraction ultimately achieves an F-measure of 68.48%.
Citations: 37
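
Mutual information (the strongest internal measure in the paper) and left/right entropy (the contextual measure it integrates) can both be computed from raw counts. The sketch below illustrates the two measures on a toy string; the paper's nine measures, combination schemes, and genetic-algorithm weighting are not reproduced.

import math
from collections import Counter

corpus = "服务器服务质量服务器测试" * 3   # toy text; the paper uses large corpora

chars = Counter(corpus)
bigrams = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
total = len(corpus)

def pmi(bigram):
    """Internal measure: how strongly the two characters attract each other."""
    p_xy = bigrams[bigram] / (total - 1)
    p_x, p_y = chars[bigram[0]] / total, chars[bigram[1]] / total
    return math.log2(p_xy / (p_x * p_y))

def left_right_entropy(bigram):
    """Contextual measure: entropy of the characters immediately to the left
    and right of the candidate; higher entropy suggests the candidate occurs
    freely in many contexts, i.e. behaves like a word."""
    lefts, rights = Counter(), Counter()
    for i in range(len(corpus) - 1):
        if corpus[i:i + 2] == bigram:
            if i > 0:
                lefts[corpus[i - 1]] += 1
            if i + 2 < len(corpus):
                rights[corpus[i + 2]] += 1
    def entropy(c):
        n = sum(c.values())
        return -sum(v / n * math.log2(v / n) for v in c.values()) if n else 0.0
    return entropy(lefts), entropy(rights)

print(pmi("服务"), left_right_entropy("服务"))
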
Class Based Sense Definition Model for Word Sense Tagging and Disambiguation
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119252
Tracy Lin, Jason J. S. Chang
Abstract: We present an unsupervised learning strategy for word sense disambiguation (WSD) that exploits multiple linguistic resources, including a parallel corpus, a bilingual machine-readable dictionary, and a thesaurus. The approach is based on the Class Based Sense Definition Model (CBSDM), which generates the glosses and translations for a class of word senses. The model can be applied to resolve sense ambiguity for words in a parallel corpus. That sense tagging procedure, in effect, produces a semantic bilingual concordance, which can be used to train WSD systems for the two languages involved. Experimental results show that CBSDM trained on the Longman Dictionary of Contemporary English, English-Chinese Edition (LDOCE E-C) and the Longman Lexicon of Contemporary English (LLOCE) is very effective at turning a Chinese-English parallel corpus into sense-tagged data for the development of WSD systems.
Citations: 0
Modeling of Long Distance Context Dependency in Chinese
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119260
Guodong Zhou
Abstract: Ngram modeling is simple and has been widely used in language modeling and many applications. However, it can only capture short-distance context dependency within an N-word window, where the largest practical N for natural language is three. Meanwhile, much of the context dependency in natural language occurs beyond a three-word window. In order to incorporate this kind of long-distance context dependency, this paper proposes a new MI-Ngram modeling approach. The MI-Ngram model consists of two components: an ngram model and an MI model. The ngram model captures the short-distance context dependency within an N-word window, while the MI model captures the long-distance context dependency between word pairs beyond the N-word window by using the concept of mutual information. It is found that MI-Ngram modeling performs much better than ngram modeling. Evaluation on the XINHUA news corpus of 29 million words shows that inclusion of the best 1,600,000 word pairs decreases the perplexity of the MI-Trigram model by 20 percent compared with the trigram model. Meanwhile, evaluation on Chinese word segmentation shows that about 35 percent of errors can be corrected by using the MI-Trigram model instead of the trigram model.
Citations: 0
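
A hedged sketch of the MI-Ngram scoring idea described above: the ordinary ngram log-probability is augmented with mutual-information terms for selected word pairs that co-occur beyond the N-word window. The trigram probability and the MI table below are toy placeholders, not values estimated from the XINHUA corpus.

import math

MI = {("医生", "医院"): 1.7, ("打", "篮球"): 2.1}   # invented long-distance PMI values (nats)

def trigram_logprob(word, history):
    return math.log(1e-4)   # placeholder: a real trigram model smooths and backs off

def mi_ngram_logprob(sentence, n=3):
    """Trigram log-probability plus MI terms for word pairs whose distance
    exceeds the trigram window, following the MI-Ngram decomposition."""
    score = 0.0
    for i, word in enumerate(sentence):
        score += trigram_logprob(word, sentence[max(0, i - (n - 1)):i])
        for j in range(0, i - n + 1):   # positions outside the n-word window
            score += MI.get((sentence[j], word), 0.0)
    return score

print(mi_ngram_logprob(["医生", "每天", "都", "去", "医院"]))
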
The Effect of Rhythm on Structural Disambiguation in Chinese
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119256
H. Sun, Dan Jurafsky
Abstract: The length of a constituent (the number of syllables in a word or the number of words in a phrase), or rhythm, plays an important role in Chinese syntax. This paper systematically surveys the distribution of rhythm across constructions in Chinese, using statistical data acquired from a shallow treebank. Based on our survey, we then applied rhythm as a statistical feature to augment a PCFG model in a practical shallow parsing task. Our results show that using the probabilistic rhythm feature significantly improves the performance of our shallow parser.
Citations: 11
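
One simple way to turn such rhythm distributions into a parsing feature, sketched here under assumptions of my own (the counts, rules, and the multiplicative rescoring are illustrative, not the paper's exact model): estimate P(rhythm pattern | construction) from treebank counts and use it to rescore competing analyses.

from collections import Counter

# Invented treebank counts of rhythm patterns per construction, where a pattern
# records the syllable length of each daughter (e.g. a 2-syllable verb followed
# by a 2-syllable noun is (2, 2)).
rhythm_counts = {
    "VP -> V NP": Counter({(1, 2): 180, (2, 2): 140, (2, 1): 20}),
    "NP -> N N":  Counter({(2, 2): 160, (1, 1): 90, (2, 1): 15}),
}

def rhythm_prob(rule, pattern):
    counts = rhythm_counts[rule]
    return counts[pattern] / sum(counts.values())   # unsmoothed relative frequency

def rescore(pcfg_prob, rule, daughters):
    # Multiply the PCFG rule probability by the rhythm probability: one way of
    # injecting constituent length into the parse score.
    pattern = tuple(len(d) for d in daughters)
    return pcfg_prob * rhythm_prob(rule, pattern)

# The same verb-noun string analysed as verb-object (VP) or as modifier-head (NP):
print(rescore(0.030, "VP -> V NP", ["进口", "彩电"]))
print(rescore(0.025, "NP -> N N",  ["进口", "彩电"]))
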
Abductive Explanation-based Learning Improves Parsing Accuracy and Efficiency
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119265
O. Streiter
Abstract: Natural language parsing has to be accurate and quick. Explanation-based Learning (EBL) is a technique to speed up parsing. However, accuracy often declines with EBL. This paper shows that this accuracy loss is not due to the EBL framework as such, but to deductive parsing. Abductive EBL allows extending the deductive closure of the parser. We present a Chinese parser based on abduction. Experiments show improvements in accuracy and efficiency.
Citations: 0
CHINERS: A Chinese Named Entity Recognition System for the Sports Domain
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119258
Tianfang Yao, Wei Ding, G. Erbach
Abstract: In Chinese named entity (NE) recognition, we are confronted with two principal challenges. One is how to ensure the quality of word segmentation and part-of-speech (POS) tagging, because errors there have an adverse impact on NE recognition performance. The other is how to recognize NEs flexibly, reliably, and accurately. To cope with these challenges, we propose a system architecture divided into two phases. In the first phase, we reduce the word segmentation and POS tagging errors passed on to the second phase as much as possible, using machine learning techniques to repair such errors. In the second phase, we design Finite State Cascades (FSC), which can be automatically constructed from the recognition rule sets, as a shallow parser for NE recognition. The advantages of FSC are that it is reliable, accurate, and easy to maintain. Additionally, we work out corresponding strategies for recognizing special NEs to enhance recognition correctness. The experimental evaluation of the system shows that the total average recall and precision for six types of NEs are 83% and 85% respectively. The system architecture is therefore reasonable and effective.
Citations: 13
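
The second phase uses Finite State Cascades as a shallow parser over segmented, POS-tagged text. Here is a toy sketch of the cascade idea only, with two invented rule levels for the sports domain; the paper's rule sets are constructed automatically and are not reproduced, and the "ns"/"n" tag conventions are assumptions.

import re

# Level 1 marks basic chunks in a word/POS-tagged string; level 2 combines them.
LEVEL1 = [
    ("LOC", r"大连/ns|北京/ns|上海/ns"),       # place names
    ("ORG_SUFFIX", r"队/n|俱乐部/n"),          # "team" / "club"
]
LEVEL2 = [
    ("TEAM", r"<LOC>\s*<ORG_SUFFIX>"),         # place name + team suffix -> team NE
]

def apply_level(text, rules):
    for label, pattern in rules:
        text = re.sub(pattern, f"<{label}>", text)
    return text

tagged = "大连/ns 队/n 战胜/v 北京/ns 队/n"
print(apply_level(apply_level(tagged, LEVEL1), LEVEL2))   # -> "<TEAM> 战胜/v <TEAM>"
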
The First International Chinese Word Segmentation Bakeoff
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119269
R. Sproat, Thomas Emerson
Abstract: This paper presents the results from the ACL-SIGHAN-sponsored First International Chinese Word Segmentation Bakeoff held in 2003 and reported in conjunction with the Second SIGHAN Workshop on Chinese Language Processing, Sapporo, Japan. We give the motivation for having an international segmentation contest (given that there have been two within-China contests to date), and we report on the results of this first international contest, analyze these results, and make some recommendations for the future.
Citations: 236
Chinese Word Segmentation as LMR Tagging
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119278
Nianwen Xue, Libin Shen
Abstract: In this paper we present Chinese word segmentation algorithms based on so-called LMR tagging. Our LMR taggers are implemented with the Maximum Entropy Markov Model, and we then use Transformation-Based Learning to combine the results of the two LMR taggers that scan the input in opposite directions. Our system achieves F-scores of 95.9% and 91.6% on the Academia Sinica corpus and the Hong Kong City University corpus respectively.
Citations: 151
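
LMR tagging recasts segmentation as per-character position tagging, which a sequence model such as a MEMM can then predict. A minimal sketch of the encoding and its inverse follows; the tag names used here (L, M, R, S) are illustrative and may not match the paper's exact inventory, and the MEMM taggers and Transformation-Based Learning combination are not shown.

def to_tags(words):
    """Map a segmented sentence to per-character position tags: L = word-initial,
    M = word-internal, R = word-final, S = single-character word."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["L"] + ["M"] * (len(w) - 2) + ["R"])
    return tags

def to_words(chars, tags):
    """Invert the tagging: rebuild words from characters and predicted tags."""
    words, buf = [], ""
    for ch, t in zip(chars, tags):
        buf += ch
        if t in ("R", "S"):
            words.append(buf)
            buf = ""
    if buf:
        words.append(buf)
    return words

words = ["上海", "浦东", "开发", "与", "法制", "建设"]
tags = to_tags(words)
assert to_words("".join(words), tags) == words
print(list(zip("".join(words), tags)))
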