Workshop on Chinese Language Processing: Latest Publications

A Two-stage Statistical Word Segmentation System for Chinese
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119273
G. Fu, K. Luke
Abstract: In this paper we present a two-stage statistical word segmentation system for Chinese based on word bigram and word-formation models. This system was evaluated on the Peking University corpora at the First International Chinese Word Segmentation Bakeoff. We also report and discuss the results of this evaluation.
Citations: 19
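
The entry above describes a first stage driven by a word bigram model. As a rough, hypothetical sketch of that idea only (not the authors' system, and without their word-formation model for unknown words), the following Python fragment picks the highest-scoring dictionary segmentation using a placeholder bigram score; the dictionary and scores are invented for illustration.

import math

# Toy dictionary and word-length limit; the real system derives its lexicon and
# bigram statistics from the Peking University corpora.
DICT = {"研究", "研究生", "生命", "命", "起源", "的"}
MAX_WORD_LEN = 3

def bigram_logprob(prev, word):
    # Placeholder for P(word | prev): a real model uses smoothed corpus counts;
    # here longer known words simply win.
    return math.log(len(word)) - 1.0

def segment(sentence):
    # Dynamic programming over character positions:
    # best[i] = (score, best segmentation of sentence[:i]).
    best = {0: (0.0, [])}
    for i in range(1, len(sentence) + 1):
        for j in range(max(0, i - MAX_WORD_LEN), i):
            word = sentence[j:i]
            if j in best and word in DICT:
                prev = best[j][1][-1] if best[j][1] else "<s>"
                score = best[j][0] + bigram_logprob(prev, word)
                if i not in best or score > best[i][0]:
                    best[i] = (score, best[j][1] + [word])
    return best.get(len(sentence), (0.0, list(sentence)))[1]

print(segment("研究生命的起源"))   # -> ['研究', '生命', '的', '起源']
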
Learning Verb-Noun Relations to Improve Parsing
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119267
Andi Wu
Abstract: The verb-noun sequence in Chinese often creates ambiguities in parsing. These ambiguities can usually be resolved if we know in advance whether the verb and the noun tend to be in the verb-object relation or the modifier-head relation. In this paper, we describe a learning procedure whereby such knowledge can be automatically acquired. Using an existing (imperfect) parser with a chart filter and a tree filter, a large corpus, and the log-likelihood-ratio (LLR) algorithm, we were able to acquire verb-noun pairs which typically occur either in verb-object relations or modifier-head relations. The learned pairs are then used in the parsing process for disambiguation. Evaluation shows that the accuracy of the original parser improves significantly with the use of the automatically acquired knowledge.
Citations: 10
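
The abstract names the log-likelihood-ratio (LLR) algorithm as the association measure used to acquire verb-noun pairs. Below is a minimal sketch of that statistic (Dunning's G²) computed from a 2x2 co-occurrence table; the counts in the example are invented, and the paper's chart and tree filters are not reproduced.

import math

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of (verb, noun)
    co-occurrence counts: the higher the score, the stronger the association."""
    def term(k, row, col, n):
        return k * math.log(k * n / (row * col)) if k > 0 else 0.0
    n = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22
    c1, c2 = k11 + k21, k12 + k22
    return 2.0 * (term(k11, r1, c1, n) + term(k12, r1, c2, n)
                  + term(k21, r2, c1, n) + term(k22, r2, c2, n))

# Invented counts: the verb-noun pair co-occurred 120 times in a corpus of
# 1,000,000 verb-noun observations.
print(round(llr(120, 3880, 9880, 986120), 2))
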
Two-Character Chinese Word Extraction Based on Hybrid of Internal and Contextual Measures
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119254
Shengfen Luo, Maosong Sun
Abstract: Word extraction is one of the important tasks in text information processing. There are mainly two kinds of statistics-based measures for word extraction: internal measures and contextual measures. This paper discusses both kinds of measures for Chinese word extraction. First, nine widely adopted internal measures are tested and compared on an individual basis. Then various schemes of combining these measures are tried in order to improve performance. Finally, the left/right entropy is integrated to assess the effect of contextual measures. A genetic algorithm is used to automatically adjust the combination weights and thresholds. Experiments focusing on two-character Chinese word extraction show a promising result: the F-measure of mutual information, the most powerful internal measure, is 57.82%, whereas the best combination scheme of internal measures achieves an F-measure of 59.87%. With the integration of the contextual measure, word extraction ultimately achieves an F-measure of 68.48%.
Citations: 37
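
Mutual information (the strongest internal measure in the paper) and left/right entropy (the contextual measure it integrates) can both be computed from raw counts. The sketch below illustrates the two measures on a toy string; the paper's nine measures, combination schemes, and genetic-algorithm weighting are not reproduced.

import math
from collections import Counter

corpus = "服务器服务质量服务器测试" * 3   # toy text; the paper uses large corpora

chars = Counter(corpus)
bigrams = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
total = len(corpus)

def pmi(bigram):
    """Internal measure: how strongly the two characters attract each other."""
    p_xy = bigrams[bigram] / (total - 1)
    p_x, p_y = chars[bigram[0]] / total, chars[bigram[1]] / total
    return math.log2(p_xy / (p_x * p_y))

def left_right_entropy(bigram):
    """Contextual measure: entropy of the characters immediately to the left
    and right of the candidate; higher entropy suggests the candidate occurs
    freely in many contexts, i.e. behaves like a word."""
    lefts, rights = Counter(), Counter()
    for i in range(len(corpus) - 1):
        if corpus[i:i + 2] == bigram:
            if i > 0:
                lefts[corpus[i - 1]] += 1
            if i + 2 < len(corpus):
                rights[corpus[i + 2]] += 1
    def entropy(c):
        n = sum(c.values())
        return -sum(v / n * math.log2(v / n) for v in c.values()) if n else 0.0
    return entropy(lefts), entropy(rights)

print(pmi("服务"), left_right_entropy("服务"))
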
Class Based Sense Definition Model for Word Sense Tagging and Disambiguation
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119252
Tracy Lin, Jason J. S. Chang
Abstract: We present an unsupervised learning strategy for word sense disambiguation (WSD) that exploits multiple linguistic resources, including a parallel corpus, a bilingual machine-readable dictionary, and a thesaurus. The approach is based on the Class Based Sense Definition Model (CBSDM), which generates the glosses and translations for a class of word senses. The model can be applied to resolve sense ambiguity for words in a parallel corpus. That sense tagging procedure, in effect, produces a semantic bilingual concordance, which can be used to train WSD systems for the two languages involved. Experimental results show that CBSDM trained on the Longman Dictionary of Contemporary English, English-Chinese Edition (LDOCE E-C) and the Longman Lexicon of Contemporary English (LLOCE) is very effective at turning a Chinese-English parallel corpus into sense-tagged data for the development of WSD systems.
Citations: 0
Modeling of Long Distance Context Dependency in Chinese
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119260
Guodong Zhou
Abstract: Ngram modeling is simple and has been widely used in language modeling and many applications. However, it can only capture short-distance context dependency within an N-word window, where the largest practical N for natural language is three. Meanwhile, much of the context dependency in natural language occurs beyond a three-word window. In order to incorporate this kind of long-distance context dependency, this paper proposes a new MI-Ngram modeling approach. The MI-Ngram model consists of two components: an ngram model and an MI model. The ngram model captures the short-distance context dependency within an N-word window, while the MI model captures the long-distance context dependency between word pairs beyond the N-word window by using the concept of mutual information. It is found that MI-Ngram modeling performs much better than ngram modeling. Evaluation on the XINHUA news corpus of 29 million words shows that inclusion of the best 1,600,000 word pairs decreases the perplexity of the MI-Trigram model by 20 percent compared with the trigram model. Meanwhile, evaluation on Chinese word segmentation shows that about 35 percent of errors can be corrected by using the MI-Trigram model instead of the trigram model.
Citations: 0
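
A hedged sketch of the MI-Ngram scoring idea described above: the ordinary ngram log-probability is augmented with mutual-information terms for selected word pairs that co-occur beyond the N-word window. The trigram probability and the MI table below are toy placeholders, not values estimated from the XINHUA corpus.

import math

MI = {("医生", "医院"): 1.7, ("打", "篮球"): 2.1}   # invented long-distance PMI values (nats)

def trigram_logprob(word, history):
    return math.log(1e-4)   # placeholder: a real trigram model smooths and backs off

def mi_ngram_logprob(sentence, n=3):
    """Trigram log-probability plus MI terms for word pairs whose distance
    exceeds the trigram window, following the MI-Ngram decomposition."""
    score = 0.0
    for i, word in enumerate(sentence):
        score += trigram_logprob(word, sentence[max(0, i - (n - 1)):i])
        for j in range(0, i - n + 1):   # positions outside the n-word window
            score += MI.get((sentence[j], word), 0.0)
    return score

print(mi_ngram_logprob(["医生", "每天", "都", "去", "医院"]))
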
The Effect of Rhythm on Structural Disambiguation in Chinese
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119256
H. Sun, Dan Jurafsky
Abstract: The length of a constituent (the number of syllables in a word or the number of words in a phrase), or rhythm, plays an important role in Chinese syntax. This paper systematically surveys the distribution of rhythm across constructions in Chinese, using statistical data acquired from a shallow treebank. Based on our survey, we then applied rhythm as a statistical feature to augment a PCFG model in a practical shallow parsing task. Our results show that using the probabilistic rhythm feature significantly improves the performance of our shallow parser.
Citations: 11
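
One simple way to turn such rhythm distributions into a parsing feature, sketched here under assumptions of my own (the counts, rules, and the multiplicative rescoring are illustrative, not the paper's exact model): estimate P(rhythm pattern | construction) from treebank counts and use it to rescore competing analyses.

from collections import Counter

# Invented treebank counts of rhythm patterns per construction, where a pattern
# records the syllable length of each daughter (e.g. a 2-syllable verb followed
# by a 2-syllable noun is (2, 2)).
rhythm_counts = {
    "VP -> V NP": Counter({(1, 2): 180, (2, 2): 140, (2, 1): 20}),
    "NP -> N N":  Counter({(2, 2): 160, (1, 1): 90, (2, 1): 15}),
}

def rhythm_prob(rule, pattern):
    counts = rhythm_counts[rule]
    return counts[pattern] / sum(counts.values())   # unsmoothed relative frequency

def rescore(pcfg_prob, rule, daughters):
    # Multiply the PCFG rule probability by the rhythm probability: one way of
    # injecting constituent length into the parse score.
    pattern = tuple(len(d) for d in daughters)
    return pcfg_prob * rhythm_prob(rule, pattern)

# The same verb-noun string analysed as verb-object (VP) or as modifier-head (NP):
print(rescore(0.030, "VP -> V NP", ["进口", "彩电"]))
print(rescore(0.025, "NP -> N N",  ["进口", "彩电"]))
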
Abductive Explanation-based Learning Improves Parsing Accuracy and Efficiency
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119265
O. Streiter
Abstract: Natural language parsing has to be accurate and quick. Explanation-based Learning (EBL) is a technique to speed up parsing. However, accuracy often declines with EBL. This paper shows that this accuracy loss is not due to the EBL framework as such, but to deductive parsing. Abductive EBL allows extending the deductive closure of the parser. We present a Chinese parser based on abduction. Experiments show improvements in accuracy and efficiency.
Citations: 0
CHINERS: A Chinese Named Entity Recognition System for the Sports Domain
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119258
Tianfang Yao, Wei Ding, G. Erbach
Abstract: In Chinese named entity (NE) recognition, we are confronted with two principal challenges. One is how to ensure the quality of word segmentation and part-of-speech (POS) tagging, because errors there have an adverse impact on NE recognition performance. The other is how to recognize NEs flexibly, reliably, and accurately. To cope with these challenges, we propose a system architecture divided into two phases. In the first phase, we reduce the word segmentation and POS tagging errors passed on to the second phase as much as possible, using machine learning techniques to repair such errors. In the second phase, we design Finite State Cascades (FSC), which can be automatically constructed from the recognition rule sets, as a shallow parser for NE recognition. The advantages of FSC are that it is reliable, accurate, and easy to maintain. Additionally, we work out corresponding strategies for recognizing special NEs to enhance recognition correctness. The experimental evaluation of the system shows that the total average recall and precision for six types of NEs are 83% and 85% respectively. The system architecture is therefore reasonable and effective.
Citations: 13
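
The second phase uses Finite State Cascades as a shallow parser over segmented, POS-tagged text. Here is a toy sketch of the cascade idea only, with two invented rule levels for the sports domain; the paper's rule sets are constructed automatically and are not reproduced, and the "ns"/"n" tag conventions are assumptions.

import re

# Level 1 marks basic chunks in a word/POS-tagged string; level 2 combines them.
LEVEL1 = [
    ("LOC", r"大连/ns|北京/ns|上海/ns"),       # place names
    ("ORG_SUFFIX", r"队/n|俱乐部/n"),          # "team" / "club"
]
LEVEL2 = [
    ("TEAM", r"<LOC>\s*<ORG_SUFFIX>"),         # place name + team suffix -> team NE
]

def apply_level(text, rules):
    for label, pattern in rules:
        text = re.sub(pattern, f"<{label}>", text)
    return text

tagged = "大连/ns 队/n 战胜/v 北京/ns 队/n"
print(apply_level(apply_level(tagged, LEVEL1), LEVEL2))   # -> "<TEAM> 战胜/v <TEAM>"
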
The First International Chinese Word Segmentation Bakeoff
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119269
R. Sproat, Thomas Emerson
Abstract: This paper presents the results from the ACL-SIGHAN-sponsored First International Chinese Word Segmentation Bakeoff held in 2003 and reported in conjunction with the Second SIGHAN Workshop on Chinese Language Processing, Sapporo, Japan. We give the motivation for having an international segmentation contest (given that there have been two within-China contests to date), and we report on the results of this first international contest, analyze these results, and make some recommendations for the future.
Citations: 236
Chinese Word Segmentation as LMR Tagging
Workshop on Chinese Language Processing, Pub Date: 2003-07-11, DOI: 10.3115/1119250.1119278
Nianwen Xue, Libin Shen
Abstract: In this paper we present Chinese word segmentation algorithms based on so-called LMR tagging. Our LMR taggers are implemented with the Maximum Entropy Markov Model, and we then use Transformation-Based Learning to combine the results of the two LMR taggers that scan the input in opposite directions. Our system achieves F-scores of 95.9% and 91.6% on the Academia Sinica corpus and the Hong Kong City University corpus respectively.
Citations: 151
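
LMR tagging recasts segmentation as per-character position tagging, which a sequence model such as a MEMM can then predict. A minimal sketch of the encoding and its inverse follows; the tag names used here (L, M, R, S) are illustrative and may not match the paper's exact inventory, and the MEMM taggers and Transformation-Based Learning combination are not shown.

def to_tags(words):
    """Map a segmented sentence to per-character position tags: L = word-initial,
    M = word-internal, R = word-final, S = single-character word."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["L"] + ["M"] * (len(w) - 2) + ["R"])
    return tags

def to_words(chars, tags):
    """Invert the tagging: rebuild words from characters and predicted tags."""
    words, buf = [], ""
    for ch, t in zip(chars, tags):
        buf += ch
        if t in ("R", "S"):
            words.append(buf)
            buf = ""
    if buf:
        words.append(buf)
    return words

words = ["上海", "浦东", "开发", "与", "法制", "建设"]
tags = to_tags(words)
assert to_words("".join(words), tags) == words
print(list(zip("".join(words), tags)))
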