Workshop on Chinese Language Processing: Latest Publications

Introduction to CKIP Chinese Word Segmentation System for the First International Chinese Word Segmentation Bakeoff
Workshop on Chinese Language Processing | Pub Date: 2003-07-11 | DOI: 10.3115/1119250.1119276
Wei-Yun Ma, Keh-Jiann Chen
{"title":"Introduction to CKIP Chinese Word Segmentation System for the First International Chinese Word Segmentation Bakeoff","authors":"Wei-Yun Ma, Keh-Jiann Chen","doi":"10.3115/1119250.1119276","DOIUrl":"https://doi.org/10.3115/1119250.1119276","url":null,"abstract":"In this paper, we roughly described the procedures of our segmentation system, including the methods for resolving segmentation ambiguities and identifying unknown words. The CKIP group of Academia Sinica participated in testing on open and closed tracks of Beijing University (PK) and Hong Kong Cityu (HK). The evaluation results show our system performs very well in either HK open track or HK closed track and just acceptable in PK tracks. Some explanations and analysis are presented in this paper.","PeriodicalId":403123,"journal":{"name":"Workshop on Chinese Language Processing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128228860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 172
News-Oriented Automatic Chinese Keyword Indexing
Workshop on Chinese Language Processing | Pub Date: 2003-07-11 | DOI: 10.3115/1119250.1119263
Sujian Li, Houfeng Wang, Shiwen Yu, Chengsheng Xin
{"title":"News-Oriented Automatic Chinese Keyword Indexing","authors":"Sujian Li, Houfeng Wang, Shiwen Yu, Chengsheng Xin","doi":"10.3115/1119250.1119263","DOIUrl":"https://doi.org/10.3115/1119250.1119263","url":null,"abstract":"In our information era, keywords are very useful to information retrieval, text clustering and so on. News is always a domain attracting a large amount of attention. However, the majority of news articles come without keywords, and indexing them manually costs highly. Aiming at news articles' characteristics and the resources available, this paper introduces a simple procedure to index keywords based on the scoring system. In the process of indexing, we make use of some relatively mature linguistic techniques and tools to filter those meaningless candidate items. Furthermore, according to the hierarchical relations of content words, keywords are not restricted to extracting from text. These methods have improved our system a lot. At last experimental results are given and analyzed, showing that the quality of extracted keywords are satisfying.","PeriodicalId":403123,"journal":{"name":"Workshop on Chinese Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128539471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
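The abstract above names a scoring system for keyword candidates without giving its formula. As a rough illustration only, here is a minimal sketch of a frequency-based scorer with a headline boost and stopword filtering; the stopword list, weights, and function names are all assumptions, not details from the paper:

```python
from collections import Counter

# Illustrative stopword list; the paper filters candidates with mature
# linguistic tools, which this placeholder does not reproduce.
STOPWORDS = {"的", "了", "是", "在", "和"}

def score_keywords(body_words, title_words, top_k=5):
    """Hypothetical scorer: frequency count with a boost for words that
    also appear in the headline. All weights are assumptions."""
    counts = Counter(w for w in body_words
                     if w not in STOPWORDS and len(w) > 1)
    scores = {w: freq * (2.0 if w in title_words else 1.0)
              for w, freq in counts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```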
Single Character Chinese Named Entity Recognition
Workshop on Chinese Language Processing | Pub Date: 2003-07-11 | DOI: 10.3115/1119250.1119268
Xiao-Dan Zhu, Mu Li, Jianfeng Gao, C. Huang
{"title":"Single Character Chinese Named Entity Recognition","authors":"Xiao-Dan Zhu, Mu Li, Jianfeng Gao, C. Huang","doi":"10.3115/1119250.1119268","DOIUrl":"https://doi.org/10.3115/1119250.1119268","url":null,"abstract":"Single character named entity (SCNE) is a name entity (NE) composed of one Chinese character, such as \"[Abstract contained text which could not be captured.]\" (zhong1, China) and \"[Abstract contained text which could not be captured.]\" (e2, Russia). SCNE is very common in written Chinese text. However, due to the lack of in-depth research, SCNE is a major source of errors in named entity recognition (NER). This paper formulates the SCNE recognition within the source-channel model framework. Our experiments show very encouraging results: an F-score of 81.01% for single character location name recognition, and an F-score of 68.02% for single character person name recognition. An alternative view of the SCNE recognition problem is to formulate it as a classification task. We construct two classifiers based on maximum entropy model (ME) and vector space model (VSM), respectively. We compare all proposed approaches, showing that the source-channel model performs the best in most cases.","PeriodicalId":403123,"journal":{"name":"Workshop on Chinese Language Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126950663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
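The source-channel formulation mentioned above amounts to scoring each candidate class c for a character by P(c) * P(character | c) and picking the argmax. A minimal sketch of that decision rule, assuming simple probability tables (the paper's actual models condition on richer context than this):

```python
import math

def scne_score(char, prior, channel):
    # log P(class) + log P(char | class); 1e-9 smooths unseen characters
    return math.log(prior) + math.log(channel.get(char, 1e-9))

def recognize(char, models):
    """models maps a class name (e.g. "LOC", "PER", "O") to hypothetical
    probability tables: {"prior": float, "channel": {char: float}}."""
    return max(models, key=lambda c: scne_score(char, models[c]["prior"],
                                                models[c]["channel"]))
```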
SYSTRAN's Chinese Word Segmentation
Workshop on Chinese Language Processing | Pub Date: 2003-07-11 | DOI: 10.3115/1119250.1119279
Jin Yang, Jean Senellart, R. Zajac
{"title":"SYSTRAN's Chinese Word Segmentation","authors":"Jin Yang, Jean Senellart, R. Zajac","doi":"10.3115/1119250.1119279","DOIUrl":"https://doi.org/10.3115/1119250.1119279","url":null,"abstract":"SYSTRAN's Chinese word segmentation is one important component of its Chinese-English machine translation system. The Chinese word segmentation module uses a rule-based approach, based on a large dictionary and fine-grained linguistic rules. It works on general-purpose texts from different Chinese-speaking regions, with comparable performance. SYSTRAN participated in the four open tracks in the First International Chinese Word Segmentation Bakeoff. This paper gives a general description of the segmentation module, as well as the results and analysis of its performance in the Bakeoff.","PeriodicalId":403123,"journal":{"name":"Workshop on Chinese Language Processing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121820773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
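SYSTRAN's rules and dictionary are proprietary, so nothing below reproduces them; as a point of reference only, this is a minimal dictionary-driven baseline (forward maximum matching) of the kind a rule-based segmenter typically builds on:

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy longest-match segmentation against a word list -- a baseline
    sketch only; SYSTRAN layers fine-grained linguistic rules on top of
    its dictionary, which this omits."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(max_len, len(text) - i), 0, -1):
            if j == 1 or text[i:i + j] in dictionary:
                words.append(text[i:i + j])
                i += j
                break
    return words

# e.g. forward_max_match("北京大学生", {"北京", "大学", "大学生"})
# -> ["北京", "大学生"]
```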
HHMM-based Chinese Lexical Analyzer ICTCLAS
Workshop on Chinese Language Processing | Pub Date: 2003-07-11 | DOI: 10.3115/1119250.1119280
Huaping Zhang, Hongkui Yu, Deyi Xiong, Qun Liu
{"title":"HHMM-based Chinese Lexical Analyzer ICTCLAS","authors":"Huaping Zhang, Hongkui Yu, Deyi Xiong, Qun Liu","doi":"10.3115/1119250.1119280","DOIUrl":"https://doi.org/10.3115/1119250.1119280","url":null,"abstract":"This document presents the results from Inst. of Computing Tech., CAS in the ACL SIGHAN-sponsored First International Chinese Word Segmentation Bake-off. The authors introduce the unified HHMM-based frame of our Chinese lexical analyzer ICTCLAS and explain the operation of the six tracks. Then provide the evaluation results and give more analysis. Evaluation on ICTCLAS shows that its performance is competitive. Compared with other system, ICTCLAS has ranked top both in CTB and PK closed track. In PK open track, it ranks second position. ICTCLAS BIG5 version was transformed from GB version only in two days; however, it achieved well in two BIG5 closed tracks. Through the first bakeoff, we could learn more about the development in Chinese word segmentation and become more confident on our HHMM-based approach. At the same time, we really find our problems during the evaluation. The bakeoff is interesting and helpful.","PeriodicalId":403123,"journal":{"name":"Workshop on Chinese Language Processing","volume":"358 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121710589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 493
Combining Segmenter and Chunker for Chinese Word Segmentation
Workshop on Chinese Language Processing | Pub Date: 2003-07-11 | DOI: 10.3115/1119250.1119270
Masayuki Asahara, Chooi-Ling Goh, Xiaojie Wang, Yuji Matsumoto
{"title":"Combining Segmenter and Chunker for Chinese Word Segmentation","authors":"Masayuki Asahara, Chooi-Ling Goh, Xiaojie Wang, Yuji Matsumoto","doi":"10.3115/1119250.1119270","DOIUrl":"https://doi.org/10.3115/1119250.1119270","url":null,"abstract":"Our proposed method is to use a Hidden Markov Model-based word segmenter and a Support Vector Machine-based chunker for Chinese word segmentation. Firstly, input sentences are analyzed by the Hidden Markov Model-based word segmenter. The word segmenter produces n-best word candidates together with some class information and confidence measures. Secondly, the extracted words are broken into character units and each character is annotated with the possible word class and the position in the word, which are then used as the features for the chunker. Finally, the Support Vector Machine-based chunker brings character units together into words so as to determine the word boundaries.","PeriodicalId":403123,"journal":{"name":"Workshop on Chinese Language Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134274321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 24
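The second step above recodes segmenter output as per-character features. A small sketch of that recoding, assuming a BIES-style position scheme (B = begin, I = inside, E = end, S = single); the abstract specifies only "word class and the position in the word", so the exact tag set is an assumption:

```python
def to_char_features(words, classes):
    """Break segmented words into character units tagged with word class
    and in-word position, ready to feed a chunker as features."""
    feats = []
    for word, cls in zip(words, classes):
        if len(word) == 1:
            feats.append((word, cls, "S"))
        else:
            feats.append((word[0], cls, "B"))
            feats.extend((ch, cls, "I") for ch in word[1:-1])
            feats.append((word[-1], cls, "E"))
    return feats

# to_char_features(["北京", "大"], ["N", "N"]) ->
#   [("北", "N", "B"), ("京", "N", "E"), ("大", "N", "S")]
```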
Chinese Lexical Analysis Using Hierarchical Hidden Markov Model
Workshop on Chinese Language Processing | Pub Date: 2003-07-11 | DOI: 10.3115/1119250.1119259
Huaping Zhang, Qun Liu, Xueqi Cheng, H. Zhang, Hongkui Yu
{"title":"Chinese Lexical Analysis Using Hierarchical Hidden Markov Model","authors":"Huaping Zhang, Qun Liu, Xueqi Cheng, H. Zhang, Hongkui Yu","doi":"10.3115/1119250.1119259","DOIUrl":"https://doi.org/10.3115/1119250.1119259","url":null,"abstract":"This paper presents a unified approach for Chinese lexical analysis using hierarchical hidden Markov model (HHMM), which aims to incorporate Chinese word segmentation, Part-Of-Speech tagging, disambiguation and unknown words recognition into a whole theoretical frame. A class-based HMM is applied in word segmentation, and in this level unknown words are treated in the same way as common words listed in the lexicon. Unknown words are recognized with reliability in role-based HMM. As for disambiguation, the authors bring forth an n-shortest-path strategy that, in the early stage, reserves top N segmentation results as candidates and covers more ambiguity. Various experiments show that each level in HHMM contributes to lexical analysis. An HHMM-based system ICTCLAS was accomplished. The recent official evaluation indicates that ICTCLAS is one of the best Chinese lexical analyzers. In a word, HHMM is effective to Chinese lexical analysis.","PeriodicalId":403123,"journal":{"name":"Workshop on Chinese Language Processing","volume":"567 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122931085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 139
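The n-shortest-path strategy described above keeps the top N segmentation paths through the word lattice rather than committing early to a single best path. A minimal sketch with unigram word costs; the costing is a stand-in, since the paper scores paths with a class-based HMM:

```python
import heapq

def n_shortest_paths(text, dictionary, costs, n=3, max_len=4):
    """Keep the n lowest-cost segmentations of `text` over a word lattice.
    paths[i] holds (cost, words) pairs covering text[:i]; `costs` maps
    word -> negative log probability (assumed unigram model)."""
    paths = [[] for _ in range(len(text) + 1)]
    paths[0] = [(0.0, [])]
    for i in range(len(text)):
        paths[i] = heapq.nsmallest(n, paths[i])  # prune before expanding
        for cost, words in paths[i]:
            for j in range(1, min(max_len, len(text) - i) + 1):
                w = text[i:i + j]
                if j == 1 or w in dictionary:
                    # 20.0 is an assumed penalty for out-of-vocabulary words
                    paths[i + j].append((cost + costs.get(w, 20.0),
                                         words + [w]))
    return heapq.nsmallest(n, paths[len(text)])
```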
A Bottom-up Merging Algorithm for Chinese Unknown Word Extraction
Workshop on Chinese Language Processing | Pub Date: 2003-07-11 | DOI: 10.3115/1119250.1119255
Wei-Yun Ma, Keh-Jiann Chen
{"title":"A Bottom-up Merging Algorithm for Chinese Unknown Word Extraction","authors":"Wei-Yun Ma, Keh-Jiann Chen","doi":"10.3115/1119250.1119255","DOIUrl":"https://doi.org/10.3115/1119250.1119255","url":null,"abstract":"Statistical methods for extracting Chinese unknown words usually suffer a problem that superfluous character strings with strong statistical associations are extracted as well. To solve this problem, this paper proposes to use a set of general morphological rules to broaden the coverage and on the other hand, the rules are appended with different linguistic and statistical constraints to increase the precision of the representation. To disambiguate rule applications and reduce the complexity of the rule matching, a bottom-up merging algorithm for extraction is proposed, which merges possible morphemes recursively by consulting above the general rules and dynamically decides which rule should be applied first according to the priorities of the rules. Effects of different priority strategies are compared in our experiment, and experimental results show that the performance of proposed method is very promising.","PeriodicalId":403123,"journal":{"name":"Workshop on Chinese Language Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126335059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 74
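The control loop of the bottom-up merging algorithm can be pictured as repeatedly applying the highest-priority rule that matches some pair of adjacent units. The sketch below shows only that loop; the rule predicates are placeholders for the paper's morphological rules with their linguistic and statistical constraints:

```python
def bottom_up_merge(units, rules):
    """Merge adjacent morphemes bottom-up. `rules` is a priority-ordered
    list of predicates over adjacent unit pairs (hypothetical stand-ins
    for the paper's constrained rules)."""
    while True:
        for rule in rules:                  # highest priority first
            for i in range(len(units) - 1):
                if rule(units[i], units[i + 1]):
                    units[i:i + 2] = [units[i] + units[i + 1]]
                    break
            else:
                continue                    # rule matched nowhere; try next
            break                           # merged: rescan from the top rule
        else:
            return units                    # no rule fired anywhere: done
```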
Chinese Word Segmentation in MSR-NLP
Workshop on Chinese Language Processing | Pub Date: 2003-07-11 | DOI: 10.3115/1119250.1119277
Andi Wu
{"title":"Chinese Word Segmentation in MSR-NLP","authors":"Andi Wu","doi":"10.3115/1119250.1119277","DOIUrl":"https://doi.org/10.3115/1119250.1119277","url":null,"abstract":"Word segmentation in MSR-NLP is an integral part of a sentence analyzer which includes basic segmentation, derivational morphology, named entity recognition, new word identification, word lattice pruning and parsing. The final segmentation is produced from the leaves of parse trees. The output can be customized to meet different segmentation standards through the value combinations of a set of parameters. The system participated in four tracks of the segmentation bakeoff -- PK-open, PK-close, CTB-open and CTB-closed - and ranked #1, #2, #2 and #3 respectively in those tracks. Analysis of the results shows that each component of the system contributed to the scores.","PeriodicalId":403123,"journal":{"name":"Workshop on Chinese Language Processing","volume":"251 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116718803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 24
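The design point above, producing the final segmentation from the leaves of a parse tree, reduces to a left-to-right leaf traversal once parsing is done. A tiny sketch under an assumed tree encoding (a node is `(label, [children...])` or a word string; this is not MSR-NLP's internal structure):

```python
def leaves(tree):
    """Collect terminal nodes of a parse tree left-to-right; in a design
    like the one described, these leaves are the final segmentation."""
    if isinstance(tree, str):
        return [tree]
    _label, children = tree
    out = []
    for child in children:
        out.extend(leaves(child))
    return out

# leaves(("S", [("NP", ["北京", "大学"]), ("VP", ["成立"])]))
# -> ["北京", "大学", "成立"]
```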
Building a Large Chinese Corpus Annotated with Semantic Dependency
Workshop on Chinese Language Processing | Pub Date: 2003-07-11 | DOI: 10.3115/1119250.1119262
Mingqin Li, Juan-Zi Li, Zhendong Dong, Zuoying Wang, Dajin Lu
{"title":"Building a Large Chinese Corpus Annotated with Semantic Dependency","authors":"Mingqin Li, Juan-Zi Li, Zhendong Dong, Zuoying Wang, Dajin Lu","doi":"10.3115/1119250.1119262","DOIUrl":"https://doi.org/10.3115/1119250.1119262","url":null,"abstract":"At present most of corpora are annotated mainly with syntactic knowledge. In this paper, we attempt to build a large corpus and annotate semantic knowledge with dependency grammar. We believe that words are the basic units of semantics, and the structure and meaning of a sentence consist mainly of a series of semantic dependencies between individual words. A 1,000,000-word-scale corpus annotated with semantic dependency has been built. Compared with syntactic knowledge, semantic knowledge is more difficult to annotate, for ambiguity problem is more serious. In the paper, the strategy to improve consistency is addressed, and congruence is defined to measure the consistency of tagged corpus.. Finally, we will compare our corpus with other well-known corpora.","PeriodicalId":403123,"journal":{"name":"Workshop on Chinese Language Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114626826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 30
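The abstract defines "congruence" to measure annotation consistency, but this listing does not give the formula. One plausible reading, offered purely as an assumption rather than the paper's published definition, is a Dice-style agreement ratio over the dependency triples produced by two annotators:

```python
def congruence(annotation_a, annotation_b):
    """Dice-style agreement between two annotations of one sentence, each
    given as a set of (head, dependent, relation) triples. An assumed
    reading of the paper's congruence measure, not its definition."""
    if not annotation_a and not annotation_b:
        return 1.0
    overlap = len(annotation_a & annotation_b)
    return 2.0 * overlap / (len(annotation_a) + len(annotation_b))
```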