Int. J. Comput. Linguistics Chin. Lang. Process.: Latest Articles (Page 7)

Chinese Main Verb Identification: From Specification to Realization

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2005-03-01 DOI: 10.30019/IJCLCLP.200503.0004

Binggong Ding, C. Huang, Degen Huang

Abstract: Main verb identification is the task of automatically identifying the predicate-verb in a sentence. It is useful for many applications in Chinese Natural Language Processing. Although most studies have focused on the model used to identify the main verb, the definition of the main verb should not be overlooked. In our specification design, we have found many complicated issues that still need to be resolved since they haven't been well discussed in previous works. Thus, the first novel aspect of our work is that we carefully design a specification for annotating the main verb and investigate various complicated cases. We hope this discussion will help to uncover the difficulties involved in this problem. Secondly, we present an approach to realizing main verb identification based on the use of chunk information, which leads to better results than the approach based on part-of-speech. Finally, based on careful observation of the studied corpus, we propose new local and contextual features for main verb identification. According to our specification, we annotate a corpus and then use a Support Vector Machine (SVM) to integrate all the features we propose. Our model, which was trained on our annotated corpus, achieved a promising F score of 92.8%. Furthermore, we show that main verb identification can improve the performance of the Chinese Sentence Breaker, one of the applications of main verb identification, by 2.4%.

Citations: 7

Similarity Based Chinese Synonym Collocation Extraction

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2005-03-01 DOI: 10.30019/IJCLCLP.200503.0006

Wanyin Li, Q. Lu, Ruifeng Xu

Citations: 12

Detecting Emotions in Mandarin Speech

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2004-09-01 DOI: 10.30019/IJCLCLP.200509.0004

T. Pao, Yu-Te Chen, Jun-Heng Yeh, Wen-Yuan Liao

Citations: 40

Automated Alignment and Extraction of a Bilingual Ontology for Cross-Language Domain-Specific Applications

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2004-08-23 DOI: 10.3115/1220355.1220519

Jui-Feng Yeh, Chung-Hsien Wu, Ming-Jun Chen, Liang-Chih Yu

Citations: 14

Toward Constructing A Multilingual Speech Corpus for Taiwanese (Min-nan), Hakka, and Mandarin Chinese

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2004-08-01 DOI: 10.30019/IJCLCLP.200408.0001

Ren-Yuan Lyu, Min-Siong Liang, Yuang-Chin Chiang

Citations: 17

Multiple-Translation Spotting for Mandarin-Taiwanese Speech-to-Speech Translation

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2004-08-01 DOI: 10.30019/IJCLCLP.200408.0002

Jhing-Fa Wang, Shun-Chieh Lin, Hsueh-Wei Yang, Fan-Min Li

Citations: 3

The Properties and Further Applications of Chinese Frequent Strings

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2004-02-01 DOI: 10.30019/IJCLCLP.200402.0007

Yih-Jeng Lin, Ming-Shing Yu

Citations: 6

Mencius: A Chinese Named Entity Recognizer Using the Maximum Entropy-based Hybrid Model

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2004-02-01 DOI: 10.30019/IJCLCLP.200402.0004

Richard Tzong-Han Tsai, Shih-Hung Wu, Cheng-Wei Lee, Cheng-Wei Shih, W. Hsu

Abstract: This paper presents a Chinese named entity recognizer (NER): Mencius. It aims to address Chinese NER problems by combining the advantages of rule-based and machine learning (ML) based NER systems. Rule-based NER systems can explicitly encode human comprehension and can be tuned conveniently, while ML-based systems are robust, portable and inexpensive to develop. Our hybrid system incorporates a rule-based knowledge representation and template-matching tool, called InfoMap [Wu et al. 2002], into a maximum entropy (ME) framework. Named entities are represented in InfoMap as templates, which serve as ME features in Mencius. These features are edited manually, and their weights are estimated by the ME framework according to the training data. To understand how word segmentation might influence Chinese NER and the differences between a pure template-based method and our hybrid method, we configure Mencius using four distinct settings. The F-Measures of person names (PER), location names (LOC) and organization names (ORG) of the best configuration in our experiment were 94.3%, 77.8% and 75.3%, respectively. Comparing the experimental results obtained using these configurations reveals that hybrid NER systems always perform better in identifying person names. On the other hand, they have some difficulty identifying location and organization names. Furthermore, using a word segmentation module improves the performance of pure template-based NER systems, but it has little effect on hybrid NER systems.

Citations: 39

Automatic Pronominal Anaphora Resolution in English Texts

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2004-02-01 DOI: 10.30019/IJCLCLP.200402.0002

Tyne Liang, Dian-Song Wu

Abstract: Anaphora is a common phenomenon in discourse as well as an important research issue in applications of natural language processing. In this paper, anaphora resolution is achieved by employing the WordNet ontology and heuristic rules. The proposed system identifies both intra-sentential and inter-sentential antecedents of anaphors. Information about animacy is obtained by analyzing the hierarchical relations of nouns and verbs in the surrounding context. The identification of animate entities and pleonastic-it usage in English discourse is employed to improve resolution accuracy. Traditionally, anaphora resolution systems have relied on syntactic, semantic or pragmatic clues to identify the antecedent of an anaphor. Our proposed method makes use of the WordNet ontology to identify animate entities as well as essential gender information. In the animacy agreement module, the property is identified by the hypernym relation between entities and their unique beginners defined in WordNet. In addition, the verb of the entity is also an important clue used to reduce the uncertainty. An experiment was conducted using a balanced corpus to resolve the pronominal anaphora phenomenon. The methods proposed in (Lappin and Leass, 94) and (Mitkov, 01) focus on corpora with only inanimate pronouns such as "it" or "its". Thus the results of intra-sentential and inter-sentential anaphora distribution are different. In an experiment using the Brown corpus, we found that the proportion of intra-sentential anaphora is about 60%. Seven heuristic rules are applied in our system; five of them are preference rules, and two are constraint rules. They are derived from syntactic, semantic and pragmatic conventions and from the analysis of training data. A relative measurement indicates that about 30% of the errors can be eliminated by applying the heuristic module.

Citations: 40

Bilingual Collocation Extraction Based on Syntactic and Statistical Analyses

Int. J. Comput. Linguistics Chin. Lang. Process. Pub Date : 2003-09-01 DOI: 10.30019/IJCLCLP.200402.0001

Chien-Cheng Wu, Jason J. S. Chang

Abstract: In this paper, we describe an algorithm that employs syntactic and statistical analysis to extract bilingual collocations from a parallel corpus. Collocations are pervasive in all types of writing and can be found in phrases, chunks, proper names, idioms, and terminology. Therefore, automatic extraction of monolingual and bilingual collocations is important for many applications, including natural language generation, word sense disambiguation, machine translation, lexicography, and cross-language information retrieval. Collocations can be classified as lexical or grammatical collocations. Lexical collocations exist between content words, while a grammatical collocation exists between a content word and function words or a syntactic structure. In addition, bilingual collocations can be rigid or flexible in both languages. In a rigid collocation, the words must appear next to each other; otherwise, the collocation is flexible (elastic). We focus in this paper on extracting rigid lexical bilingual collocations. In our method, the preferred syntactic patterns are obtained from idioms and collocations in a machine-readable dictionary. Collocations matching the patterns are extracted from aligned sentences in a parallel corpus. We use a new alignment method based on punctuation statistics for sentence alignment. The punctuation-based approach is found to outperform the length-based approach with precision rates approaching 98%. The obtained collocations are subsequently matched up based on cross-linguistic statistical association. Statistical association between the whole collocations as well as words in collocations is used to link a collocation with its counterpart collocation in the other language. We implemented the proposed method on a very large Chinese-English parallel corpus and obtained satisfactory results.

Citations: 39