{"title":"Research on Chinese Text Summarization Algorithm Based on Statistics and Rules","authors":"Faguo Zhou, Fan Zhang, Bingru Yang","doi":"10.1109/IALP.2009.56","DOIUrl":"https://doi.org/10.1109/IALP.2009.56","url":null,"abstract":"Text summarization is a meaningful part of research on natural language document understanding, and it is an important branch of natural language processing. In this paper, building on the current state of research by researchers and experts both at home and abroad, two text summarization algorithms are proposed: one is rule-based, and the other is based on statistics.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"48 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123554337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segmenting Long Sentence Pairs for Statistical Machine Translation","authors":"Biping Meng, Shujian Huang, Xinyu Dai, Jiajun Chen","doi":"10.1109/IALP.2009.20","DOIUrl":"https://doi.org/10.1109/IALP.2009.20","url":null,"abstract":"In phrase-based statistical machine translation, knowledge about phrase translation and phrase reordering is learned from bilingual corpora. In practice, however, words may be poorly aligned in long sentence pairs, which then harms subsequent steps of the translation pipeline, such as phrase extraction. A possible solution to this problem is segmenting long sentence pairs into shorter ones. In this paper, we present an effective approach to segmenting sentences based on a modified IBM Translation Model 1. We find that by taking into account the semantics of some words, as well as the length ratio of the source and target sentences, the segmentation result is largely improved. We also discuss the effect of the length factor on the segmentation result. Experiments show that our approach can improve the BLEU score of a phrase-based translation system by about 0.5 points.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129133891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advances in Acoustic Modeling for Vietnamese LVCSR","authors":"Tuan-Nam Nguyen, Q. Vu","doi":"10.1109/IALP.2009.66","DOIUrl":"https://doi.org/10.1109/IALP.2009.66","url":null,"abstract":"In this paper, we present our experiments on the selection of basic phonetic units for Vietnamese large vocabulary continuous speech recognition (LVCSR). Two acoustic models were compared. The first model used only vowels or monophthongs as phonemes [2], while the second, proposed in this paper, also explores the use of diphthongs and triphthongs as phonemes. The two models were trained and evaluated on a Broadcast News corpus containing 27 hours of acoustic training data and 1 hour of acoustic testing data. Moreover, a 146M-word corpus of newspaper text was employed for building the language models. Experimental results indicate significant improvements in both word accuracy rate and execution time. With the second acoustic model, the word accuracy rate reaches 86.06% in the best case, and the execution time is faster than real time.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122179905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Method for Extraction of Partial Correspondence from Parallel Corpus","authors":"Ryo Terashima, Hiroshi Echizen-ya, K. Araki","doi":"10.1109/IALP.2009.69","DOIUrl":"https://doi.org/10.1109/IALP.2009.69","url":null,"abstract":"For machine translation using a parallel corpus, it is effective to extract partial correspondences: pairs of phrases of the source language (SL) and target language (TL) in bilingual sentences. However, it is difficult to extract partial correspondences correctly and efficiently from a sparse corpus. In this paper, we propose a new learning method that extracts partial correspondences solely from the parallel corpus, without any analytical tools. In the proposed method, extraction rules are automatically acquired from bilingual sentences using bigram statistics within each language's sentences and a Dice-coefficient-based similarity between SL words and TL words. The acquired extraction rules carry information about the first parts (e.g., \"a\", \"the\") or the last parts of phrases. Partial correspondences are then extracted from the bilingual sentences correctly and efficiently using these rules. Evaluation experiments indicated that our proposed method can improve the translation quality of a learning-type machine translation system by correctly and efficiently extracting partial correspondences from bilingual sentences.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121500459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dialog-Act Recognition Using Discourse and Sentence Structure Information","authors":"Keyan Zhou, Chengqing Zong","doi":"10.1109/IALP.2009.12","DOIUrl":"https://doi.org/10.1109/IALP.2009.12","url":null,"abstract":"Automatic recognition of dialog acts (DA) is one of the most important processes in understanding spontaneous dialog. Most existing studies have focused on applying various classification methods to DA recognition, while less attention has been paid to feature selection specifically. This paper introduces several textual features for DA recognition and proposes a novel use of sentence structure features. In particular, this paper investigates the effect of discourse structure features on DA recognition, which have received little study before. The experimental results on both a Chinese corpus and an English corpus show that the selected features and feature combination rules significantly improve the overall performance. The accuracy of DA recognition rises from 77.05% to 88.21% on the Chinese corpus, and from 59.08% to 64.92% on the English corpus.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117013382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information Focus Synthesis Based on Question Answer Chain","authors":"Jing Wan, Han Ren","doi":"10.1109/IALP.2009.16","DOIUrl":"https://doi.org/10.1109/IALP.2009.16","url":null,"abstract":"While speech synthesis technologies have come a long way in the past ten years, there is still room for improvement. This paper describes a technique based on jointly modeling information structure, syntax, and prosody, which demonstrates noticeable improvements over an existing speech synthesis system. As an important parameter for prosody processing in Mandarin, the prosodic distribution features of information focus are crucial for natural-sounding output, speech understanding, and information acquisition. Because of the complex mapping relation between information structure, syntax, and prosody, we present an efficient method for retrieving information focus to improve the naturalness of speech synthesis. We use a question answering chain to extract information foci and discover how they move. We then adopt feature classification and prosody predictive modeling to handle the focus's F0 and duration, and obtain a feature module for them. Basing synthesis on this feature module should significantly increase the accuracy and naturalness of the output. The rest of this paper is organized as follows. Section 2 summarizes previously proposed theory for information focus extraction and derives a new method. Experiments are described in Section 3, and experimental results are shown in Section 4. Concluding remarks are presented in the final section.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121136159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study on Semantic Role Labeling of Korean Sentence","authors":"Yude Bi, Jing Chen","doi":"10.1109/IALP.2009.30","DOIUrl":"https://doi.org/10.1109/IALP.2009.30","url":null,"abstract":"The study of semantic role labeling is a hotspot in the field of Natural Language Processing. Drawing on both rationalism and empiricism under the principle of pragmatism, and from the perspective of semantic information processing, this paper proposes an approach to labeling the semantic roles of Korean. The approach is theoretically based on the level framework of Korean verbs' syntax and semantics, assisted by a feature-vector-based approach and combined with an annotated database of categories and concepts, and is evaluated on an annotated corpus for the semantic role labeling study.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116836691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chinese Named Entity Recognition Using a Morpheme-Based Chunking Tagger","authors":"G. Fu","doi":"10.1109/IALP.2009.68","DOIUrl":"https://doi.org/10.1109/IALP.2009.68","url":null,"abstract":"Most previous studies formalize Chinese named entity recognition (NER) as a chunking task with either characters or lexicon words as the basic tokens for chunking. However, it is difficult under this formulation to explore lexical information for NER. Furthermore, traditional NER chunking systems usually employ an exhaustive strategy for entity candidate generation, obviously resulting in efficiency loss during entity decoding. In this paper we propose a morpheme-based chunking framework for Chinese NER and implement an efficient three-stage tagger using the pipeline strategy. To tackle the problem of out-of-vocabulary words and to more effectively explore lexical cues for NER as well, we distinguish named entities from common words and choose morphemes as the basic tokens for entity chunking. To reduce the space of entity candidates and improve the efficiency of entity decoding, we employ internal entity formation pattern rules during entity candidate generation. Our experiments on different datasets show that our system can greatly improve NER efficiency without much degradation of performance.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129483955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Experimental Study of Vietnamese Question Answering System","authors":"V. Tran, V. Nguyen, Oanh T. K. Tran, Uyen Thu Thi Pham, Quang-Thuy Ha","doi":"10.1109/IALP.2009.39","DOIUrl":"https://doi.org/10.1109/IALP.2009.39","url":null,"abstract":"The development of the World Wide Web calls for efficient ways to exploit its information. Currently, search engines mostly return a set of related documents that contain the query keywords. However, users expect an exact and concrete answer to each question. Therefore, it is necessary to build an automatic question answering (QA) system. In this paper, we focus on building a QA system for Vietnamese. This task is made especially difficult by the lack of available tools for processing Vietnamese text. Based on previous research for English, this paper proposes an implementation of a Vietnamese question answering system that combines the SnowBall system [1] with semantic relation extraction using search engines [4]. The experimental results on the travelling domain show that the proposed method is sufficient for a Vietnamese question answering system: we achieved 89.7% precision and a 91.4% answer rate when testing on the travelling domain.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129630495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stability vs. Effectiveness: Improved Sentence-Level Combination of Machine Translation Based on Weighted MBR","authors":"Bo Wang, T. Zhao, Muyun Yang, Hongfei Jiang, Sheng Li","doi":"10.1109/IALP.2009.17","DOIUrl":"https://doi.org/10.1109/IALP.2009.17","url":null,"abstract":"We describe an improved strategy for combining the outputs of machine translation systems at the sentence level, balancing the stability and the effectiveness of the combination. The new method extends the classical MBR-based sentence-level combination to weighted Minimum Bayes Risk (wMBR). During the calculation of the risk, we weight the hypotheses by the performance of each MT system, as measured by automatic evaluation metrics on the development data. In experiments, the wMBR-based method stably achieves better results than other sentence-level methods and obtained the best position in the CWMT08 evaluation track, outperforming the other word-level and sentence-level combination systems.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127476196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}