{"title":"On the Semantic Orientation and Computer Identification of Adverb bié","authors":"Lin He, Di Wu","doi":"10.1109/IALP.2009.46","DOIUrl":"https://doi.org/10.1109/IALP.2009.46","url":null,"abstract":"The recognition of the semantic orientation of the adverb on the computer is a new temptation to discuss sentence processing starting from semantic. In this paper, in order to achieve computer automatic identification of semantic orientation, we focus on the syntactic environment and semantic orientation of the adverb “bié”, propose auto-process strategy and construct procedure diagram of automatic identification.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125865153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chinese Chunking Based on Coarse-Grained Part-of-Speech Features","authors":"Guanglu Sun, Y. Xue, Zhiming Xu, Fei Lang","doi":"10.1109/IALP.2009.54","DOIUrl":"https://doi.org/10.1109/IALP.2009.54","url":null,"abstract":"Although part-of-speech (POS) is an effective feature for Chinese Chunking, the POS-tagging errors generated by automatic POS tagger leads to almost 10% performance drop in F-score. To solve this problem, this paper presents new features to replace the POS features, namely the coarse-grained part-of-speech features. Combining with the methods of processing out-of-vocabulary words, the new features are utilized in the Chinese chunking model. Experimental results show that the new features can contribute 2.71% performance improvement over the baseline method.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127960226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic User Preferences Acquirement in Chinese Commercial Web Sites with NLP and DM Techniques","authors":"Shilin Zhang, Hui Wang","doi":"10.1109/IALP.2009.38","DOIUrl":"https://doi.org/10.1109/IALP.2009.38","url":null,"abstract":"Data mining are an important field of research. However, there is an important challenge to apply the Data mining technique to NLP applications. An integrated data mining system for NLP in Chinese Commercial Web Sites is presented in this paper. It firstly extracted the raw data using NLP technology and then presented the data mining process in detail to acquire the user preferences. More importantly, it puts forward a new enhanced and integrated method to Acquire User Preferences. Data mining applied to NLP provides a scientific basis for E-Commerce and Decision-making systems.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122822400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Chinese to English SMT with Multiple CWS Results","authors":"Ryan Ma, T. Zhao","doi":"10.1109/IALP.2009.36","DOIUrl":"https://doi.org/10.1109/IALP.2009.36","url":null,"abstract":"In Chinese to English statistical machine translation (SMT), Chinese texts always need a pre-processing high segments sentences into words and this standard approach is Chinese word segmentation (CWS). However, CWS is not developed for SMT, its results are not necessarily optimal for SMT. In recent years, many investigations have been performed concerning making CWS suitable for SMT, but we explore it from another direction. In this paper, our basic idea is to use multiple CWS results as additional language knowledge sources and we present a simple and effective approach to use multiple CWS results for SMT. We also give experiment results over range of strategy settings, and obtain substantial improvements in performance for translation from Chinese to English. The best result shows we gain 1.89 BLEU percentage points over a state of the art HPBT baseline system without using multiple CWS results.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"56 68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129080685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ranking vs. Classification: A Case Study in Mining Organization Name Translation from Snippets","authors":"Muyun Yang, Zhenyong Shi, Sheng Li, T. Zhao, Haoliang Qi","doi":"10.1109/IALP.2009.73","DOIUrl":"https://doi.org/10.1109/IALP.2009.73","url":null,"abstract":"Both classification and ranking strategy have been reported positively in mining the named entity (NE) translation from the snippets returned by the web search engine. Taking the most challenging issue of the organization name and its translation as an example, this paper conducts a contrastive study on the two strategies under SVM framework. We empirically show that the method of translation ranking achieves the best performance in various data settings, with the best Top-1 precision up to 65.75%. We conclude that, compared with the classification strategy, the ranking strategy is more suitable in such snippet based translation mining, in which the unbalance data issue prevails.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129398435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approaches of Dimensionality Reduction for Telugu Document Classification","authors":"P. Reddy, B. Sasidhar, B. H. Reddy, B. V. Vardhan, L. Reddy, A. Govardhan","doi":"10.1109/IALP.2009.82","DOIUrl":"https://doi.org/10.1109/IALP.2009.82","url":null,"abstract":"Document classification is one of the prominent area of research evolved as a result of exponential growth in the usage of electronic documents. Classification of documents demands for understanding of document units by removing insignificant data and improving computational efficiency. This paper deals with the approaches aimed at Dimensionality Reduction (DR) in document units for Telugu. Bag of words is a generic model for English document classification, adaptation of this model on Indic based scripts found to have a meager performance. Two approaches are presented in this paper, first approach deals with language specific and Corpus based dimensionality reduction termed as validity based DR. The other approach is Category and Document specific approach termed as category based DR. The performance of the two approaches is evaluated with the help of accuracy as a measure.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125449083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis and Selection of Prosodic Features for Language Identification","authors":"Raymond W. M. Ng, Tan Lee, C. Leung, B. Ma, Haizhou Li","doi":"10.1109/IALP.2009.34","DOIUrl":"https://doi.org/10.1109/IALP.2009.34","url":null,"abstract":"Prosodic features are relatively simple in their structures and are believed to be effective in some speech recognition tasks. However, this kind of features is subject to undesirable bias factors, such as speaking styles. To cope with this, researchers have suggested various normalization and measure methods to the features, which makes the feature inventory very large. In this paper, we use a mutual information criterion to analyze and select a number of prosody-related features in a language identification (LID) task. Among twelve optimal features, eight of them are elaborated in this paper. The feature analysis metric, z-score, is shown to have a moderate to high correlation with LID accuracies. Feature selection proposed in this paper brings about the best performance among all prosodic LID systems to our knowledge. A further attempt in system fusion shows a 13% relative improvement the prosodic LID system brings to the conventional phonotactic approach to LID.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123744705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting Thai Compounds Using Collocations and POS Bigram Probabilities without a POS Tagger","authors":"Wirote Aroonmanakun","doi":"10.1109/IALP.2009.33","DOIUrl":"https://doi.org/10.1109/IALP.2009.33","url":null,"abstract":"This paper presents a simple method to extract compounds using statistical collocations and POS bigram probabilities without a POS tagger. Statistical collocation was used to determine strength of word co-occurrences. Probabilities of POS sequences were used to adjust the strength of collocation within a possible compound. These probabilities were estimated from compounds found in the dictionary. Bigram and trigram words extracted from a corpus of 28 million words were ranked by two means, collocation scores and collocation scores weighted by POS pattern probabilities. Cutoff precision at every 200 points were calculated for both methods. The results showed that probabilities of POS sequences could increase the precision rate of compound extraction at certain level. The system can extract 2-word compounds and 3-word compounds at the precision rate up to 63% and 35% respectively. When eliminating bigram extractions that could be parts of trigram extraction, the precision rate is increased up to 71%.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125029118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Lattice-Based Phonotactic Language Recognition System with CMLLR Adaptation and Its Implementation Issues","authors":"C. Leung, R. Tong, B. Ma, Haizhou Li","doi":"10.1109/IALP.2009.67","DOIUrl":"https://doi.org/10.1109/IALP.2009.67","url":null,"abstract":"This paper presents a “non-complicated” automatic spoken language recognition system which can be effectively implemented using publicly available toolkits (such as HTK, SRILM and SVM-Light) and corpus resources (such as Switchboard, CallFriend, OHSU and NIST LRE07 speech corpora). This system involves two context-independent phone recognizers, a vector space modelling classifier and an equal weight fusion of likelihood scores from the classifier. CMLLR adaptation and phone lattice are also used in this system. Our experiments show that these two techniques are essential in obvious performance improvement. Despite the simplicity of the system, it achieves the EER of 2.72% in the 30-sec condition in NIST LRE-2007 evaluation data set. Moreover, we describe our experience how we use the large amount of available training data to effectively test different configurations in the phone recognizers. This practical issue should be interesting to the later comers who plan to participate in NIST Language Recognition evaluation or similar international benchmark campaigns.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121493686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Reordering Rules for Hierarchical Phrase-Based Translation","authors":"Shu Cai, Yajuan Lü, Qun Liu","doi":"10.1109/IALP.2009.22","DOIUrl":"https://doi.org/10.1109/IALP.2009.22","url":null,"abstract":"Hierarchical phrase-based translation model has been proven to be a simple and powerful machine translation model. However, due to the computational complexity constraints, the extraction and use of hierarchical rules are usually restricted under certain limits, and these limits could have a negative impact on the performance of the translation model, especially for reordering. This paper presents a solution to improve the reordering of hierarchical phrase-based translation model. We propose a two-step method to extract improved reordering rules with less limits. These reordering rules help both local and non-local reordering, and could be incorporated to a hierarchical phrase-based translation system easily. Experiments show that our approach achieves statistically significant improvements over the baseline system in Chinese-English translation.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122556193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}