{"title":"Imbalanced Sentiment Classification with Multi-strategy Ensemble Learning","authors":"Zhongqing Wang, Shoushan Li, Guodong Zhou, Peifeng Li, Qiaoming Zhu","doi":"10.1109/IALP.2011.28","DOIUrl":"https://doi.org/10.1109/IALP.2011.28","url":null,"abstract":"Recently, sentiment classification has become a hot research topic in natural language processing. But most existing studies assume that the samples in the negative and positive categories are balanced, which might not be true in real applications. In this paper, we investigate sentiment classification tasks where the class distribution of the samples is imbalanced. To handle the imbalance problem, we propose a multi-strategy ensemble learning approach. Our ensemble approach integrates sample-ensemble, feature-ensemble, and classifier-ensemble by exploiting multiple classification algorithms. Evaluation across four domains shows that our ensemble approach outperforms many other popular approaches that handle imbalanced classification problems, such as re-sampling and cost-sensitive approaches, and is proven effective for imbalanced sentiment classification.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117178249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BASRAH: Arabic Verses Meters Identification System","authors":"Z. Khalaf, Maytham Alabbas, T. Tan","doi":"10.1109/IALP.2011.19","DOIUrl":"https://doi.org/10.1109/IALP.2011.19","url":null,"abstract":"In this paper, we present BASRAH, a system that automatically identifies the meter of Arabic verse, which is an operation that requires a certain level of human expertise. BASRAH uses the numerical prosody method, which depends on verse coding that is derived from the general concept of al-Khalil's feet through using the two primary units (cord=2 and peg=3). BASRAH has proved to be an efficient tool to help inexperienced users to determine the meters of Arabic verses when we tested it on thousands of old and modern Arabic verses.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122487921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adopting Malay Syllable Structure for Syllable Based Speech Synthesizer for Iban and Bidayuh Languages","authors":"Sarah Flora Samson Juan, V. Edwin, Chai Yeen Cheong, Jun Choi Lee, A. Yeo","doi":"10.1109/IALP.2011.21","DOIUrl":"https://doi.org/10.1109/IALP.2011.21","url":null,"abstract":"Sarawak, Malaysia, has many under-resourced languages, which stand to become extinct if measures are not taken to preserve and maintain them. These languages are mostly spoken by the indigenous groups and not all of the languages are documented or studied. As a preservation initiative, a Text to Speech (TTS) system has been built for the Iban and Bidayuh languages, two of the 44 living languages in Sarawak. To expedite the development, we employed knowledge of a closely related language, i.e. Malay, which is the first language in Malaysia. In this paper, we employed a syllabification algorithm based on Malay syllable structure to build the Iban and Bidayuh syllable list and speech corpus. An accuracy test for the algorithm was conducted to determine the quality of the output from the TTS system using Categorical Estimation (CE). The tests showed a high accuracy percentage, and output quality had a mean score of 3.07 out of 5, suggesting the approach works.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116264714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building a Rule-Based Malay Text Segmentation Tool","authors":"Bali Ranaivo-Malançon","doi":"10.1109/IALP.2011.42","DOIUrl":"https://doi.org/10.1109/IALP.2011.42","url":null,"abstract":"This paper presents the different problems that need to be taken into account in building a rule-based Malay text segmentation tool that can split a text into sentences and tokens. The tool was compared to English and Malay tokenisers to highlight the characteristics of Malay texts.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125736927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Formalization and Rules for Recognition of Satirical Irony","authors":"Lingpeng Kong, Likun Qiu","doi":"10.1109/IALP.2011.14","DOIUrl":"https://doi.org/10.1109/IALP.2011.14","url":null,"abstract":"Satirical irony (\"反讽\") is a very important linguistic phenomenon. Its recognition is of great importance to sentiment analysis. However, research on this topic is still quite rare, and existing studies have problems such as unclear definitions and unclear objects of study. To solve these problems, we first give clear definitions of satirical irony. Then we discuss at what level satirical irony occurs. Finally, we propose some features of satirical irony.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"171 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116906653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Rule-Based Source-Side Reordering on Phrase Structure Subtrees","authors":"Fangli Liang, Lei Chen, Miao Li, Nasun-Urtu","doi":"10.1109/IALP.2011.12","DOIUrl":"https://doi.org/10.1109/IALP.2011.12","url":null,"abstract":"Since different languages put words in different orders, reordering is an important issue in statistical machine translation. The paper proposes a rule-based reordering method at the source side as a preprocessing step, which applies some syntactic reordering rules on the phrase structure subtree to reorder the source language. The reordering rules integrate the phrase structure tree with part-of-speech tags, which can implement the reordering not only between words but also between words and phrases. This partly solves the problems of long-distance reordering and translation errors. Meanwhile, the interference between reordering rules is significantly reduced in this method. Experiments show that our method can improve the performance of the state-of-the-art phrase translation models, achieving a 1.71 BLEU score increase over the standard phrase-based machine translation system.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129780708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Semantic Orientation and Computer Identification of the Chinese Adverb cai","authors":"Lin He, Pengbing Chen","doi":"10.1109/IALP.2011.76","DOIUrl":"https://doi.org/10.1109/IALP.2011.76","url":null,"abstract":"Recognizing the semantic orientation of Chinese adverbs by computer is a new attempt. In this paper, to achieve automatic computer identification of the adverb \"cai\", the rules and principles of its semantic orientation are summarized and proposed according to its sentence structure. Based on this, automatic identification strategies are explored and the corresponding procedure diagrams are given.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129950372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research of Event Pronoun Resolution","authors":"Ning Zhang, Fang Kong, Peifeng Li","doi":"10.1109/IALP.2011.31","DOIUrl":"https://doi.org/10.1109/IALP.2011.31","url":null,"abstract":"Event anaphora resolution plays an important role in discourse analysis. Because pronouns carry little information in themselves compared with general noun phrases, resolving event pronouns is a more difficult task. This paper proposes a machine learning-based framework for event pronoun resolution. All kinds of features, including both flat and structural features, are explored for event pronoun resolution. Experiments on the OntoNotes corpus show that both flat and structural features are very effective for this task.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"47 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131957192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Comparison of Chinese Spam Filter Based on Generative Model and Discriminative Model","authors":"Yong Han, Yingying Wang, Huafu Ding, Haoliang Qi","doi":"10.1109/IALP.2011.64","DOIUrl":"https://doi.org/10.1109/IALP.2011.64","url":null,"abstract":"Previous studies on English datasets have shown that discriminative models outperform generative models for spam filtering, but studies on Chinese spam filtering are rare. We therefore compared the performance of Bogo, a classical generative model, with Logistic Regression (LR) and Relaxed Online SVM (ROSVM), two typical discriminative models, on Chinese datasets. The Bogo system adopts a generative model based on a Bayesian algorithm. We chose the public Chinese datasets TREC06c, SEWM 2008, SEWM 2010, and SEWM 2011 as test datasets with immediate feedback. The discriminative models give better results than the generative model, and ROSVM gives the best performance for Chinese spam filtering.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125725523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Centroid Integer Selection Model -- A High Efficiency Method on Dynamic Multi-document Summarization","authors":"Meiling Liu, Dequan Zheng, T. Zhao, Yang Yu","doi":"10.1109/IALP.2011.56","DOIUrl":"https://doi.org/10.1109/IALP.2011.56","url":null,"abstract":"This paper researches centroid integer selection based on dynamic multi-document summarization (DMS) and presents a dynamic multi-document summarization model, called the Centroid Integer Selection Model (CISM). This model has two main steps. First, several abstracts are extracted from the document sets, each based on a different first sentence. Second, the best abstract is selected, using a centroid strategy, from all the abstracts created in the first step. The main advantage of this model is that it eliminates the effect of a wrongly selected first sentence. Experiments were conducted on the Update Task test data from TAC2008, and the results of the new model were compared with results from the TAC2008 evaluation.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"136 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120980394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}