{"title":"New Language Resources for Arabic: Corpus Containing More Than Two Million Words and a Corpus Processing Tool","authors":"A. Al-Thubaity, Marwa Khan, Manal Al-Mazrua, Maram Al-Mousa","doi":"10.1109/IALP.2013.21","DOIUrl":"https://doi.org/10.1109/IALP.2013.21","url":null,"abstract":"Arabic is a resource-poor language relative to other languages with a similar number of speakers. This situation negatively affects corpus-based linguistic studies in Arabic and, to a lesser extent, Arabic language processing. This paper presents a brief overview of recent freely available Arabic corpora and corpus processing tools, and it examines some of the issues that may be preventing Arabic linguists from using them. These issues reveal the need for new language resources to enrich and foster Arabic corpus-based studies. Accordingly, this paper introduces the design of a new Arabic corpus of more than two million words that covers modern standard Arabic varieties, based on newspapers from all Arab countries. It also describes the main features of a corpus processing tool specifically designed for Arabic, called \"Khawas غواص\" (\"diver\" in English). Khawas provides more features than any other freely available corpus processing tool for Arabic, including n-gram frequency and concordance, collocations, and statistical comparison of two corpora. Finally, we outline modifications and improvements that could be made in future work.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131515110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sentiment Classification with Polarity Shifting Detection","authors":"Shoushan Li, Zhongqing Wang, Sophia Yat-Mei Lee, Chu-Ren Huang","doi":"10.1109/IALP.2013.44","DOIUrl":"https://doi.org/10.1109/IALP.2013.44","url":null,"abstract":"Sentiment classification is now an active research topic in natural language processing, and the bag-of-words based machine learning approach is the state of the art for this task. However, one important phenomenon, called polarity shifting, remains unsolved in the bag-of-words model and sometimes causes the machine learning approach to fail. In this study, we aim to perform sentiment classification with full consideration of the polarity shifting phenomenon. First, we extract rules for detecting polarity shifting of sentiment words from a corpus of polarity-shifted sentences. Second, we use the detection rules to detect polarity-shifted words in the test data. Third, we design a novel term counting-based classifier that fully takes those polarity-shifted words into account. Evaluation shows that the novel term counting-based classifier significantly improves the performance of sentiment analysis across five domains. Furthermore, when this classifier is combined with a machine learning-based classifier, the combined classifier outperforms either of them alone.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134513362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topic and Its Negation in Chinese Sentences","authors":"Lin He, Qiong Peng","doi":"10.1109/IALP.2013.10","DOIUrl":"https://doi.org/10.1109/IALP.2013.10","url":null,"abstract":"There are two major views on the generation of sentential topics in Chinese, under which some topics are moved from a syntactic position. The disagreement concerns the so-called dangling topics. One view contends that dangling topics are moved, being thematically related to a position inside the comment; the other holds that they are base-generated and licensed by the non-empty set resulting from the intersection of the topic set and the set generated by the semantic variable in the comment. Both views help explain the negation of topics in Chinese sentential negation from syntactic, semantic, and pragmatic perspectives. It is suggested that the topic can be negated when the variable or related element in the comment is co-referential with the topic, or has no definite referent.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128851896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Use of PLP Cepstral Features for Phonetic Segmentation","authors":"Bhavik B. Vachhani, H. Patil","doi":"10.1109/IALP.2013.47","DOIUrl":"https://doi.org/10.1109/IALP.2013.47","url":null,"abstract":"Phonetic segmentation has potential applications in Text-to-Speech (TTS) synthesis and Automatic Speech Recognition (ASR) systems. In this paper, we propose the use of Perceptual Linear Prediction Cepstral Coefficients (PLPCC) features for the phonetic segmentation task. To detect phonetic boundaries, we use the spectral transition measure (STM). With the proposed approach, we achieve 85 % accuracy (i.e., 3 % better than state-of-the-art Mel-frequency Cepstral Coefficients (MFCC) for a 20 ms agreement duration) and a 15 % over-segmentation rate (i.e., 8 % less than MFCC) for automatic boundary detection of 234,925 phone boundaries corresponding to the 630 speakers of the entire TIMIT database.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"270 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116067157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Method for Network Topic Attention Forecast Based on Feature Words","authors":"Chunlei Yan, Shumin Shi, Heyan Huang, Ruijing Li","doi":"10.1109/IALP.2013.61","DOIUrl":"https://doi.org/10.1109/IALP.2013.61","url":null,"abstract":"The number of people who obtain information and express ideas via the Internet is increasing rapidly. Research on identifying how much attention is paid to a given online topic plays an important role in public opinion management. In this paper, we propose a method to predict netizens' attention to a specific online topic. First, we acquire the attention-degrees of historical topics by analyzing news, reviews and forum posts, then build up the Feature Words Set (FWS) and estimate the popularity of each feature word. After that, we extract the feature words from a new topic and evaluate their contribution to it. Finally, the new topic's attention-degree is computed by comparing its feature words with those in the FWS. We compare our method with the Support Vector Regression model on a data set of manually selected topics. Experimental results show that our approach is acceptable for predicting the attention-degree of online topics.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116070745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on Prosody Features of Mongolian Traditional Folk Long Song","authors":"Yasheng Jin, Wenmin Liu","doi":"10.1109/IALP.2013.15","DOIUrl":"https://doi.org/10.1109/IALP.2013.15","url":null,"abstract":"After labeling and extracting parameters from the voice signals of the Mongolian Long Song 'The rich and vast Alashan', this paper analyzes the prosody features of Mongolian traditional folk Long Song from two perspectives: 1) tone characteristics, exploring the main prosody parameters such as pitch, energy and time; 2) speech production characteristics, explained on the basis of formant and trill characteristics.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124113828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech Summarization without Lexical Features for Mandarin Presentation Speech","authors":"Jian Zhang, Huaqiang Yuan","doi":"10.1109/IALP.2013.48","DOIUrl":"https://doi.org/10.1109/IALP.2013.48","url":null,"abstract":"We present the first known empirical study on speech summarization without lexical features for Mandarin presentation speech. We evaluate acoustic, lexical and structural features as predictors of summary sentences. We find that the summarizer achieves good performance, with an average F-measure of 0.625, even when using only the combination of acoustic and structural features, which are independent of lexical features. In addition, we show that our summarizer performs surprisingly well, with an average F-measure of 0.513, using acoustic features alone. These findings enable us to summarize speech without placing stringent demands on speech recognition accuracy.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116356188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Unified Model for Joint Chinese Word Segmentation and POS Tagging with Heterogeneous Annotation Corpora","authors":"Jiayi Zhao, Xipeng Qiu, Xuanjing Huang","doi":"10.1109/IALP.2013.64","DOIUrl":"https://doi.org/10.1109/IALP.2013.64","url":null,"abstract":"Chinese word segmentation and part-of-speech tagging (S&T) are fundamental steps for more advanced Chinese language processing tasks. Recently, exploiting heterogeneous annotation corpora for Chinese S&T has attracted increasing research interest. In this paper, we propose a unified model for Chinese S&T with heterogeneous annotation corpora. We first automatically construct a loose and uncertain mapping between two representative heterogeneous corpora, the Penn Chinese Treebank (CTB) and PKU's People's Daily (PPD). We then regard Chinese S&T on the heterogeneous corpora as two \"related\" tasks and train our unified model on both corpora simultaneously. Experiments show that our unified model improves performance on both heterogeneous corpora by using the shared information, and achieves significant improvements over the state-of-the-art methods.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123024705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Hierarchical Discourse Structure for Review Sentiment Analysis","authors":"Fei Wang, Yunfang Wu","doi":"10.1109/IALP.2013.42","DOIUrl":"https://doi.org/10.1109/IALP.2013.42","url":null,"abstract":"The overall sentiment of a text is critically affected by its discourse structure. For the first time, this paper incorporates hierarchical discourse structure into an unsupervised sentiment analysis framework. Experimental results show that integrating discourse structure improves the performance of sentiment analysis by 1.9 percentage points (from 85.1% to 87.0%), demonstrating the effectiveness of exploiting discourse structure for sentiment analysis.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122110949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvements in Statistical Phrase-Based Interactive Machine Translation","authors":"Dongfeng Cai, Hua Zhang, Na Ye","doi":"10.1109/IALP.2013.27","DOIUrl":"https://doi.org/10.1109/IALP.2013.27","url":null,"abstract":"State-of-the-art Machine Translation (MT) systems are still far from perfect. An alternative is the so-called Interactive Machine Translation (IMT). In this paper, we present several novel methods to improve statistical phrase-based IMT. We use a dynamic distortion limit to balance the requirements of long-distance reordering and decoding speed. We also introduce a difference function into translation hypothesis extension as a heuristic function, to make the final translation candidates as diverse as possible. In addition, we use the user-validated prefix to guide word selection for the suffix based on a word co-occurrence model. All these methods aim to optimize the N-best candidate translations and to reduce the cognitive burden on users. The experimental results show the effectiveness of our methods.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130171200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}