{"title":"Joint Chinese word segmentation and punctuation prediction using deep recurrent neural network for social media data","authors":"Kui Wu, Xuancong Wang, Nina Zhou, AiTi Aw, Haizhou Li","doi":"10.1109/IALP.2015.7451527","DOIUrl":"https://doi.org/10.1109/IALP.2015.7451527","url":null,"abstract":"In this work, we propose to jointly perform Chinese word segmentation (CWS) and punctuation prediction (PU) in a unified framework using deep recurrent neural network (DRNN). We further perform a comparative study among the joint frameworks, the isolated prediction and the pipeline methods that link the two tasks sequentially, on a social media corpus. Our experimental results show that joint models improve performance of CWS and affect PU marginally. We also study the effects of CWS and PU on Chinese-to-English machine translation (MT) quality by evaluating on a parallel social media corpus. It is shown that joint models are superior to the isolated prediction and the pipeline approaches.","PeriodicalId":256927,"journal":{"name":"2015 International Conference on Asian Language Processing (IALP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124142963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-extended phrase reordering model for pivot-based statistical machine translation","authors":"Xiaoning Zhu, T. Zhao, Yiming Cui, Conghui Zhu","doi":"10.1109/IALP.2015.7451524","DOIUrl":"https://doi.org/10.1109/IALP.2015.7451524","url":null,"abstract":"For translation between language pairs which is lack of bilingual data, pivot-based SMT uses a pivot language as a “bridge” to generate source-target translation, inducing from source-pivot and pivot-target translation. However, due to the missing of the context information, the reordering model was hard to obtain with the conventional methods. In this paper, we present a context-extended phrase reordering model for pivot-based statistical machine translation by extending the context information in source, pivot and target language. Experimental results show that our method leads to significant improvements over the baseline system on European Parliament data.","PeriodicalId":256927,"journal":{"name":"2015 International Conference on Asian Language Processing (IALP)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131457243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting bilingual multi-word expressions for low-resource statistical machine translation","authors":"Linyu Wei, Miao Li, Lei Chen, Zhenxin Yang, Kai Sun, Man Yuan","doi":"10.1109/IALP.2015.7451522","DOIUrl":"https://doi.org/10.1109/IALP.2015.7451522","url":null,"abstract":"Improving the performance of statistical machine translation is often a significant problem, especially in low language resource scenarios such as Chinese-Mongolian SMT. In this paper, we propose a method to improve the performance of Chinese-Mongolian SMT system using multi-word expressions, which is also a pilot study for this language pair. We extract MWEs from the phrase-table then integrate the MWEs into SMT system by various strategies. Experimental results indicate our method outperforms a baseline model by 0.81 BLEU points on Test-All and 1.54 BLEU points on Test-MWE.","PeriodicalId":256927,"journal":{"name":"2015 International Conference on Asian Language Processing (IALP)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115711657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning sentiment-inherent word embedding for word-level and sentence-level sentiment analysis","authors":"Zhihua Zhang, Man Lan","doi":"10.1109/IALP.2015.7451540","DOIUrl":"https://doi.org/10.1109/IALP.2015.7451540","url":null,"abstract":"Vector-based word representations have made great progress on many Natural Language Processing tasks. However, due to the lack of sentiment information, the traditional word vectors are insufficient to settle sentiment analysis tasks. In order to capture the sentiment information, we extended Continuous Skip-gram model (Skip-gram) and presented two sentiment word embedding models by integrating sentiment information into semantic word representations. Experimental results showed that the sentiment word embeddings learned by two models indeed capture sentiment and semantic information as well. Moreover, the proposed sentiment word embedding models outperform traditional word vectors on both Chinese and English corpora.","PeriodicalId":256927,"journal":{"name":"2015 International Conference on Asian Language Processing (IALP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127676616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correlation analysis between social network content and query intention","authors":"Jian Wang, Yu Hong, Kai Wang, Jianmin Yao, Qiaoming Zhu","doi":"10.1109/ialp.2015.7451544","DOIUrl":"https://doi.org/10.1109/ialp.2015.7451544","url":null,"abstract":"","PeriodicalId":256927,"journal":{"name":"2015 International Conference on Asian Language Processing (IALP)","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117339644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The annotation study of grammar points for TCSL","authors":"Xiao-ru Tan, Lijiao Yang","doi":"10.1109/IALP.2015.7451534","DOIUrl":"https://doi.org/10.1109/IALP.2015.7451534","url":null,"abstract":"Corpus annotation is an important research subject in corpus linguistics, however, annotation research for teaching Chinese as a second language (TCSL) is scarce. It is difficult to retrieve data which are suitable for teaching from the general Chinese corpus. To deal with this problem, firstly, this paper proposed the concept of grammar points annotation and discussed the contents and methods of annotating grammar points in the corpus. Secondly, annotated 121 grammar points in 141464 sentences and got 95592 annotated data with semantic and syntactic information. Finally, this paper introduced the application of annotated data in TCSL.","PeriodicalId":256927,"journal":{"name":"2015 International Conference on Asian Language Processing (IALP)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127968074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-view learning for emotion detection in code-switching texts","authors":"Sophia Yat-Mei Lee, Zhongqing Wang","doi":"10.1109/IALP.2015.7451539","DOIUrl":"https://doi.org/10.1109/IALP.2015.7451539","url":null,"abstract":"Previous researches have placed emphasis on analyzing emotions in monolingual text, neglecting the fact that emotions are often found in bilingual or code-switching posts in social media. Traditional methods for the identification or classification of emotion fail to accommodate the code-switching content. To address this challenge, in this paper, we propose a multi-view learning framework to learn and detect the emotions through both monolingual and bilingual views. In particular, the monolingual views are extracted from the monolingual text separately, and the bilingual view is constructed with both monolingual and translated text collectively. Empirical studies demonstrate the effectiveness of our proposed approach in detecting emotions in code-switching texts.","PeriodicalId":256927,"journal":{"name":"2015 International Conference on Asian Language Processing (IALP)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133361380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mandarin prosodic word prediction using dependency relationships","authors":"Zhengchen Zhang, Fuxiang Wu, M. Dong, Fu-qiu Zhou","doi":"10.1109/IALP.2015.7451559","DOIUrl":"https://doi.org/10.1109/IALP.2015.7451559","url":null,"abstract":"Previous research demonstrated that the dependency structure of a sentence is helpful for prosodic phrase boundary prediction in mandarin Text-To-Speech systems. However, no experimental results proved that the dependency relations are important to prosodic word boundary detection. Also, most of the published methods use machine learning technologies, which require people to label the prosodic boundaries manually for training purpose. In this paper, we propose a rule based method for prosodic word boundary prediction based on two observations. First, in most of the cases, a prosodic word is a lexical word, or it is a combination of adjacent lexical words. Second, the combination of lexical words relies on semantic relationships. The dependency tree of a sentence can describe the semantic relations between words. Hence, we combine adjacent words which have dependent relationships into a prosodic word. Some other restrictions are added to fine-tune the method. Experimental results demonstrate that the method achieved 0.918 and 0.901 on two corpora in terms of F-score.","PeriodicalId":256927,"journal":{"name":"2015 International Conference on Asian Language Processing (IALP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121231498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A re-ranking model for dependency parsing with knowledge graph embeddings","authors":"A.-Yeong Kim, Hyun-Je Song, Seong-Bae Park, Sang-Jo Lee","doi":"10.1109/IALP.2015.7451560","DOIUrl":"https://doi.org/10.1109/IALP.2015.7451560","url":null,"abstract":"Re-ranking models of parse trees have been focused on re-ordering parse trees with a syntactic view. However, also a semantic view should be considered in re-ranking parse trees, because the fact that a word pair has a dependency implies that the pair has both syntactic and semantic relations. This paper proposes a re-ranking model for dependency parsing based on a combination of syntactic and semantic plausibilities of dependencies. The syntactic probability is used as a syntactic plausibility of a parse tree, and a knowledge graph embedding is adopted to represent its semantic plausibility. The knowledge graph embedding allows the semantic plausibility of parse trees to be expressed effectively with ease. The experiments on the standard Penn Treebank corpus prove that the proposed model improves the base parser regardless of the number of candidate parse trees.","PeriodicalId":256927,"journal":{"name":"2015 International Conference on Asian Language Processing (IALP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115371194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine translation from Japanese and French to Vietnamese, the difference among language families","authors":"T. Do, M. Utiyama, E. Sumita","doi":"10.1109/IALP.2015.7451521","DOIUrl":"https://doi.org/10.1109/IALP.2015.7451521","url":null,"abstract":"Although Vietnamese is spoken language of more than 90 million people in the world (in 2014), Vietnamese language is still considered as a low-resourced language. Vietnamese NLP still lacks of resources for text and speech processing, especially research on machine translation for Vietnamese is very rare. This paper presents our first attempt to collect and construct French-Vietnamese and Japanese-Vietnamese statistical machine translation systems. These two different languages, French and Japanese, are less focused in Vietnamese-related machine translation research. The differences between these two languages in comparison with Vietnamese can bring out interesting observations.","PeriodicalId":256927,"journal":{"name":"2015 International Conference on Asian Language Processing (IALP)","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114471268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}