2019 International Conference on Asian Language Processing (IALP) — Latest Publications

Comprehension correlates of the occurrence and deletion of “de” in Mandarin “N1 (de) N2” structures
2019 International Conference on Asian Language Processing (IALP) Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037704
Junyuan Zhao, Junru Wu
{"title":"Comprehension correlates of the occurrence and deletion of “de” in Mandarin “N1 (de) N2” structures","authors":"Junyuan Zhao, Junru Wu","doi":"10.1109/IALP48816.2019.9037704","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037704","url":null,"abstract":"Based on corpus materials and on-line semantic judgment surveys, this paper investigates the comprehension differences related to the occurrence and deletion of “de” in the Mandarin “N1 (de) N2” structure. By applying PCA and LME modellings on a set of semantic survey data, this study provides a multi-level database of semantic measurements for a set of Chinese “N1 (de) N2” structures as well as a quantitative analysis regarding the correlation between structure-level and constituent-level semantic features. The research shows that:(1) The “de”-occurring structure is more likely to be interpreted as indefinite than the “de”-deletion structure. (2) Animacy of N1 is positively related to the grammaticality of the “de”-occurring structure, while animacy of N1 is negatively related to the grammaticality of the “de”-deletion structure. The research findings provide evidence for prototype effects in the process of language comprehension. We propose that in natural comprehension, there is a high-animacy bias for N1 regarding the “de”-occurring structure; while a low animacy interpretation for N1 is more prototypical for the “de”-deletion structure. Accordingly, the “de”-occurring structure tends to be interpreted as a possessive, while the “de”-deletion structure is more likely to be interpreted as a modifier-head structure.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133603086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
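The abstract mentions PCA and LME modeling of semantic survey data. As a rough, hedged illustration of that general workflow (not the authors' code), the sketch below reduces correlated rating scales with PCA and fits a linear mixed-effects model; the file name and all column names (`rating_*`, `grammaticality`, `n1_animacy`, `participant`) are hypothetical.

```python
# Illustrative sketch only: PCA over survey rating scales, then an LME model
# relating a structure-level judgment to constituent-level features.
# All names below are hypothetical, not taken from the paper.
import pandas as pd
from sklearn.decomposition import PCA
import statsmodels.formula.api as smf

df = pd.read_csv("semantic_survey.csv")            # hypothetical survey file
rating_cols = [c for c in df.columns if c.startswith("rating_")]

# Reduce the correlated rating scales to two principal components.
pca = PCA(n_components=2)
scores = pca.fit_transform(df[rating_cols])
df["pc1"], df["pc2"] = scores[:, 0], scores[:, 1]

# Mixed-effects model: grammaticality ~ N1 animacy + first semantic component,
# with participants as the random grouping factor.
model = smf.mixedlm("grammaticality ~ n1_animacy + pc1", data=df, groups=df["participant"])
result = model.fit()
print(result.summary())
```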
Language Detection in Sinhala-English Code-mixed Data
2019 International Conference on Asian Language Processing (IALP) Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037680
Ian Smith, Uthayasanker Thayasivam
{"title":"Language Detection in Sinhala-English Code-mixed Data","authors":"Ian Smith, Uthayasanker Thayasivam","doi":"10.1109/IALP48816.2019.9037680","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037680","url":null,"abstract":"Language identification in text data has become a trending topic due to multiple language usage on the internet and it becomes a difficult task when it comes to bilingual and multilingual communication data processing. Accordingly, this study introduces a methodology to detect Sinhala and English words in code-mixed data and this is the first research done on such scenario at the time of this paper is written. In addition to that, the data set which is used for this research was newly built and published for similar research users. Even though there are well known models to identify Singlish Unicode characters which is a straightforward study; there are no proper language detection models to detect Sinhala words in a sentence which contains English words (code-mixed data). Therefore, this paper presents a language detection model with XGB classifier with 92.1% accuracy and a CRF model with a Fl-score of 0.94 for sequence labeling.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134151982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
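For the XGB-classifier side of such an approach, a minimal per-token sketch might look like the following, assuming character n-gram features; the toy word list and labels are invented for illustration and are not the published data set.

```python
# Minimal sketch: per-token language identification with character n-gram
# features and an XGBoost classifier. Toy data, not the paper's corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier

words  = ["mama", "gedara", "yanawa", "office", "eka", "meeting", "thiyenawa", "today"]
labels = [1, 1, 1, 0, 1, 0, 1, 0]   # 1 = Sinhala (romanized), 0 = English; toy labels

pipeline = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),   # character 1-3 grams
    XGBClassifier(n_estimators=100, max_depth=4),
)
pipeline.fit(words, labels)
print(pipeline.predict(["kohomada", "project"]))   # expected: Sinhala, English
```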
Research on Chinese Text Error Correction Based on Sequence Model
2019 International Conference on Asian Language Processing (IALP) Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037666
Jianyong Duan, Yang Yuan, Hao Wang, Xiaopeng Wei, Zheng Tan
{"title":"Research on Chinese Text Error Correction Based on Sequence Model","authors":"Jianyong Duan, Yang Yuan, Hao Wang, Xiaopeng Wei, Zheng Tan","doi":"10.1109/IALP48816.2019.9037666","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037666","url":null,"abstract":"When users input text, it will inevitably produce errors, and with the rapid development and popularization of smart devices, the situation becomes more and more serious. Therefore, text correction has become one of the important research directions in the field of natural language processing. As the grammatical error correction task, in this paper, the error correction process of Chinese text is regarded as the conversion process from wrong sentence to correct sentence. In order to adapt to this task, the (sequence-to-sequence) Seq2Seq model is introduced. The wrong sentence is used as the source sentence, and the correct sentence is used as the target sentence. Supervised training is carried out in units of characters and words. It can be used for correcting errors such as word of homophone, homotype, and near-sound, greatly reducing the artificial participation and expert support of feature extraction, improve model accuracy on specific errors. In order to solve the information loss caused by the conversion of long sequence to fixed length vector, the attention mechanism is introduced into the basic model. After adding the attention mechanism, the model’s accuracy, recall rate and F1 value have been effectively improved.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134174086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
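The attention mechanism added to the Seq2Seq corrector addresses the fixed-length-vector bottleneck. A generic Luong-style dot-product attention step (an assumption; the paper's exact formulation is not given in the abstract) can be sketched as follows.

```python
# Generic dot-product attention: the decoder state attends over all encoder
# states instead of relying on a single fixed-length vector.
import torch
import torch.nn.functional as F

def luong_attention(decoder_state, encoder_states):
    """decoder_state: (batch, hidden); encoder_states: (batch, src_len, hidden)."""
    # Dot-product score between the current decoder state and every encoder state.
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)   # (batch, src_len)
    weights = F.softmax(scores, dim=1)                                          # attention distribution
    context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)        # (batch, hidden)
    return context, weights

# Toy shapes: batch of 2, source length 5, hidden size 8.
context, weights = luong_attention(torch.randn(2, 8), torch.randn(2, 5, 8))
print(context.shape, weights.shape)   # torch.Size([2, 8]) torch.Size([2, 5])
```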
A General Procedure for Improving Language Models in Low-Resource Speech Recognition
2019 International Conference on Asian Language Processing (IALP) Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037726
Qian Liu, Weiqiang Zhang, Jia Liu, Yao Liu
{"title":"A General Procedure for Improving Language Models in Low-Resource Speech Recognition","authors":"Qian Liu, Weiqiang Zhang, Jia Liu, Yao Liu","doi":"10.1109/IALP48816.2019.9037726","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037726","url":null,"abstract":"It is difficult for a language model (LM) to perform well with limited in-domain transcripts in low-resource speech recognition. In this paper, we mainly summarize and extend some effective methods to make the most of the out-of-domain data to improve LMs. These methods include data selection, vocabulary expansion, lexicon augmentation, multi-model fusion and so on. The methods are integrated into a systematic procedure, which proves to be effective for improving both n-gram and neural network LMs. Additionally, pre-trained word vectors using out-of-domain data are utilized to improve the performance of RNN/LSTM LMs for rescoring first-pass decoding results. Experiments on five Asian languages from Babel Build Packs show that, after improving LMs, 5.4-7.6% relative reduction of word error rate (WER) is generally achieved compared to the baseline ASR systems. For some languages, we achieve lower WER than newly published results on the same data sets.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"52 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114112711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
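Among the listed methods, data selection is commonly done with cross-entropy-difference scoring (Moore-Lewis style); whether the authors used exactly this criterion is an assumption. Below is a toy sketch in which simple add-one unigram LMs stand in for real language models and the sentences are placeholders.

```python
# Cross-entropy-difference data selection sketch: rank out-of-domain sentences
# by how much more "in-domain-like" than "out-of-domain-like" they look.
import math
from collections import Counter

def unigram_logprob(counts, total, vocab, word):
    return math.log((counts[word] + 1) / (total + vocab))   # add-one smoothing

def score(sentence, in_counts, in_total, out_counts, out_total, vocab):
    # Lower score = closer to the in-domain model than to the out-of-domain model.
    words = sentence.split()
    h_in  = -sum(unigram_logprob(in_counts,  in_total,  vocab, w) for w in words) / len(words)
    h_out = -sum(unigram_logprob(out_counts, out_total, vocab, w) for w in words) / len(words)
    return h_in - h_out

in_domain  = ["please call the taxi company", "book a taxi for me"]
out_domain = ["the stock market fell sharply", "call me a taxi now", "rain is expected tomorrow"]

in_counts  = Counter(w for s in in_domain for w in s.split())
out_counts = Counter(w for s in out_domain for w in s.split())
in_total, out_total = sum(in_counts.values()), sum(out_counts.values())
vocab = len(set(in_counts) | set(out_counts))

ranked = sorted(out_domain, key=lambda s: score(s, in_counts, in_total, out_counts, out_total, vocab))
print(ranked[0])   # the out-of-domain sentence most similar to the in-domain data
```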
Ranking Like Human: Global-View Matching via Reinforcement Learning for Answer Selection
2019 International Conference on Asian Language Processing (IALP) Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037725
Yingxue Zhang, Ping Jian, Ruiying Geng, Yuansheng Song, Fandong Meng
{"title":"Ranking Like Human: Global-View Matching via Reinforcement Learning for Answer Selection","authors":"Yingxue Zhang, Ping Jian, Ruiying Geng, Yuansheng Song, Fandong Meng","doi":"10.1109/IALP48816.2019.9037725","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037725","url":null,"abstract":"Answer Selection (AS) is of great importance for open-domain Question Answering (QA). Previous approaches typically model each pair of the question and the candidate answers independently. However, when selecting correct answers from the candidate set, the question is usually too brief to provide enough matching information for the right decision. In this paper, we propose a reinforcement learning framework that utilizes the rich overlapping information among answer candidates to help judge the correctness of each candidate. In particular, we design a policy network, whose state aggregates both the question-candidate matching information and the candidate-candidate matching information through a global-view encoder. Experiments on the benchmark of WikiQA and SelQA demonstrate that our RL framework substantially improves the ranking performance.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121986781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
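As a hedged illustration of how a ranking policy can be trained with reinforcement learning, the sketch below applies a plain REINFORCE update to a candidate-scoring network; the feature encoder is a stand-in, not the paper's global-view matching encoder, and the data are random toy tensors.

```python
# REINFORCE sketch: the policy scores each candidate, samples one, and is
# reinforced when the sampled candidate is the correct answer.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 1))   # scores one candidate
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

candidate_feats = torch.randn(5, 16)        # 5 candidates, toy matching features
gold = torch.tensor([0., 1., 0., 0., 0.])   # candidate 1 is the correct answer

for _ in range(100):
    logits = policy(candidate_feats).squeeze(1)            # (5,)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                                  # pick one candidate
    reward = gold[action]                                   # +1 if correct, else 0
    loss = -dist.log_prob(action) * reward                  # REINFORCE gradient estimator
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(policy(candidate_feats).squeeze(1).argmax().item())   # typically converges toward index 1
```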
Separate Answer Decoding for Multi-class Question Generation
2019 International Conference on Asian Language Processing (IALP) Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037710
Kaili Wu, Yu Hong, Mengmeng Zhu, Hongxuan Tang, Min Zhang
{"title":"Separate Answer Decoding for Multi-class Question Generation","authors":"Kaili Wu, Yu Hong, Mengmeng Zhu, Hongxuan Tang, Min Zhang","doi":"10.1109/IALP48816.2019.9037710","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037710","url":null,"abstract":"Question Generation (QG) aims to automati-nerate questions by understanding the semantics of source sentences and target answers. Learning to generate diverse questions for one source sentence with different target answers is important for the QG task. Despite of the success of existing state-of-the-art approaches, they are designed to merely generate a unique question for a source sentence. The diversity of answers fail to be considered in the research activities. In this paper, we present a novel QG model. It is designed to generate different questions toward a source sentence on the condition that different answers are regarded as the targets. Pointer-Generator Network(PGN) is used as the basic architecture. On the basis, a separate answer encoder is integrated into PGN to regulate the question generating process, which enables the generator to be sensitive to attentive target answers. To ease the reading, we name our model as APGN for short in the following sections of the paper. Experimental results show that APGN outperforms the state-of-the-art on SQuAD split-l dataset. Besides, it is also proven that our model effectively improves the accuracy of question word prediction, which leads to the generation of appropriate questions.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128241331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
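Since the model builds on the Pointer-Generator Network, the copy/generate mixing step is central. The following sketch shows the generic PGN mixing computation only; shapes and names are invented for illustration, and the paper's separate answer encoder is not modeled here.

```python
# Generic pointer-generator mixing: a gate p_gen blends the vocabulary
# distribution with copy probabilities scattered onto source-token ids.
import torch

def mix_distributions(vocab_dist, attn_weights, src_ids, p_gen):
    """vocab_dist: (batch, vocab); attn_weights: (batch, src_len); src_ids: (batch, src_len)."""
    copy_dist = torch.zeros_like(vocab_dist).scatter_add_(1, src_ids, attn_weights)
    return p_gen * vocab_dist + (1 - p_gen) * copy_dist

vocab_dist = torch.softmax(torch.randn(2, 30), dim=1)   # decoder's vocabulary distribution
attn = torch.softmax(torch.randn(2, 6), dim=1)          # attention over 6 source tokens
src_ids = torch.randint(0, 30, (2, 6))                  # source tokens mapped to vocabulary ids
p_gen = torch.sigmoid(torch.randn(2, 1))                # generation gate in (0, 1)

final = mix_distributions(vocab_dist, attn, src_ids, p_gen)
print(final.sum(dim=1))                                 # each row still sums to 1
```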
Quantifying the Use of English Words in Urdu News-Stories
2019 International Conference on Asian Language Processing (IALP) Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037734
Mehtab Alam Syed, Arif Ur Rahman, Muzammil Khan
{"title":"Quantifying the Use of English Words in Urdu News-Stories","authors":"Mehtab Alam Syed, Arif Ur Rahman, Muzammil Khan","doi":"10.1109/IALP48816.2019.9037734","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037734","url":null,"abstract":"The vocabulary of Urdu language is a mixture of many other languages including Farsi, Arabic and Sinskrit. Though, Urdu is the national language of Pakistan, English has the status of official language of Pakistan. The use of English words in spoken Urdu as well as documents written in Urdu is increasing with the passage of time.The automatic detection of English words written using Urdu script in Urdu text is a complicated task. This may require the use of advanced machine/deep learning techniques. However, the lack of initial work for developing a fully automatic system makes it a more challenging task. The current paper presents the result of an initial work which may lead to the development of an approach which may detect any English word written Urdu text. First, an approach is developed to preserve Urdu stories from online sources in a normalized format. Second, a dictionary of English words transliterated into Urdu was developed. The results show that there can be different categories of words in Urdu text including transliterated words, words originating from English and words having exactly similar pronunciation but different meaning.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128558297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
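A minimal sketch of the dictionary-lookup idea, assuming pre-normalized Urdu text and a toy two-entry transliteration dictionary (the paper's actual resource is of course much larger):

```python
# Flag tokens that appear in a dictionary of English words transliterated
# into Urdu script. Dictionary entries and the sentence are toy examples.
import re

transliterated_english = {"کمپیوٹر": "computer", "انٹرنیٹ": "internet"}   # toy entries

def find_english_words(urdu_sentence):
    # Strip Urdu punctuation, split on whitespace, and look each token up.
    tokens = re.sub(r"[۔،؟!]", " ", urdu_sentence).split()
    return [(tok, transliterated_english[tok]) for tok in tokens if tok in transliterated_english]

print(find_english_words("میں نے کمپیوٹر خریدا۔"))   # -> [('کمپیوٹر', 'computer')]
```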
Effective Data Augmentation Approaches to End-to-End Task-Oriented Dialogue
2019 International Conference on Asian Language Processing (IALP) Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037690
Jun Quan, Deyi Xiong
{"title":"Effective Data Augmentation Approaches to End-to-End Task-Oriented Dialogue","authors":"Jun Quan, Deyi Xiong","doi":"10.1109/IALP48816.2019.9037690","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037690","url":null,"abstract":"The training of task-oriented dialogue systems is often confronted with the lack of annotated data. In contrast to previous work which augments training data through expensive crowd-sourcing efforts, we propose four different automatic approaches to data augmentation at both the word and sentence level for end-to-end task-oriented dialogue and conduct an empirical study on their impact. Experimental results on the CamRest676 and KVRET datasets demonstrate that each of the four data augmentation approaches is able to obtain a significant improvement over a strong baseline in terms of Success F1 score and that the ensemble of the four approaches achieves the state-of-the-art results in the two datasets. In-depth analyses further confirm that our methods adequately increase the diversity of user utterances, which enables the end-to-end model to learn features robustly.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117243646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
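The four specific augmentation approaches are not spelled out in the abstract, so the sketch below only illustrates generic word-level and sentence-level operations with toy lexicons, not the paper's methods.

```python
# Toy word-level (synonym swap) and sentence-level (paraphrase lookup)
# augmentation of user utterances. Lexicons are invented placeholders.
import random

synonyms = {"cheap": ["inexpensive", "affordable"], "restaurant": ["place to eat"]}
paraphrases = {"i want a cheap restaurant": ["find me a cheap restaurant please"]}

def word_level_augment(utterance, p=0.5):
    # Randomly swap words for synonyms to diversify surface forms.
    out = [random.choice(synonyms[w]) if w in synonyms and random.random() < p else w
           for w in utterance.split()]
    return " ".join(out)

def sentence_level_augment(utterance):
    # Replace the whole utterance with a stored paraphrase when one exists.
    return random.choice(paraphrases.get(utterance, [utterance]))

random.seed(0)
print(word_level_augment("i want a cheap restaurant"))
print(sentence_level_augment("i want a cheap restaurant"))
```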
Japanese Particle Error Correction employing Classification Model
2019 International Conference on Asian Language Processing (IALP) Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037699
Youichiro Ogawa, Kazuhide Yamamoto
{"title":"Japanese Particle Error Correction employing Classification Model","authors":"Youichiro Ogawa, Kazuhide Yamamoto","doi":"10.1109/IALP48816.2019.9037699","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037699","url":null,"abstract":"We present a grammatical error correction system for Japanese particles based on the classification method. We define a confusion set of the particles for detection of particle errors and prediction of the correct word. Our method can handle not only substitutions but also insertions and deletions. For building the training data, we used two datasets: a large amount of native language data and corrected learners' sentences. That is, we did not require a parallel corpus of learners. We show the results for Japanese particle error correction on the NAIST Goyo corpus, evaluated by the MaxMatch $(M^{2})$ score. In addition, we analyze the effect of percentage changes in deletion labels while building the training data and analyze the prediction probability threshold at correction. Our best model achieved 46.4 $F_{0.5}$.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121631219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
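One way such classification training data can be derived from native text, consistent with (but not necessarily identical to) the described setup, is to label each slot after a non-particle token with the particle that follows it, or "NONE"; substitutions, insertions and deletions then all become label predictions. A toy sketch with pre-segmented input rather than a real morphological analyzer:

```python
# Derive (context word, particle-or-NONE) training examples from native text.
PARTICLES = {"は", "が", "を", "に", "で", "と", "へ", "から", "まで"}   # toy confusion set

def slots_and_labels(tokens):
    examples = []
    i = 0
    while i < len(tokens):
        if tokens[i] in PARTICLES:      # particles themselves are not slot anchors
            i += 1
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        label = nxt if nxt in PARTICLES else "NONE"
        examples.append((tokens[i], label))   # particle to insert after this word
        i += 1
    return examples

print(slots_and_labels(["私", "は", "学校", "に", "行く"]))
# -> [('私', 'は'), ('学校', 'に'), ('行く', 'NONE')]
```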
A Study on the Robustness of Pitch Range Estimation from Brief Speech Segments
2019 International Conference on Asian Language Processing (IALP) Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037713
Wenjie Peng, Kaiqi Fu, Wei Zhang, Yanlu Xie, Jinsong Zhang
{"title":"A Study on the Robustness of Pitch Range Estimation from Brief Speech Segments","authors":"Wenjie Peng, Kaiqi Fu, Wei Zhang, Yanlu Xie, Jinsong Zhang","doi":"10.1109/IALP48816.2019.9037713","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037713","url":null,"abstract":"Pitch range estimation from brief speech segments is important for many tasks like automatic speech recognition. To address this issue, previous studies have proposed to utilize deep-learning-based models to estimate pitch range with spectrum information as input [1–2]. They demonstrated it could still achieve reliable estimation results when speech segment is as brief as 300ms. In this work, we further investigate the robustness of this method. We take the following situation into account: 1) increasing the number of speakers for model training hugely; 2) second-language(L2) speech data; 3) the influence of monosyllabic utterances with different tones. We conducted experiments accordingly. Experimental results showed that: 1) We further improved the accuracy of pitch range estimation after increasing the speakers for model training. 2) The estimation accuracy on the L2 learners is similar to that on the native speakers. 3) Different tonal information has an influence on the LSTM-based model, but this influence is limited compared to the baseline method. These results may contribute to speech systems that demanding pitch features.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132122657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
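As a rough sketch of the kind of LSTM-based estimator the abstract refers to, the model below maps a brief spectral segment to a two-dimensional pitch-range target; feature dimensions, frame counts, and targets are invented for illustration, not taken from the paper.

```python
# Toy LSTM regressor: spectral frames of a ~300 ms segment -> [range_low, range_high].
import torch
import torch.nn as nn

class PitchRangeLSTM(nn.Module):
    def __init__(self, n_feats=40, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)        # predicts a two-dimensional pitch range

    def forward(self, x):                        # x: (batch, frames, n_feats)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])                  # use the final hidden state

model = PitchRangeLSTM()
segment = torch.randn(8, 30, 40)                 # 8 segments, 30 frames, 40 filterbank features
target = torch.randn(8, 2)                       # toy regression targets
loss = nn.MSELoss()(model(segment), target)
loss.backward()
print(loss.item())
```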