{"title":"Combine multi-features with deep learning for answer selection","authors":"Yuqing Zheng, Chenghe Zhang, Dequan Zheng, Feng Yu","doi":"10.1109/IALP.2017.8300553","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300553","url":null,"abstract":"Answer selection is an important subtask in open-domain question answering (QA) system, which mainly models for question and answer pairs. In this paper, we first develop a basic framework based on bidirectional long short term memory (Bi-LSTM), and then we extract lexical and topic features in question and answer respectively, finally, we append these features to Bi-LSTM models. Our models experiment on WikiQA dataset, Experimental results show that our models get a slight improvement compared to other published state of the art results.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133724823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CBOS: Continuos bag of sentences for learning sentence embeddings","authors":"Ye Yuan, Yue Zhang","doi":"10.1109/IALP.2017.8300558","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300558","url":null,"abstract":"There has been recent work learning distributed sentence representations, which utilise neighbouring sentences as context for learning the embedding of a sentence. The setting is reminiscent of training word embeddings, yet no work has reported a baseline using the same training objective as learning word vectors. We fill this gap by empirically investigating the use of a Continuous Bag-of-Word (CBOW) objective, predicting the current sentence using its context sentences. We name this method a Continuous Bag-of-Sentences (CBOS) method. Results on standard benchmark show that CBOS is a highly competitive baseline for training sentence embeddings, outperforming most existing methods for text similarity measurement.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"198 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124424336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chinese teaching material readability assessment with contextual information","authors":"Hao Liu, Si Li, Jianbo Zhao, Zuyi Bao, Xiaopeng Bai","doi":"10.1109/IALP.2017.8300547","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300547","url":null,"abstract":"Readability of an article indicates its level in terms of reading comprehension in general. Readability assessment is a process that measures the reading level of a piece of text, which can help in finding reading materials suitable for readers. In this paper, we aim to evaluate the readability about the Chinese teaching material aimed at second language (L2) learners. We introduce the neural network models to the readability assessment task for the first time. In order to capture the contextual information for readability assessment, we employ Convolutional Neural Network (CNN) to capture hidden local features. Then we use bi-directional Long Short-Term Memory Networks (bi-LSTM) neural network to combine the past and future information together. Experiment results show that our model achieves competitive performance.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117030332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hiencor: On mining of a hi-en general purpose parallel corpus from the web","authors":"Arjun Das, Utpal Garain, Ravindra Kumar, Apurbalal Senapati","doi":"10.1109/IALP.2017.8300587","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300587","url":null,"abstract":"This paper presents a language independent and simple methodology to mine bilingual parallel corpus from the web. In particular, we extract parallel corpus for the Hindi-English (Hi-En) language pair from web pages which are previously unexplored. Candidate websites containing Hindi and English pages are identified by using a list of Hindi stop words to the system. A small set of manually generated patterns and a state of the art sentence aligner is then used to extract Hindi-English parallel corpus from these candidate websites. The quality of the mined parallel corpus is also demonstrated empirically in Hindi-English machine translation task.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117269749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Study on the aspirated characteristics of Chinese Mandarin consonant","authors":"Shiliang Lyu, Luxin Zhou","doi":"10.1109/IALP.2017.8300577","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300577","url":null,"abstract":"The airflow plays an important role in the pronunciation process of Chinese mandarin consonant, mainly in pronunciation methods. On account of the different pronunciation methods, aspirated and unaspirated are two opposite characteristics. It is generally believed that plosives and affricates have obvious aspirated opposites but fricatives, nasals and laterals don't have. In previous studies, there was less research on consonant airflow. In this paper, experimental phonetics method is utilized to research mandarin consonant initials. By analyzing the airflow and voice signal parameters collected by airflow barometer, the airflow parameters of the consonants with different articulation during the pronunciation is obtained, and the airflow characteristic of mandarin consonants is concluded.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122126471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain adaption based on lda and word embedding in SMT","authors":"Shaolin Zhu, Yating Yang, Xiao Li, Tonghai Jiang, Lei Wang, Xi Zhou, Chenggang Mi","doi":"10.1109/IALP.2017.8300561","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300561","url":null,"abstract":"Current methods about domain adaption in SMT mostly assume that a small in-domain sample is need at training time. However, the fact target domain may not be known at training time so that it may not satisfy the fact translation or is far away from user needs. We instead propose a more suitable method to avoid this situation. Our methods mainly contain two sections (1) Firstly, we use word embedding and LDA model to divide the training corpus into some similar semantic subdomains. (2) Secondly, for an actual source sentences we can select a more suitable translation system by semantic clues. We implement experiments on two language pairs. We can observe consistent improvements over three baselines.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125950261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the analysis and evaluation of prosody conversion techniques","authors":"Berrak Sisman, Grandee Lee, Haizhou Li, K. Tan","doi":"10.1109/IALP.2017.8300542","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300542","url":null,"abstract":"Voice conversion is a process of modifying the characteristics of source speaker such as spectrum or/and prosody, to sound as if it was spoken by another speaker. In this paper, we study the evaluation of prosody transformation, in particular, the evaluation of Fundamental Frequency (F0) conversion. F0 is an essential prosody feature that should be taken care of in a compressive voice conversion framework. So far, the evaluation of the converted prosody features is performed mainly by looking at Pearson Correlation Coefficient and Root Mean Square Error (RMSE). Unfortunately, these techniques do not explicitly measure the F0 alignment between the source and target signals. We believe that an evaluation measure that takes into account the time alignment of F0 is needed to provide a new perspective. Therefore, in this paper, we study a new technique to assess the accuracy of prosody transformation. In our experiments with different prosody transformation techniques, we report that the proposed evaluation approach achieves consistent results with the baseline evaluation metrics.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"310 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132034334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recursive annotations for attention-based neural machine translation","authors":"S. Ye, Wu Guo","doi":"10.1109/IALP.2017.8300570","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300570","url":null,"abstract":"The last few years have witnessed the success of attention-based Neural Machine Translation (NMT), and many of variant models have been used to improve the performance. Most of the proposed attention-based NMT models encode the source sentence into a sequence of annotations which are kept fixed for the following steps. In this paper, we conjecture that the use of fixed annotations is the bottleneck in improving the performance ofconventional attention-based NMT. To tackle this shortcoming, we propose a novel model for attention-based NMT, which is intended to update the source annotations recursively when generating the target word at each time step. Experimental results show that the proposed approach achieves significant performance improvement over multiple test sets.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126317591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding explicit arithmetic word problems and explicit plane geometry problems using syntax-semantics models","authors":"Xinguo Yu, Wenbin Gan, Mingshu Wang","doi":"10.1109/IALP.2017.8300590","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300590","url":null,"abstract":"This paper presents two algorithms for understanding explicit arithmetic word problems (EAWPs) and explicit plane geometry problems (EPGPs) following the sharing approach, respectively. This approach proposed in this paper models understanding math problems as a problem of relation extraction, instead of as the problem of understanding the semantics of natural language. Then it further proposes a syntax-semantics (S2) model method to extract math relations. The S2 model method is very effective in that only 116 models can extract most of relations in EAWPs and that only 48 models can extract most of relations in EPGP texts. The experimental results show that the proposed algorithms can understand EAWPs and EPGPs very well.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122226599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting disease-symptom relationships from health question and answer forum","authors":"Christian Halim, A. Wicaksono, M. Adriani","doi":"10.1109/IALP.2017.8300552","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300552","url":null,"abstract":"In this paper, we address the problem of automatically extracting disease-symptom relationships from health question-answer forums due to its usefulness for medical question answering system. To cope with the problem, we divide our main task into two subtasks since they exhibit different challenges: (1) disease-symptom extraction across sentences, (2) disease-symptom extraction within a sentence. For both subtasks, we employed machine learning approach leveraging several hand-crafted features, such as syntactic features (i.e., information from part-of-speech tags) and pre-trained word vectors. Furthermore, we basically formulate our problem as a binary classification task, in which we classify the \"indicating\" relation between a pair of Symptom and Disease entity. To evaluate the performance, we also collected and annotated corpus containing 463 pairs of question-answer threads from several Indonesian health consultation websites. Our experiment shows that, as our expected, the first subtask is relatively more difficult than the second subtask. For the first subtask, the extraction of disease-symptom relation only achieved 36% in terms of F1 measure, while the second one was 76%. 
To the best of our knowledge, this is the first work addressing such relation extraction task for both \"across\" and \"within\" sentence, especially in Indonesia.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115269306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}