{"title":"Corpus for the legal information processing system (CLIPS): A Chinese legal corpus annotated with discourse information","authors":"Hong Wang, Yunfeng Ge","doi":"10.1109/IALP.2017.8300536","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300536","url":null,"abstract":"The paper presents the Corpus for the Legal Information Processing System, a corpus annotated with discourse relations based on Discourse Information Theory (DIT) that takes account of both macro- and micro-structures at discourse level. The information units and information elements in Chinese legal discourses are first characterized, and then a 16-valued classification of information units for the macro-structure of discourse relations and a 25-valued classification of information elements for the micro-structure of discourse relations are introduced. The paper also describes how the annotation procedure is designed and how the annotation is conducted based on the above characterization.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123728341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information entropy-informed sentence representation for question classification","authors":"Jingyang Gao, Miao Li, Lei Chen, Jinhua Du, R. Ma","doi":"10.1109/IALP.2017.8300550","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300550","url":null,"abstract":"Traditional question classification methods generally employ a large number of features extracted from labelled questions to achieve good classification results. However, the high dimensionality of the feature space may lead to a higher training cost. In this paper, we propose a novel and effective question semantic representation-based method for question classification, avoiding the complicated feature extraction process. We first exploit a neural network-based language model to learn distributed representations of words, which can capture the semantic relations between words. We then introduce information entropy to measure the importance of each word to question classification, i.e. we use the information entropy to adjust the weight of the word. Subsequently, the question vectors are fed into an SVM classifier as semantic features to obtain the classification result. Experimental results demonstrate the effectiveness of our method, with improvements of 3.62% on an open-domain dataset and 6.50% on an agricultural dataset over the baseline.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115781794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring semantic content to user profiling for user cluster-based collaborative point-of-interest recommender system","authors":"Yuhuan Xiu, Man Lan, Yuanbin Wu, Jun Lang","doi":"10.1109/IALP.2017.8300595","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300595","url":null,"abstract":"Personalized recommender systems have become increasingly popular in recent years, as they have the ability to make appropriate choices for each active user. Collaborative filtering (CF) is the most successful and widely used technique in recommender systems, which aims at discovering similar users or items based on the historical user rating records, i.e., the user-item matrix. However, CF may not generate good recommendations when the user-item matrix is very sparse. To address this problem, we explore the property category and semantic content to reduce the number of items, which leads to more accurate performance when estimating user similarity. In addition, since the number of users is very large, we first profile similar users with the aid of a clustering algorithm before recommendation. Then, for each active user, the CF recommender system returns top recommendations from the narrowed-down cluster to which the active user belongs by calculating user similarity with the help of item semantic information. The experiments have been performed on the benchmark dataset in NLPCC 2017 to recommend points of interest (POI) for each active user. The comparative results demonstrate that our proposed model outperforms the two baselines (i.e., a user-based CF system and an item-based CF system).","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122170629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer learning for children's speech recognition","authors":"R. Tong, Lei Wang, B. Ma","doi":"10.1109/IALP.2017.8300540","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300540","url":null,"abstract":"Children's speech processing is more challenging than that of adults due to the lack of large-scale children's speech corpora. As the physical speech organs develop, high inter-speaker and intra-speaker variabilities are observed in children's speech. On the other hand, data collection on children is difficult as children usually have a short attention span and their language proficiency is limited. In this paper, we propose to improve children's automatic speech recognition performance with transfer learning techniques. We compare two transfer learning approaches in enhancing children's speech recognition performance with adults' data. The first method is to perform acoustic model adaptation on the pre-trained adult model. The second is to train an acoustic model with a deep neural network based multi-task learning approach: the adults' and children's acoustic characteristics are learnt jointly in the shared hidden layers, while the output layers are optimized with different speaker groups. Our experimental results show that both transfer learning approaches are effective in transferring rich phonetic and acoustic information from the adults' model to the children's model. The multi-task learning approach outperforms the acoustic adaptation approach. We further show that the speakers' acoustic characteristics in languages can also benefit the target language under the multi-task learning framework.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115934204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"“nee intention enti?” towards dialog act recognition in code-mixed conversations","authors":"Divya Sai Jitta, Khyathi Raghavi Chandu, Harsha Pamidipalli, R. Mamidi","doi":"10.1109/IALP.2017.8300589","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300589","url":null,"abstract":"Code-Mixing (CM) is a very commonly observed mode of communication in a multilingual configuration. The trend of using this newly emerging language has its effect as a culling option, especially on platforms like social media. This becomes particularly important in the context of technology and health, where expressing upcoming advancements is difficult in one's native language. Despite the change in such language dynamics, current dialog systems cannot handle a switch between languages across sentences and mixing within a sentence. Everyday conversations are fabricated in this mixed language, and analyzing dialog acts in this language is essential to further advances in making interaction with personal assistants more natural. The problem is further compounded by the crossing of script barriers in code-mixing. In this paper we take the first step towards understanding code-mixing in dialog processing, by recognizing the dialog act (intention) of the code-mixed utterance. Considering the dearth of resources in code-mixed languages, we design our current system using only word-level resources such as language identification, transliteration and lexical translation. Our best performing system is HMM based with an F-score of 76.67.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129878463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experimental research of Mandarin diphthongs produced by Uyghur learners","authors":"Yultuz Rapkat, Glnur Arkin, A. Hamdulla","doi":"10.1109/IALP.2017.8300546","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300546","url":null,"abstract":"From the perspective of experimental phonetics, this paper makes an acoustic comparison analysis of the diphthongs produced by Uyghur and Chinese college students, and examines Uyghur students' acquisition of Chinese Mandarin diphthongs. A total of 132 samples (including 9 diphthongs) are extracted from the recorded corpus, and the formants of the vowels are statistically analyzed. The characteristics and the distributions of the formants are analyzed to investigate the acoustic characteristics. Finally, combined with the experimental results, the Uyghur students' acquisition of diphthongs is further discussed and analysed. The purpose of this paper is to understand the Uyghur college students' acquisition of Chinese Mandarin diphthongs tracks and to provide the correct reference data for the Computer Assisted Language Learning System of Uyghur Learning Chinese Mandarin.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130869123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Tibetan-Chinese bilingual entities from Wikipedia","authors":"Tao Jiang, Hongzhi Yu, Xiangzhen He, Xianghe Meng","doi":"10.1109/IALP.2017.8300534","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300534","url":null,"abstract":"Entity translation pairs play an important role in NLP applications, such as cross-language information retrieval and machine translation. Named entities and domain entities are key factors that affect the performance of the system. However, entity translations can hardly be found in present bilingual dictionaries or parallel corpora. There are many Tibetan neologisms and named entities in Tibetan Wikipedia, and this paper proposes a new method for automatically mining Tibetan and Chinese bilingual entity translations from Wikipedia based on the language interlink and page feature. We construct an extraction pattern of Tibetan and Chinese entity translation pairs gained from the previous work, and adopt multi-feature candidate translation pairs to distinguish the selection model. The results verify that the entity translation mining method can achieve high accuracy.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"176 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131853881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain independent keyword identification for question answering","authors":"Prathyusha Jwalapuram, R. Mamidi","doi":"10.1109/IALP.2017.8300554","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300554","url":null,"abstract":"In this paper, we look at domain independent keyword identification for natural language queries using statistical methods. We took queries supplemented by only their dependency tags (Stanford Parser) and part-of-speech tags (Stanford POS tagger) and labeled the keywords. We then delexicalised the training data, and used the Conditional Random Fields algorithm to learn these labels. We used the queries created by [1] in the course management domain for training, and tested our model on the queries of three domains: course management, library and the GeoQueries250 dataset and report fairly high accuracies of 90.65%, 83.19% and 97.13% respectively, making our model a truly domain independent and highly accurate keyword identifier.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"340 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133038614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using topic analysis techniques to support comprehensive research paper searches","authors":"S. Fukuda, Yoichi Tomiura","doi":"10.1109/IALP.2017.8300606","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300606","url":null,"abstract":"In an academic paper search to confirm the originality of a user's research, it is important that the search returns comprehensive results relevant to the user's information need. To achieve comprehensive search results, users often relax an initially restrictive search formula by adding synonyms and expressions similar to the search words with the OR operator, and/or replacing AND with OR operations. However, it is difficult to anticipate all the terms that authors of relevant papers might have used. In addition, the replacement of AND with OR in search phrases can return a large number of unrelated papers. To overcome these issues, we propose a research paper search method based on topic analysis, which uses Boolean search based on the topics assigned to the search words in the search formula and the abstracts that contain any search word. Our method considers synonyms and expressions similar to the search words, which a user might not anticipate, while limiting the number of papers unrelated to the information need in the search result. To investigate the effectiveness of our method, we conducted experiments using the NTCIR-1 and 2 datasets, and confirmed that our method reduces the number of unrelated papers while maintaining high coverage.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133352852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A rule and statistical modeling based stem extraction method for Kazakh words","authors":"Rehmutulla Memet, Mewlude Nijat, Gulnigar Mahmut, A. Hamdulla","doi":"10.1109/IALP.2017.8300586","DOIUrl":"https://doi.org/10.1109/IALP.2017.8300586","url":null,"abstract":"Kazakh is an agglutinative language with complicated morphological changes. Kazakh stem and affix extraction is of great significance for Kazakh information processing. In this paper, according to the morphological structure of Kazakh words, we apply a stem extraction method that combines lexical rules with a statistical model. The stem extraction is carried out by using a prefix dictionary, a suffix dictionary, a stem dictionary, a statistical model dictionary and the rule base. Experimental results show that, in the statistical model, extracting the stem by using part-of-speech features is effective: the word level accuracy and the stem level accuracy of this method reached 0.93% and 76.74% respectively.","PeriodicalId":183586,"journal":{"name":"2017 International Conference on Asian Language Processing (IALP)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122618985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}