{"title":"Transfer Grammar in Tamil-Hindi MT System","authors":"S. L. Devi, Sindhuja Gopalan, R. Ram","doi":"10.1109/IALP.2013.24","DOIUrl":"https://doi.org/10.1109/IALP.2013.24","url":null,"abstract":"In this paper, we present the work on transfer grammar, one of the most challenging issues in MT, in a bidirectional Tamil-Hindi translation system, Sampark. Transfer grammar between the above languages can be categorized into two levels (1) the structure transfer and (2) lexical level transfer. Tamil and Hindi differ extensively at the clausal construction level and at the verb formation level since Tamil is an agglutinative language and Hindi is not. Transfer grammar described here uses a hybrid approach using CRF a machine learning algorithm and linguistic rules for structure transfer, a rule based approach for word level transfer. We tested the approach in the Sampark system using web data and the results are encouraging.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131928738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Comparative Research on the Segmentation Strategies of Tibetan Bounded-Variant Forms","authors":"Congjun Long, Caijun Kang, Di Jiang","doi":"10.1109/IALP.2013.75","DOIUrl":"https://doi.org/10.1109/IALP.2013.75","url":null,"abstract":"The segmentation of Tibetan bounded-variant forms (TBVFS) is one of the most foundational tasks in text processing and the segmenting results directly influence the word segmentation, POS tagging, syntactic parsing and the Named Entity Extraction and so on. At present, the segmenting results are unsatisfactory and cannot be applied in practice. In this article, authors firstly describe the features of TBVFS, their distributions and then test the segmenting results by using two different segmentation strategies and conclude that Statistics-based methods for morpheme position tagging is better than Rule-based methods. If some rules are used to adjust a part of mistaken segmentations in the post processing, this kind of segmentation problem can be resolved.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115292476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Gaussian Filter-Based Automatic Labeling of Speech Data for TTS System in Gujarati Language","authors":"Swati Talesara, H. Patil, T. Patel, Hardik B. Sailor, Nirmesh J. Shah","doi":"10.1109/IALP.2013.46","DOIUrl":"https://doi.org/10.1109/IALP.2013.46","url":null,"abstract":"Text-to-speech (TTS) synthesizer has been proved to be an aiding tool for many visually challenged people for reading through hearing feedback. There are TTS synthesizers available in English, however, it has been observed that people feel more comfortable in hearing their own native language. Keeping this point in mind, Gujarati TTS synthesizer has been built. This TTS system has been built in Festival speech synthesis framework. Syllable is taken as the basic unit in building Gujarati TTS synthesizer as Indian languages are syllabic in nature. In building the unit-selection based Gujarati TTS system, one requires large Gujarati labeled corpus. The task of labeling is most time-consuming and tedious. This task requires large manual efforts. Therefore, in this work, an attempt has been made to reduce these efforts by automatically generating labeled corpus at syllable-level. To that effect, a Gaussian-based segmentation method has been proposed for automatic segmentation of speech at syllable-level. It has been observed that percentage correctness of labeled data is around 80% for both male and female voice as compared to 70% for group delay-based labeling. In addition, the system built on the proposed approach shows better intelligibility when evaluated by a visually challenged subject. The word error rate is reduced by 5% for Gaussian filter-based TTS system, compared to group delay-based TTS system. Also, 5% increment is observed in correctly synthesized words. The main focus of this work is to reduce the manual efforts required in building TTS system (which are primarily the manual efforts required in labeling speech data) for Gujarati.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121566932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Chinese Parsing with Special-Case Probability Re-estimation","authors":"Yu-Ming Hsieh, Su-Chu Lin, Jason J. S. Chang, Keh-Jiann Chen","doi":"10.1109/IALP.2013.54","DOIUrl":"https://doi.org/10.1109/IALP.2013.54","url":null,"abstract":"Syntactic patterns which are hard to be expressed by binary dependent relations need special treatments, since structure evaluations of such constructions are different from general parsing framework. Moreover, these different syntactic patterns (special cases) should be handled with distinct estimated model other than the general one. In this paper, we present a special-case probability re-estimation model (SCM), integrating the general model with an adoptable estimated model in special cases. The SCM model can estimate evaluation scores in specific syntactic constructions more accurately, and is able for adopting different features in different cases. Experiment results show that our proposed model has better performance than the state-of-the-art parser in Chinese.","PeriodicalId":413833,"journal":{"name":"2013 International Conference on Asian Language Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114338832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}