NEWS@IJCNLP | Pub Date: 2009-08-07 | DOI: 10.3115/1699705.1699725
O. Kwong
{"title":"Phonological Context Approximation and Homophone Treatment for NEWS 2009 English-Chinese Transliteration Shared Task","authors":"O. Kwong","doi":"10.3115/1699705.1699725","DOIUrl":"https://doi.org/10.3115/1699705.1699725","url":null,"abstract":"This paper describes our systems participating in the NEWS 2009 Machine Transliteration Shared Task. Two runs were submitted for the English-Chinese track. The system for the standard run is based on graphemic approximation of local phonological context. The one for the non-standard run is based on parallel modelling of sound and tone patterns for treating homophones in Chinese. Official results show that both systems stand in the mid range amongst all participating systems.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115358745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP | Pub Date: 2009-08-07 | DOI: 10.3115/1699705.1699710
Kevin Knight
{"title":"Automata for Transliteration and Machine Translation","authors":"Kevin Knight","doi":"10.3115/1699705.1699710","DOIUrl":"https://doi.org/10.3115/1699705.1699710","url":null,"abstract":"Automata theory, transliteration, and machine translation (MT) have an interesting and intertwined history.\n\nFinite-state string automata theory became a powerful tool for speech and language after the introduction of the AT&T [...] furthermore, these machines can be pipelined to attack complex problems like speech recognition. Likewise, n-gram models can be captured by finite-state acceptors, which can be reused across applications.\n\nIt is possible to mix, match, and compose transducers to flexibly solve all kinds of problems. One such problem is transliteration, which can be modeled as a pipeline of string transformations. MT has also been modeled with transducers, and descendants of the FSM toolkit are now used to implement phrase-based machine translation. Even speech recognizers and MT systems can themselves be composed to deliver speech-to-speech MT.\n\nThe main rub with finite-state string MT is word re-ordering. Tree transducers offer a natural mechanism to solve this problem, and they have recently been employed with some success.\n\nIn this talk, we will survey these ideas (and their origins), and we will finish with a discussion of how transliteration and MT can work together.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116278892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP | Pub Date: 2009-08-07 | DOI: 10.3115/1699705.1699733
Gumwon Hong, Min-Jeong Kim, Do-Gil Lee, Hae-Chang Rim
{"title":"A Hybrid Approach to English-Korean Name Transliteration","authors":"Gumwon Hong, Min-Jeong Kim, Do-Gil Lee, Hae-Chang Rim","doi":"10.3115/1699705.1699733","DOIUrl":"https://doi.org/10.3115/1699705.1699733","url":null,"abstract":"This paper presents a hybrid approach to English-Korean name transliteration. The base system is built on MOSES with factored translation features enabled. We expand the base system by combining it with various transliteration methods, including Web-based n-best re-ranking, a dictionary-based method, and a rule-based method. Our standard run and best non-standard run achieve 45.1 and 78.5, respectively, in top-1 accuracy. Experimental results show that expanding the training data size contributes significantly to performance. We also find that the Web-based re-ranking method can be successfully applied to English-Korean transliteration.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"06 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123876217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP | Pub Date: 2009-08-07 | DOI: 10.3115/1699705.1699707
Haizhou Li, A. Kumaran, V. Pervouchine, Min Zhang
{"title":"Report of NEWS 2009 Machine Transliteration Shared Task","authors":"Haizhou Li, A. Kumaran, V. Pervouchine, Min Zhang","doi":"10.3115/1699705.1699707","DOIUrl":"https://doi.org/10.3115/1699705.1699707","url":null,"abstract":"This report documents the details of the Machine Transliteration Shared Task conducted as a part of the Named Entities Workshop (NEWS), an ACL-IJCNLP 2009 workshop. The shared task features machine transliteration of proper names from English to a set of languages. This shared task has witnessed enthusiastic participation of 31 teams from all over the world, with diversity of participation for a given system and wide coverage for a given language pair (more than a dozen participants per language pair). Diverse transliteration methodologies are represented adequately in the shared task for a given language pair, thus underscoring the fact that the workshop may truly indicate the state of the art in machine transliteration in these language pairs. We measure and report 6 performance metrics on the submitted results. We believe that the shared task has successfully achieved the following objectives: (i) bringing together the community of researchers in the area of Machine Transliteration to focus on various research avenues, (ii) calibrating systems on common corpora, using common metrics, thus creating a reasonable baseline for the state-of-the-art of transliteration systems, and (iii) providing a quantitative basis for meaningful comparison and analysis between various algorithmic approaches used in machine transliteration. We believe that the results of this shared task would uncover a host of interesting research problems, giving impetus to research in this significant research area.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131964729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP | Pub Date: 2009-08-07 | DOI: 10.3115/1699705.1699732
Rejwanul Haque, Sandipan Dandapat, Ankit K. Srivastava, S. Naskar, Andy Way
{"title":"English-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009","authors":"Rejwanul Haque, Sandipan Dandapat, Ankit K. Srivastava, S. Naskar, Andy Way","doi":"10.3115/1699705.1699732","DOIUrl":"https://doi.org/10.3115/1699705.1699732","url":null,"abstract":"This paper presents English-Hindi transliteration in the NEWS 2009 Machine Transliteration Shared Task adding source context modeling into state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framework that enables efficient estimation of these features while avoiding data sparseness problems. We carried out experiments both at character and transliteration unit (TU) level. Position-dependent source context features produce significant improvements in terms of all evaluation metrics.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129256515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP | Pub Date: 2009-08-07 | DOI: 10.3115/1699705.1699745
Kuniko Saito, Kenji Imamura
{"title":"Tag Confidence Measure for Semi-Automatically Updating Named Entity Recognition","authors":"Kuniko Saito, Kenji Imamura","doi":"10.3115/1699705.1699745","DOIUrl":"https://doi.org/10.3115/1699705.1699745","url":null,"abstract":"We present two techniques to reduce machine learning cost, i.e., cost of manually annotating unlabeled data, for adapting existing CRF-based named entity recognition (NER) systems to new texts or domains. We introduce the tag posterior probability as the tag confidence measure of an individual NE tag determined by the base model. Dubious tags are automatically detected as recognition errors, and regarded as targets of manual correction. Compared to entire sentence posterior probability, tag posterior probability has the advantage of minimizing system cost by focusing on those parts of the sentence that require manual correction. Using the tag confidence measure, the first technique, known as active learning, asks the editor to assign correct NE tags only to those parts that the base model could not assign tags confidently. Active learning reduces the learning cost by 66%, compared to the conventional method. As the second technique, we propose bootstrapping NER, which semi-automatically corrects dubious tags and updates its model.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"362 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121647863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP | Pub Date: 2009-08-07 | DOI: 10.3115/1699705.1699723
Colin Cherry, Hisami Suzuki
{"title":"NEWS 2009 Machine Transliteration Shared Task System Description: Transliteration with Letter-to-Phoneme Technology","authors":"Colin Cherry, Hisami Suzuki","doi":"10.3115/1699705.1699723","DOIUrl":"https://doi.org/10.3115/1699705.1699723","url":null,"abstract":"We interpret the problem of transliterating English named entities into Hindi or Japanese Katakana as a variant of the letter-to-phoneme (L2P) subtask of text-to-speech processing. Therefore, we apply a re-implementation of a state-of-the-art, discriminative L2P system (Jiampojamarn et al., 2008) to the problem, without further modification. In doing so, we hope to provide a baseline for the NEWS 2009 Machine Transliteration Shared Task (Li et al., 2009), indicating how much can be achieved without transliteration-specific technology. This paper briefly summarizes the original work and our reimplementation. We also describe a bug in our submitted implementation, and provide updated results on the development and test sets.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115157662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP | Pub Date: 2009-08-07 | DOI: 10.3115/1699705.1699718
Kommaluri Vijayanand
{"title":"Testing and Performance Evaluation of Machine Transliteration System for Tamil Language","authors":"Kommaluri Vijayanand","doi":"10.3115/1699705.1699718","DOIUrl":"https://doi.org/10.3115/1699705.1699718","url":null,"abstract":"Machine Translation (MT) is a science fiction that was converted into reality with the enormous contributions from the MT research community. We cannot expect any text without Named Entities (NE). Such NEs are crucial in deciding the quality of MT. NEs are to be recognized from the text and transliterated accordingly into the target language in order to ensure the quality of MT. In the present paper we present various technical issues encountered during handling the shared task of NE transliteration for Tamil.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116767683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}