NEWS@IJCNLP  Pub Date: 2009-08-07  DOI: 10.3115/1699705.1699708
Haizhou Li, A. Kumaran, Min Zhang, V. Pervouchine
{"title":"Whitepaper of NEWS 2009 Machine Transliteration Shared Task","authors":"Haizhou Li, A. Kumaran, Min Zhang, V. Pervouchine","doi":"10.3115/1699705.1699708","DOIUrl":"https://doi.org/10.3115/1699705.1699708","url":null,"abstract":"Transliteration is defined as phonetic translation of names across languages. Transliteration of Named Entities (NEs) is necessary in many applications, such as machine translation, corpus alignment, cross-language IR, information extraction and automatic lexicon acquisition. All such systems call for high-performance transliteration, which is the focus of the shared task in the NEWS 2009 workshop. The objective of the shared task is to promote machine transliteration research by providing a common benchmarking platform for the community to evaluate the state-of-the-art technologies.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"53 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120885615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP  Pub Date: 2009-08-07  DOI: 10.3115/1699705.1699712
Sittichai Jiampojamarn, Aditya Bhargava, Qing Dou, Kenneth Dwyer, Grzegorz Kondrak
{"title":"DirecTL: a Language Independent Approach to Transliteration","authors":"Sittichai Jiampojamarn, Aditya Bhargava, Qing Dou, Kenneth Dwyer, Grzegorz Kondrak","doi":"10.3115/1699705.1699712","DOIUrl":"https://doi.org/10.3115/1699705.1699712","url":null,"abstract":"We present DirecTL: an online discriminative sequence prediction model that employs a many-to-many alignment between target and source. Our system incorporates input segmentation, target character prediction, and sequence modeling in a unified dynamic programming framework. Experimental results suggest that DirecTL is able to independently discover many of the language-specific regularities in the training data.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128498069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP  Pub Date: 2009-08-07  DOI: 10.3115/1699705.1699724
D. Yang, Paul R. Dixon, Yi-Cheng Pan, T. Oonishi, Masanobu Nakamura, S. Furui
{"title":"Combining a Two-step Conditional Random Field Model and a Joint Source Channel Model for Machine Transliteration","authors":"D. Yang, Paul R. Dixon, Yi-Cheng Pan, T. Oonishi, Masanobu Nakamura, S. Furui","doi":"10.3115/1699705.1699724","DOIUrl":"https://doi.org/10.3115/1699705.1699724","url":null,"abstract":"This paper describes our system for the \"NEWS 2009 Machine Transliteration Shared Task\" (NEWS 2009). We participated only in the standard run, which is a direct orthographical mapping (DOM) between two languages without using any intermediate phonemic mapping. We propose a new two-step conditional random field (CRF) model for DOM machine transliteration, in which the first CRF segments a source word into chunks and the second CRF maps the chunks to a word in the target language. The two-step CRF model obtains a slightly lower top-1 accuracy than a state-of-the-art n-gram joint source-channel model. Combining the CRF model with the joint source-channel model leads to improvements in all the tasks. The official result of our system in the NEWS 2009 shared task confirms its effectiveness: we achieved 0.627 top-1 accuracy for Japanese transliterated to Japanese Kanji (JJ), 0.713 for English-to-Chinese (E2C) and 0.510 for English-to-Japanese Katakana (E2J).","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129802239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
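The top-1 accuracy figures quoted in the abstract above can be read as the fraction of test names whose first-ranked candidate matches an accepted reference. The following is a minimal sketch under that reading; the function name `top1_accuracy` and the data layout (ranked candidate lists, sets of references) are assumptions made for illustration, not the shared task's official scoring script.

```python
def top1_accuracy(candidates, references):
    """Fraction of names whose top-ranked candidate is an accepted reference.

    candidates: list of ranked candidate lists, one per source name.
    references: list of sets of acceptable transliterations, one per source name.
    Both structures are illustrative assumptions, not an official file format.
    """
    if len(candidates) != len(references):
        raise ValueError("candidates and references must have the same length")
    if not candidates:
        return 0.0
    hits = 0
    for ranked, refs in zip(candidates, references):
        # Only the first candidate counts; an empty candidate list is a miss.
        if ranked and ranked[0] in refs:
            hits += 1
    return hits / len(candidates)


if __name__ == "__main__":
    system_output = [["a", "b"], ["c"], []]
    gold = [{"a"}, {"d"}, {"e"}]
    print(top1_accuracy(system_output, gold))
```

Keeping references as sets allows a name to have more than one accepted transliteration, which is common for names in practice.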
NEWS@IJCNLP  Pub Date: 2009-08-07  DOI: 10.3115/1699705.1699729
S. Reddy, Sonjia Waxmonsky
{"title":"Substring-based Transliteration with Conditional Random Fields","authors":"S. Reddy, Sonjia Waxmonsky","doi":"10.3115/1699705.1699729","DOIUrl":"https://doi.org/10.3115/1699705.1699729","url":null,"abstract":"Motivated by phrase-based translation research, we present a transliteration system where characters are grouped into substrings to be mapped atomically into the target language. We show how this substring representation can be incorporated into a Conditional Random Field model that uses local context and phonemic information.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116367302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP  Pub Date: 2009-08-07  DOI: 10.3115/1699705.1699714
Jong-Hoon Oh, Kiyotaka Uchimoto, Kentaro Torisawa
{"title":"Machine Transliteration using Target-Language Grapheme and Phoneme: Multi-engine Transliteration Approach","authors":"Jong-Hoon Oh, Kiyotaka Uchimoto, Kentaro Torisawa","doi":"10.3115/1699705.1699714","DOIUrl":"https://doi.org/10.3115/1699705.1699714","url":null,"abstract":"This paper describes our approach to \"NEWS 2009 Machine Transliteration Shared Task.\" We built multiple transliteration engines based on different combinations of two transliteration models and three machine learning algorithms. Then, the outputs from these transliteration engines were combined using re-ranking functions. Our method was applied to all language pairs in \"NEWS 2009 Machine Transliteration Shared Task.\" The official results of our standard runs were ranked the best for four language pairs and the second best for three language pairs.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"35 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120896455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP  Pub Date: 2009-08-07  DOI: 10.3115/1699705.1699722
E. Aramaki, Takeshi Abekawa
{"title":"Fast Decoding and Easy Implementation: Transliteration as Sequential Labeling","authors":"E. Aramaki, Takeshi Abekawa","doi":"10.3115/1699705.1699722","DOIUrl":"https://doi.org/10.3115/1699705.1699722","url":null,"abstract":"Although most previous transliteration methods are based on generative models, this paper presents a discriminative transliteration model using conditional random fields. We regard character(s) as a kind of label, which enables us to treat transliteration as a sequential labeling process. This approach has two advantages: (1) fast decoding and (2) easy implementation. Experimental results yielded competitive performance, demonstrating the feasibility of the proposed approach.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131258936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP  Pub Date: 2009-08-07  DOI: 10.3115/1699705.1699730
Xue Jiang, Le Sun, Dakun Zhang
{"title":"A Syllable-based Name Transliteration System","authors":"Xue Jiang, Le Sun, Dakun Zhang","doi":"10.3115/1699705.1699730","DOIUrl":"https://doi.org/10.3115/1699705.1699730","url":null,"abstract":"This paper describes the named entity transliteration system that we built for the \"NEWS 2009 Machine Transliteration Shared Task\" (Li et al., 2009). We transliterate an English name into Chinese in three steps. First, we syllabify the English name into a sequence of syllables using rules. Second, we generate the most probable Pinyin sequence with a model that maps English syllables to Pinyin (EP model). Third, we convert the Pinyin sequence into the final Chinese character sequence with a model that maps Pinyin to characters (PC model). Our system achieves an ACC of 0.498 and a Mean F-score of 0.786 in the official evaluation.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129191635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
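The pipeline in the abstract above (syllabification, then an English-syllable-to-Pinyin model, then a Pinyin-to-character model) can be illustrated with a toy sketch. Everything concrete here is an assumption: the lookup tables `SYLLABLES`, `EP_MODEL` and `PC_MODEL`, their probabilities, and the per-syllable argmax are hand-written stand-ins, whereas the paper's actual rules and models are not given in the abstract and may score whole sequences rather than individual syllables.

```python
# Toy tables: every entry and probability below is a made-up illustration.
SYLLABLES = {"tony": ["to", "ny"]}  # stand-in for rule-based syllabification

EP_MODEL = {  # English syllable -> candidate Pinyin with hypothetical scores
    "to": [("tuo", 0.7), ("tu", 0.3)],
    "ny": [("ni", 0.9), ("nei", 0.1)],
}

PC_MODEL = {  # Pinyin -> candidate Chinese characters with hypothetical scores
    "tuo": [("托", 0.8), ("拖", 0.2)],
    "tu": [("图", 1.0)],
    "ni": [("尼", 0.9), ("妮", 0.1)],
    "nei": [("内", 1.0)],
}


def best(options):
    """Return the highest-scoring option from a list of (value, score) pairs."""
    return max(options, key=lambda pair: pair[1])[0]


def transliterate(name):
    """Run the three steps and return the intermediate and final results."""
    syllables = SYLLABLES[name.lower()]               # step 1: syllabify
    pinyin = [best(EP_MODEL[s]) for s in syllables]   # step 2: EP model
    chars = [best(PC_MODEL[p]) for p in pinyin]       # step 3: PC model
    return syllables, pinyin, "".join(chars)


if __name__ == "__main__":
    print(transliterate("Tony"))
```

Returning the intermediate syllable and Pinyin sequences alongside the final string makes it easier to see which of the three steps introduced an error.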
NEWS@IJCNLP  Pub Date: 2009-08-07  DOI: 10.3115/1699705.1699735
D. Zelenko
{"title":"Combining MDL Transliteration Training with Discriminative Modeling","authors":"D. Zelenko","doi":"10.3115/1699705.1699735","DOIUrl":"https://doi.org/10.3115/1699705.1699735","url":null,"abstract":"We present a transliteration system that introduces minimum description length training for transliteration and combines it with discriminative modeling. We apply the proposed approach to transliteration from English to 8 non-Latin scripts, with promising results.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122237554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP  Pub Date: 2009-08-07  DOI: 10.3115/1699705.1699727
Mitesh M. Khapra, P. Bhattacharyya
{"title":"Improving Transliteration Accuracy Using Word-Origin Detection and Lexicon Lookup","authors":"Mitesh M. Khapra, P. Bhattacharyya","doi":"10.3115/1699705.1699727","DOIUrl":"https://doi.org/10.3115/1699705.1699727","url":null,"abstract":"We propose a framework for transliteration which uses (i) a word-origin detection engine (pre-processing), (ii) a CRF-based transliteration engine, and (iii) a re-ranking model based on lexicon lookup (post-processing). The results obtained for English-Hindi and English-Kannada transliteration show that the pre-processing and post-processing modules improve the top-1 accuracy by 7.1%.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134210103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@IJCNLP  Pub Date: 2009-08-07  DOI: 10.3115/1699705.1699716
Manoj Kumar Chinnakotla, O. Damani
{"title":"Experiences with English-Hindi, English-Tamil and English-Kannada Transliteration Tasks at NEWS 2009","authors":"Manoj Kumar Chinnakotla, O. Damani","doi":"10.3115/1699705.1699716","DOIUrl":"https://doi.org/10.3115/1699705.1699716","url":null,"abstract":"We use a Phrase-Based Statistical Machine Translation (SMT) approach to transliteration in which words are replaced by characters and sentences by words. We employ standard SMT tools such as GIZA++ for learning alignments and Moses for learning the phrase tables and decoding. Besides tuning the standard SMT parameters, we focus on tuning the parameters of the Character Sequence Model (CSM), such as the order of the CSM, the weight assigned to the CSM during decoding, and the corpus used for CSM estimation. Our results show that paying sufficient attention to the CSM pays off in terms of increased transliteration accuracy.","PeriodicalId":262513,"journal":{"name":"NEWS@IJCNLP","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130976042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
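The substitution described in the abstract above (characters play the role of words, and words the role of sentences) amounts to a data-preparation step before word-based SMT tools are run. A minimal sketch of that step follows. The helper names `to_char_tokens` and `to_parallel_lines` are hypothetical, and splitting by Unicode code point is an assumption: for Indic scripts it separates vowel signs from their consonants, and the paper may segment differently.

```python
def to_char_tokens(word):
    """Turn one name into a space-separated character sequence.

    Each character becomes a "word" and the name becomes a "sentence".
    Splitting is by Unicode code point (an assumption of this sketch);
    whitespace inside the input is dropped.
    """
    return " ".join(ch for ch in word if not ch.isspace())


def to_parallel_lines(pairs):
    """Convert (source_name, target_name) pairs into two aligned line lists,
    one line per name, in the plain-text parallel layout that word-based
    alignment and decoding tools typically read."""
    source_lines = [to_char_tokens(src) for src, _ in pairs]
    target_lines = [to_char_tokens(tgt) for _, tgt in pairs]
    return source_lines, target_lines


if __name__ == "__main__":
    src, tgt = to_parallel_lines([("ram", "राम")])
    print(src)
    print(tgt)
```

The same tokenization would also apply to the target-side corpus used to estimate a character sequence model, since that model is an n-gram model over these character tokens.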