NEWS@IJCNLP: Latest Papers

Phonological Context Approximation and Homophone Treatment for NEWS 2009 English-Chinese Transliteration Shared Task
NEWS@IJCNLP Pub Date: 2009-08-07 DOI: 10.3115/1699705.1699725
O. Kwong
Abstract: This paper describes our systems participating in the NEWS 2009 Machine Transliteration Shared Task. Two runs were submitted for the English-Chinese track. The system for the standard run is based on graphemic approximation of local phonological context. The one for the non-standard run is based on parallel modelling of sound and tone patterns for treating homophones in Chinese. Official results show that both systems stand in the mid range amongst all participating systems.
Citations: 3
Automata for Transliteration and Machine Translation
NEWS@IJCNLP Pub Date: 2009-08-07 DOI: 10.3115/1699705.1699710
Kevin Knight
Abstract: Automata theory, transliteration, and machine translation (MT) have an interesting and intertwined history.
Finite-state string automata theory became a powerful tool for speech and language after the introduction of the AT&T [...]; furthermore, these machines can be pipelined to attack complex problems like speech recognition. Likewise, n-gram models can be captured by finite-state acceptors, which can be reused across applications.
It is possible to mix, match, and compose transducers to flexibly solve all kinds of problems. One such problem is transliteration, which can be modeled as a pipeline of string transformations. MT has also been modeled with transducers, and descendants of the FSM toolkit are now used to implement phrase-based machine translation. Even speech recognizers and MT systems can themselves be composed to deliver speech-to-speech MT.
The main rub with finite-state string MT is word re-ordering. Tree transducers offer a natural mechanism to solve this problem, and they have recently been employed with some success.
In this talk, we will survey these ideas (and their origins), and we will finish with a discussion of how transliteration and MT can work together.
Citations: 1
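The idea in the abstract above of treating transliteration as a pipeline of composed string transformations can be illustrated with a small sketch. This is not the FSM toolkit or a real weighted finite-state transducer: each stage is modelled as a plain function from a string to a set of candidate strings, and the rewrite tables, `from_table`, and `compose` are hypothetical names and data invented for illustration.

```python
from typing import Callable, Dict, List, Set

# A toy "transducer": a function mapping one input string to a set of outputs.
# Real transducers also carry weights and state; this sketch omits both.
Transducer = Callable[[str], Set[str]]


def from_table(table: Dict[str, List[str]]) -> Transducer:
    """Build a symbol-by-symbol transducer from a rewrite table (hypothetical data)."""
    def apply(s: str) -> Set[str]:
        outputs = {""}
        for ch in s:
            # Symbols missing from the table pass through unchanged.
            choices = table.get(ch, [ch])
            outputs = {prefix + c for prefix in outputs for c in choices}
        return outputs
    return apply


def compose(*stages: Transducer) -> Transducer:
    """Chain stages so that every output of one stage is fed to the next."""
    def apply(s: str) -> Set[str]:
        current = {s}
        for stage in stages:
            current = {out for item in current for out in stage(item)}
        return current
    return apply


# Two invented stages: source letters to sounds, then sounds to target symbols.
letters_to_sounds = from_table({"c": ["k", "s"], "x": ["ks"]})
sounds_to_target = from_table({"k": ["K"], "s": ["S"]})
pipeline = compose(letters_to_sounds, sounds_to_target)

print(sorted(pipeline("cx")))
```

Because each stage returns a set, ambiguity in an early stage (here, the letter `c`) is carried through the rest of the pipeline rather than resolved too early, which is the property that makes composition convenient in this setting.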
A Hybrid Approach to English-Korean Name Transliteration
NEWS@IJCNLP Pub Date: 2009-08-07 DOI: 10.3115/1699705.1699733
Gumwon Hong, Min-Jeong Kim, Do-Gil Lee, Hae-Chang Rim
Abstract: This paper presents a hybrid approach to English-Korean name transliteration. The base system is built on MOSES with factored translation features enabled. We expand the base system by combining it with various transliteration methods, including Web-based n-best re-ranking, a dictionary-based method, and a rule-based method. Our standard run and best non-standard run achieve 45.1 and 78.5, respectively, in top-1 accuracy. Experimental results show that expanding the training data size contributes significantly to performance. We also find that the Web-based re-ranking method can be successfully applied to English-Korean transliteration.
Citations: 19
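The top-1 accuracy figures quoted in the abstract above can be made concrete with a small sketch. It assumes that a source name counts as correct when the system's first-ranked candidate matches any of the reference transliterations; the function name `top1_accuracy` and the sample data are hypothetical, and the official scoring script may differ in detail.

```python
from typing import List


def top1_accuracy(predictions: List[List[str]], references: List[List[str]]) -> float:
    """Fraction of names whose first-ranked candidate matches any reference.

    predictions[i] is the ranked n-best list for name i (may be empty);
    references[i] is the list of acceptable transliterations for name i.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    correct = sum(
        1
        for candidates, refs in zip(predictions, references)
        if candidates and candidates[0] in refs
    )
    return correct / len(predictions)


# Invented example: only the first name has a correct top candidate.
predictions = [["a", "b"], ["c"], []]
references = [["a"], ["d", "e"], ["f"]]
print(top1_accuracy(predictions, references))
```

Only the first candidate is scored, so a re-ranking step that moves a correct candidate from lower in the n-best list to the top raises this metric directly.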
Report of NEWS 2009 Machine Transliteration Shared Task
NEWS@IJCNLP Pub Date: 2009-08-07 DOI: 10.3115/1699705.1699707
Haizhou Li, A. Kumaran, V. Pervouchine, Min Zhang
Abstract: This report documents the details of the Machine Transliteration Shared Task conducted as a part of the Named Entities Workshop (NEWS), an ACL-IJCNLP 2009 workshop. The shared task features machine transliteration of proper names from English to a set of languages. This shared task has witnessed enthusiastic participation of 31 teams from all over the world, with diversity of participation for a given system and wide coverage for a given language pair (more than a dozen participants per language pair). Diverse transliteration methodologies are represented adequately in the shared task for a given language pair, thus underscoring the fact that the workshop may truly indicate the state of the art in machine transliteration in these language pairs. We measure and report 6 performance metrics on the submitted results. We believe that the shared task has successfully achieved the following objectives: (i) bringing together the community of researchers in the area of Machine Transliteration to focus on various research avenues, (ii) calibrating systems on common corpora, using common metrics, thus creating a reasonable baseline for the state of the art of transliteration systems, and (iii) providing a quantitative basis for meaningful comparison and analysis between various algorithmic approaches used in machine transliteration. We believe that the results of this shared task would uncover a host of interesting research problems, giving impetus to research in this significant research area.
Citations: 93
English-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009
NEWS@IJCNLP Pub Date: 2009-08-07 DOI: 10.3115/1699705.1699732
Rejwanul Haque, Sandipan Dandapat, Ankit K. Srivastava, S. Naskar, Andy Way
Abstract: This paper presents English-Hindi transliteration in the NEWS 2009 Machine Transliteration Shared Task, adding source context modeling to state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framework that enables efficient estimation of these features while avoiding data sparseness problems. We carried out experiments at both the character and the transliteration unit (TU) level. Position-dependent source context features produce significant improvements in terms of all evaluation metrics.
Citations: 33
Tag Confidence Measure for Semi-Automatically Updating Named Entity Recognition
NEWS@IJCNLP Pub Date: 2009-08-07 DOI: 10.3115/1699705.1699745
Kuniko Saito, Kenji Imamura
Abstract: We present two techniques to reduce machine learning cost, i.e., the cost of manually annotating unlabeled data, for adapting existing CRF-based named entity recognition (NER) systems to new texts or domains. We introduce the tag posterior probability as the tag confidence measure of an individual NE tag determined by the base model. Dubious tags are automatically detected as recognition errors and regarded as targets of manual correction. Compared to the posterior probability of the entire sentence, tag posterior probability has the advantage of minimizing system cost by focusing on those parts of the sentence that require manual correction. Using the tag confidence measure, the first technique, known as active learning, asks the editor to assign correct NE tags only to those parts that the base model could not tag confidently. Active learning reduces the learning cost by 66% compared to the conventional method. As the second technique, we propose bootstrapping NER, which semi-automatically corrects dubious tags and updates its model.
Citations: 1
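The tag confidence idea in the abstract above can be sketched as follows. The sketch assumes that per-token tag posteriors have already been computed by some base model (for a CRF, typically via forward-backward) and simply flags tokens whose assigned tag falls below a cut-off. The function name `dubious_tags`, the 0.9 threshold, and the sample probabilities are assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def dubious_tags(
    tags: List[str],
    marginals: List[Dict[str, float]],
    threshold: float = 0.9,  # assumed cut-off, not a value from the paper
) -> List[Tuple[int, str, float]]:
    """Return (position, tag, confidence) for tags the base model is unsure about.

    tags[i] is the tag the base model assigned to token i;
    marginals[i] maps each candidate tag to its posterior probability at token i.
    """
    if len(tags) != len(marginals):
        raise ValueError("tags and marginals must have the same length")
    flagged = []
    for i, (tag, dist) in enumerate(zip(tags, marginals)):
        # The confidence of a tag is its own posterior probability at that position.
        confidence = dist.get(tag, 0.0)
        if confidence < threshold:
            flagged.append((i, tag, confidence))
    return flagged


# Invented posteriors for a three-token sentence.
tags = ["B-PER", "O", "B-LOC"]
marginals = [
    {"B-PER": 0.98, "O": 0.02},
    {"O": 0.55, "B-ORG": 0.45},
    {"B-LOC": 0.95, "O": 0.05},
]
print(dubious_tags(tags, marginals))
```

In an active-learning loop of the kind the abstract describes, only the flagged positions would be shown to a human editor, which is where the saving over re-annotating whole sentences comes from.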
NEWS 2009 Machine Transliteration Shared Task System Description: Transliteration with Letter-to-Phoneme Technology
NEWS@IJCNLP Pub Date: 2009-08-07 DOI: 10.3115/1699705.1699723
Colin Cherry, Hisami Suzuki
Abstract: We interpret the problem of transliterating English named entities into Hindi or Japanese Katakana as a variant of the letter-to-phoneme (L2P) subtask of text-to-speech processing. Therefore, we apply a re-implementation of a state-of-the-art, discriminative L2P system (Jiampojamarn et al., 2008) to the problem, without further modification. In doing so, we hope to provide a baseline for the NEWS 2009 Machine Transliteration Shared Task (Li et al., 2009), indicating how much can be achieved without transliteration-specific technology. This paper briefly summarizes the original work and our re-implementation. We also describe a bug in our submitted implementation, and provide updated results on the development and test sets.
Citations: 1
Testing and Performance Evaluation of Machine Transliteration System for Tamil Language
NEWS@IJCNLP Pub Date: 2009-08-07 DOI: 10.3115/1699705.1699718
Kommaluri Vijayanand
Abstract: Machine Translation (MT) was once science fiction and has been turned into reality by the enormous contributions of the MT research community. We cannot expect any text to be free of Named Entities (NEs). Such NEs are crucial in deciding the quality of MT. NEs must be recognized in the text and transliterated accordingly into the target language in order to ensure the quality of MT. In this paper we present various technical issues encountered while handling the shared task of NE transliteration for Tamil.
Citations: 5