Curate a transliteration corpus from transliteration/translation pairs

Shih-Hung Wu, Yu-Te Li
Published in: 2008 IEEE International Conference on Information Reuse and Integration
Publication date: 2008-07-13
DOI: 10.1109/IRI.2008.4583031 (https://doi.org/10.1109/IRI.2008.4583031)
Citations: 2

Abstract

Transliteration of new named entities is important for information retrieval that crosses two or more languages. Rule-based machine transliteration is not satisfactory, since different information sources follow different transliteration standards. To build a statistical machine transliteration module, researchers have to curate a transliteration corpus for any given pair of languages of interest. Since a large number of transliteration/translation pairs can be collected from the Web, a large transliteration training corpus can be curated from these pairs. In this paper, we propose a bi-directional approach to classify transliteration/translation pairs. Our approach combines forward transliteration and backward transliteration to distinguish transliterations from translations. An experiment on English-Chinese transliteration is conducted.
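The abstract does not spell out how the forward and backward directions are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the Chinese side has already been romanized to pinyin, replaces the statistical transliteration models with tiny hand-written rewrite tables, and averages two string-similarity scores. The tables, the function names, and the 0.5 threshold are all hypothetical and chosen purely for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy rewrite tables standing in for learned transliteration models.
FORWARD = {"ch": "q", "th": "s", "c": "k", "v": "w", "r": "l"}  # English -> pinyin-like
BACKWARD = {"zh": "j", "q": "ch", "x": "sh", "w": "v"}          # pinyin -> English-like


def rewrite(text, table):
    """Rewrite text greedily from left to right, trying longer keys first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No rule applies at this position: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def similarity(a, b):
    """String similarity in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, a, b).ratio()


def bidirectional_score(english, pinyin):
    """Average of the forward and backward similarity scores (an assumed combination)."""
    en = english.lower()
    py = pinyin.lower().replace(" ", "")
    forward = similarity(rewrite(en, FORWARD), py)    # English -> Chinese direction
    backward = similarity(rewrite(py, BACKWARD), en)  # Chinese -> English direction
    return (forward + backward) / 2


def is_transliteration(english, pinyin, threshold=0.5):
    """Label a pair as a transliteration when the combined score reaches the threshold."""
    return bidirectional_score(english, pinyin) >= threshold
```

Under these toy tables, `is_transliteration("Obama", "ao ba ma")` evaluates to `True`, while `is_transliteration("Microsoft", "wei ruan")` evaluates to `False`, because the Chinese name of Microsoft is a semantic translation rather than a phonetic rendering. A real system would learn both directions from data and tune the threshold on labeled pairs.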