Curate a transliteration corpus from transliteration/translation pairs

2008 IEEE International Conference on Information Reuse and Integration
Publication date: 2008-07-13, DOI: 10.1109/IRI.2008.4583031

Shih-Hung Wu, Yu-Te Li


Citations: 2

Abstract

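Transliterating new named entities is important for information retrieval across two or more languages. Rule-based machine transliteration is unsatisfactory because different information sources follow different transliteration standards. To build a statistical machine transliteration module, researchers must curate a transliteration corpus for each language pair of interest. Because large numbers of transliteration/translation pairs can be collected from the Web, a large transliteration training corpus can be curated from these pairs. In this paper, we propose a bi-directional approach to classifying transliteration/translation pairs. Our approach combines forward and backward transliteration to distinguish transliterations from translations. We conduct an experiment on English-Chinese transliteration.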
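The abstract names the bi-directional idea (judge a candidate pair both by transliterating English into Chinese and by transliterating Chinese back into English, then combine the evidence) but does not give the models or the decision rule. The Python sketch below illustrates that idea under loudly stated assumptions: the two lookup tables are hypothetical stand-ins for trained transliteration models, similarity is normalized Levenshtein distance, and the equal weighting and 0.5 threshold are arbitrary. None of the names (`FORWARD_TABLE`, `bidirectional_score`, and so on) come from the paper.

```python
"""Toy sketch of bi-directional transliteration/translation pair classification.

ASSUMPTIONS (hypothetical, not from the paper): the lookup tables stand in for
trained models; the similarity measure, weight and threshold are illustrative.
"""
from typing import Dict

# Hypothetical toy forward model: English letter chunks -> Chinese characters.
FORWARD_TABLE: Dict[str, str] = {"cl": "克", "in": "林", "ton": "顿"}
# Hypothetical toy backward model: Chinese characters -> pinyin syllables.
BACKWARD_TABLE: Dict[str, str] = {"克": "ke", "林": "lin", "顿": "dun", "白": "bai", "宫": "gong"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means the strings are identical."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def letters(text: str) -> str:
    """Lowercase and keep only alphabetic characters."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def forward_transliterate(english: str) -> str:
    """Greedy longest-match English -> Chinese; unmatched letters are dropped."""
    text = letters(english)
    longest = max(len(key) for key in FORWARD_TABLE)
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in FORWARD_TABLE:
                out.append(FORWARD_TABLE[chunk])
                i += size
                break
        else:
            i += 1  # no rule covers this letter: skip it
    return "".join(out)


def backward_transliterate(chinese: str) -> str:
    """Chinese -> romanized string; characters missing from the table are skipped."""
    return "".join(BACKWARD_TABLE.get(ch, "") for ch in chinese)


def bidirectional_score(english: str, chinese: str, weight: float = 0.5) -> float:
    """Weighted average of forward (E->C) and backward (C->E) similarity."""
    forward = similarity(forward_transliterate(english), chinese)
    backward = similarity(backward_transliterate(chinese), letters(english))
    return weight * forward + (1 - weight) * backward


def is_transliteration(english: str, chinese: str,
                       threshold: float = 0.5, weight: float = 0.5) -> bool:
    """Label the pair a transliteration if the combined score reaches the threshold."""
    return bidirectional_score(english, chinese, weight) >= threshold


if __name__ == "__main__":
    print(is_transliteration("Clinton", "克林顿"))
    print(is_transliteration("White House", "白宫"))
```

With these toy tables, the pair ("Clinton", "克林顿") gets forward similarity 1.0 and backward similarity 0.5 ("kelindun" vs. "clinton" differ by 4 edits over 8 characters), so the combined score of 0.75 passes the threshold. The translation pair ("White House", "白宫") fails: the forward model produces nothing, and "baigong" is far from "whitehouse". Averaging the two directions means a pair must look plausible both when generating Chinese from English and when recovering English from Chinese; a real system would replace the tables with trained models.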

