Unlabeled Dependency Parsing Based Pre-reordering for Chinese-to-Japanese SMT

Q4 Computer Science
D. Han, Pascual Martínez-Gómez, Yusuke Miyao, Katsuhito Sudoh, M. Nagata
Journal: Journal of Information Processing, volume 9 1, pages 272-301
Publication date: 2014-06-16
Publication type: Journal Article
DOI: 10.11185/IMT.9.272 (https://doi.org/10.11185/IMT.9.272)
Citation count: 2

Abstract

In statistical machine translation, Chinese and Japanese form a well-known language pair with long-distance word-order differences, which pose difficulties for word alignment techniques. Pre-reordering methods have proven efficient and effective; however, they need reliable parsers to extract the syntactic structure of the source sentences. First, we propose a framework in which only part-of-speech (POS) tags and unlabeled dependency parse trees are used, to minimize the influence of parse errors, and in which linguistic knowledge of structural differences is encoded in the form of reordering rules. We show significant improvements in translation quality for sentences in the news domain over state-of-the-art reordering methods. Second, we explore the relationship between dependency parsing and our pre-reordering method from two aspects: POS tags and dependencies. We observe the effects of different parse errors on reordering performance by combining empirical and descriptive approaches. In the empirical approach, we quantify the distribution of general parse errors along with reordering quality. In the descriptive approach, we extract seven influential error patterns and examine their correlations with reordering errors.
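To make the idea of rule-based pre-reordering over POS tags and unlabeled dependencies concrete, the following is a minimal sketch, not the authors' rule set. It assumes a single hypothetical rule: a head whose POS tag starts with "V" is moved after all of its right-hand dependents, which turns Chinese SVO order into a Japanese-like SOV order. The `Token` type, the `reorder` function, the -1 root convention, and the Penn Chinese Treebank style tags are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    form: str  # surface word
    pos: str   # POS tag (Penn Chinese Treebank style is assumed here)
    head: int  # index of the head token; -1 marks the root (assumed convention)


def reorder(tokens: List[Token]) -> List[str]:
    """Linearize an unlabeled dependency tree, placing verbal heads last.

    Illustrative rule only: a head tagged V* is emitted after its right
    dependents; every other head keeps its original position.
    """
    children: Dict[int, List[int]] = {i: [] for i in range(len(tokens))}
    root = None
    for i, tok in enumerate(tokens):
        if tok.head == -1:
            root = i
        else:
            children[tok.head].append(i)
    if root is None:
        raise ValueError("no root token (head == -1) found")

    def linearize(i: int) -> List[int]:
        left = [c for c in children[i] if c < i]
        right = [c for c in children[i] if c > i]
        out: List[int] = []
        for c in left:
            out.extend(linearize(c))
        if tokens[i].pos.startswith("V"):
            # Hypothetical SVO -> SOV rule: right dependents first, verb last.
            for c in right:
                out.extend(linearize(c))
            out.append(i)
        else:
            # Non-verbal heads keep the source order.
            out.append(i)
            for c in right:
                out.extend(linearize(c))
        return out

    return [tokens[i].form for i in linearize(root)]


# "我 吃 苹果" (I eat apple): the verb 吃 heads both the subject and the object.
sentence = [Token("我", "PN", 1), Token("吃", "VV", -1), Token("苹果", "NN", 1)]
print(reorder(sentence))
```

Because the rule consults only POS tags and head indices, a wrong dependency label cannot affect it, but a wrong POS tag or a wrong head attachment can, which is the kind of parse-error sensitivity the abstract sets out to analyze.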