Unlabeled Dependency Parsing Based Pre-reordering for Chinese-to-Japanese SMT

JCR quartile: Q4 (Computer Science)

Journal of Information Processing, published 2014-06-16. DOI: 10.11185/IMT.9.272

D. Han, Pascual Martínez-Gómez, Yusuke Miyao, Katsuhito Sudoh, M. Nagata

Vol. 9, No. 1, pp. 272-301

Citations: 2

Abstract

In statistical machine translation (SMT), Chinese and Japanese are a well-known long-distance language pair that poses difficulties for word alignment techniques. Pre-reordering methods have proven efficient and effective; however, they need reliable parsers to extract the syntactic structure of the source sentences. On the one hand, we propose a framework in which only part-of-speech (POS) tags and unlabeled dependency parse trees are used, to minimize the influence of parse errors, and linguistic knowledge about structural differences is encoded in the form of reordering rules. We show significant improvements in the translation quality of news-domain sentences over state-of-the-art reordering methods. On the other hand, we explore the relationship between dependency parsing and our pre-reordering method from two aspects: POS tags and dependencies. We observe the effects of different parse errors on reordering performance by combining empirical and descriptive approaches. In the empirical approach, we quantify the distribution of general parse errors along with reordering quality. In the descriptive approach, we extract seven influential error patterns and examine their correlations with reordering errors.
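
The abstract does not spell out the reordering rules, so the following is only a minimal sketch of the general idea under stated assumptions: Chinese heads are moved after their dependents so that the source follows Japanese head-final order, consulting nothing but POS tags and unlabeled head indices. The tag set `HEAD_FINAL_TAGS`, the functions `reorder_indices` and `kendall_tau`, the toy sentence, and its alignment are all hypothetical illustrations. This is a simplified head-finalization scheme, not the authors' rule set. Kendall's τ is included as one common way to score reordering quality against a target word order; the abstract does not say which measure the paper uses.

```python
from typing import Dict, List

# Hypothetical set of Penn Chinese Treebank POS tags treated as head-final;
# an illustrative assumption, not the rule set used in the paper.
HEAD_FINAL_TAGS = {"VV", "VC", "VE", "P"}


def reorder_indices(tags: List[str], heads: List[int]) -> List[int]:
    """Return a permutation of token indices in head-final order.

    heads[i] is the 0-based index of token i's head, or -1 for the root.
    Only POS tags and unlabeled heads are consulted; no relation labels.
    """
    children: List[List[int]] = [[] for _ in tags]
    root = -1
    for i, h in enumerate(heads):
        if h == -1:
            root = i
        else:
            children[h].append(i)

    def linearize(node: int) -> List[int]:
        deps = sorted(children[node])
        if tags[node] in HEAD_FINAL_TAGS:
            # Head-final rule: every dependent subtree precedes the head.
            before, after = deps, []
        else:
            # Otherwise the head keeps its original position among its dependents.
            before = [c for c in deps if c < node]
            after = [c for c in deps if c > node]
        order: List[int] = []
        for c in before:
            order.extend(linearize(c))
        order.append(node)
        for c in after:
            order.extend(linearize(c))
        return order

    return linearize(root)


def kendall_tau(order: List[int], alignment: Dict[int, int]) -> float:
    """Kendall's tau between the reordered source and the target word order.

    alignment maps each source index to one target position (a simplifying
    assumption; real word alignments are many-to-many).
    """
    targets = [alignment[i] for i in order]
    n = len(targets)
    pairs = n * (n - 1) // 2
    if pairs == 0:
        return 1.0
    concordant = sum(
        1 for a in range(n) for b in range(a + 1, n) if targets[a] < targets[b]
    )
    discordant = pairs - concordant  # no ties under a one-to-one alignment
    return (concordant - discordant) / pairs


if __name__ == "__main__":
    words = ["我", "在", "东京", "买", "书"]  # "I buy books in Tokyo"
    tags = ["PN", "P", "NR", "VV", "NN"]
    gold = [3, 3, 1, -1, 3]   # 东京 attaches to the preposition 在
    wrong = [3, 3, 3, -1, 3]  # attachment error: 东京 attaches to the verb 买
    # Hypothetical alignment to the Japanese order 私 東京 で 本 買う (particles omitted)
    align = {0: 0, 2: 1, 1: 2, 4: 3, 3: 4}
    for heads in (gold, wrong):
        order = reorder_indices(tags, heads)
        print(" ".join(words[i] for i in order), kendall_tau(order, align))
    # prints "我 东京 在 书 买 1.0" then "我 在 东京 书 买 0.8"
```

In this toy case a single wrong attachment leaves 在 in front of 东京, so the prepositional phrase is never turned into a postpositional one and τ drops from 1.0 to 0.8. This is the kind of parse-error effect on reordering that the abstract says the paper analyzes, shown here only on invented data.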

Source journal

Journal of Information Processing (subject area: Computer Science, Computer Science (all))

CiteScore

1.20

Self-citation rate

0.00%

Articles published