{"title":"Improving Chinese to English SMT with Multiple CWS Results","authors":"Ryan Ma, T. Zhao","doi":"10.1109/IALP.2009.36","DOIUrl":null,"url":null,"abstract":"In Chinese to English statistical machine translation (SMT), Chinese texts always need a pre-processing high segments sentences into words and this standard approach is Chinese word segmentation (CWS). However, CWS is not developed for SMT, its results are not necessarily optimal for SMT. In recent years, many investigations have been performed concerning making CWS suitable for SMT, but we explore it from another direction. In this paper, our basic idea is to use multiple CWS results as additional language knowledge sources and we present a simple and effective approach to use multiple CWS results for SMT. We also give experiment results over range of strategy settings, and obtain substantial improvements in performance for translation from Chinese to English. The best result shows we gain 1.89 BLEU percentage points over a state of the art HPBT baseline system without using multiple CWS results.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"56 68 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 International Conference on Asian Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IALP.2009.36","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
In Chinese to English statistical machine translation (SMT), Chinese texts always need a pre-processing high segments sentences into words and this standard approach is Chinese word segmentation (CWS). However, CWS is not developed for SMT, its results are not necessarily optimal for SMT. In recent years, many investigations have been performed concerning making CWS suitable for SMT, but we explore it from another direction. In this paper, our basic idea is to use multiple CWS results as additional language knowledge sources and we present a simple and effective approach to use multiple CWS results for SMT. We also give experiment results over range of strategy settings, and obtain substantial improvements in performance for translation from Chinese to English. The best result shows we gain 1.89 BLEU percentage points over a state of the art HPBT baseline system without using multiple CWS results.