Using binary classification to evaluate the quality of machine translators
Ran Li, Yihao Yang, Kelin Shen, Mohammed Hijji
Kuwait Journal of Science & Engineering. Journal Article. Published 2022-06-22.
DOI: 10.48129/kjs.splml.19547 (https://doi.org/10.48129/kjs.splml.19547)
Citation count: 0
Abstract
Machine translators have become increasingly popular and now play an important role in cross-cultural communication. However, machine translators often produce unnatural text, so their output needs to be evaluated to prevent the misuse of machine-translated texts. This paper presents the use of binary classification to evaluate the quality of machine translators without reference translations. First, we construct a large-scale dataset of human-generated and machine-translated texts. Second, the dataset is used to train multiple binary classifiers, including decision tree, random forest, extreme gradient boosting, support vector machine, and logistic regression models. Finally, the trained classifiers are combined into an ensemble model by majority voting, and this ensemble is used to evaluate the quality of machine-translated texts. Experimental results show that the proposed evaluation method better measures the quality of texts translated by several commercial machine translators.
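The majority-voting step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the label convention (0 = human-generated, 1 = machine-translated), the tie-breaking rule, the `human_likeness` score, and the toy rule-based functions standing in for trained classifiers. The abstract does not specify how ensemble votes are turned into a translator-level quality score, so the fraction of outputs voted human-generated is only one plausible reading.

```python
from collections import Counter
from typing import Callable, Sequence

# Assumed label convention for this sketch:
# 0 = human-generated, 1 = machine-translated.
Classifier = Callable[[str], int]


def majority_vote(classifiers: Sequence[Classifier], text: str) -> int:
    """Return the label chosen by most classifiers (ties go to 1)."""
    votes = Counter(clf(text) for clf in classifiers)
    return 0 if votes[0] > votes[1] else 1


def human_likeness(classifiers: Sequence[Classifier], texts: Sequence[str]) -> float:
    """Fraction of texts that the ensemble labels as human-generated."""
    if not texts:
        raise ValueError("texts must be non-empty")
    human = sum(1 for t in texts if majority_vote(classifiers, t) == 0)
    return human / len(texts)


# Hypothetical stand-ins for trained models (decision tree, SVM, ...).
# Real classifiers would be fitted on the human/machine dataset.
def short_text_rule(text: str) -> int:
    # Flags texts with fewer than four words as machine-translated.
    return 1 if len(text.split()) < 4 else 0


def repeated_word_rule(text: str) -> int:
    # Flags texts containing an immediately repeated word.
    words = text.lower().split()
    return 1 if any(a == b for a, b in zip(words, words[1:])) else 0


def always_human(text: str) -> int:
    # Trivial baseline that always votes human-generated.
    return 0


CLASSIFIERS = [short_text_rule, repeated_word_rule, always_human]

if __name__ == "__main__":
    outputs = ["the the cat", "The cat sat on the mat", "Hello world"]
    for text in outputs:
        print(text, "->", majority_vote(CLASSIFIERS, text))
    print("human-likeness:", human_likeness(CLASSIFIERS, outputs))
```

Under this reading, a translator whose outputs the ensemble more often labels as human-generated would receive a higher score. Using an odd number of classifiers avoids ties; with an even number, the sketch defaults to the machine-translated label.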