Consistent Human Evaluation of Machine Translation across Language Pairs

Daniel Licht, Cynthia Gao, Janice Lam, Francisco Guzmán, Mona T. Diab, Philipp Koehn

Conference of the Association for Machine Translation in the Americas (AMTA), 17 May 2022. DOI: 10.48550/arXiv.2205.08533
Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge, given the high variability among human evaluators, which is partly due to subjective expectations of translation quality that differ across language pairs. We propose XSTS, a new metric focused on semantic equivalence, together with a cross-lingual calibration method that enables more consistent assessment. We demonstrate the effectiveness of these contributions in large-scale evaluation studies covering up to 14 language pairs, with translation both into and out of English.
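The abstract names the cross-lingual calibration method but gives no implementation details. Below is a minimal sketch of one plausible reading: each evaluator also scores a shared calibration set, and their raw XSTS scores (on a 1-5 scale) are shifted by their deviation from the cross-evaluator mean on that set. All function and variable names here are ours, not the paper's.

```python
from statistics import mean

def calibrate_scores(
    eval_scores: dict[str, list[float]],   # raw XSTS scores (1-5) per evaluator
    calib_scores: dict[str, list[float]],  # each evaluator's scores on a shared calibration set
) -> dict[str, list[float]]:
    """Hypothetical calibration: shift each evaluator's scores so that
    their mean on the shared calibration set matches the global mean
    across all evaluators."""
    # Global mean over every evaluator's calibration judgments.
    global_mean = mean(s for scores in calib_scores.values() for s in scores)
    adjusted = {}
    for evaluator, scores in eval_scores.items():
        # A lenient evaluator (high calibration mean) receives a negative offset.
        offset = global_mean - mean(calib_scores[evaluator])
        # Clamp shifted scores back into the 1-5 XSTS range.
        adjusted[evaluator] = [min(5.0, max(1.0, s + offset)) for s in scores]
    return adjusted

# Example: evaluator "b" is systematically lenient on the calibration set,
# so their scores are shifted down; strict evaluator "a" is shifted up.
raw = {"a": [3.0, 4.0], "b": [4.5, 5.0]}
calib = {"a": [3.0, 3.0], "b": [4.0, 4.0]}
print(calibrate_scores(raw, calib))
```

Because every evaluator scores the same calibration items regardless of which language pair they judge, an offset of this kind would let scores be compared across language pairs, which is the consistency goal the abstract describes.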