NMT Sentence Granularity Similarity Calculation Method Based on Improved Cosine Distance
Shuyan Wang, Jingjing Ma
Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition, published 2022-09-23
DOI: 10.1145/3573942.3574021 (https://doi.org/10.1145/3573942.3574021)
Citations: 0
Abstract
To address the lack of semantic information in sentence similarity calculation during metamorphic testing of neural machine translation (NMT) systems, this paper proposes a sentence-level similarity calculation method for NMT based on an improved cosine distance. Text vectors are constructed with improved TF-IDF weights, and a combination of Edit Distance and the Jaccard similarity coefficient is used as a suppressor for cosine similarity. Experiments on the UM-Corpus dataset with NMT systems such as Alibaba Translation and Baidu Translation show that, compared with the Edit Distance-based method, the proposed method improves the Pearson and Spearman correlation coefficients with the reference-translation-based method by 20.5% and 12%, respectively. Its results are also closer to the reference-based BLEU and METEOR evaluations, indicating higher evaluation accuracy.
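The abstract does not give the exact formulas, so the following is only a minimal Python sketch of the general idea under stated assumptions. Plain smoothed TF-IDF stands in for the paper's "improved" weighting, tokenization is simple lowercase whitespace splitting, and the suppressor is a hypothetical weighted average (weight `alpha`) of normalized token-level edit similarity and Jaccard similarity, multiplied into the cosine score. The function names and `alpha` are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization (tokenizer not specified in the abstract).
    return sentence.lower().split()


def tfidf_vectors(sentences: list[list[str]]) -> list[dict[str, float]]:
    # Standard smoothed TF-IDF: tf * (ln((1 + N) / (1 + df)) + 1).
    # This is a stand-in for the paper's improved TF-IDF, whose details are not given.
    n = len(sentences)
    df = Counter(tok for sent in sentences for tok in set(sent))
    vectors = []
    for sent in sentences:
        counts = Counter(sent)
        total = len(sent) or 1
        vectors.append({
            tok: (c / total) * (math.log((1 + n) / (1 + df[tok])) + 1)
            for tok, c in counts.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def edit_distance(a: list[str], b: list[str]) -> int:
    # Token-level Levenshtein distance using a single rolling row.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def jaccard(a: list[str], b: list[str]) -> float:
    # Jaccard similarity of the token sets; two empty sentences count as identical.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def suppressed_similarity(s1: str, s2: str, alpha: float = 0.5) -> float:
    # Hypothetical combination: cosine similarity scaled by a suppressor that
    # averages normalized edit similarity and Jaccard similarity with weight alpha.
    a, b = tokenize(s1), tokenize(s2)
    u, v = tfidf_vectors([a, b])
    longest = max(len(a), len(b)) or 1
    edit_sim = 1.0 - edit_distance(a, b) / longest
    suppressor = alpha * edit_sim + (1 - alpha) * jaccard(a, b)
    return cosine(u, v) * suppressor


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the mat sat on the cat"
    print(round(suppressed_similarity(ref, hyp), 3))  # → 0.833
```

Because bag-of-words cosine similarity ignores word order, the reordered sentence in the example has a cosine score of about 1.0. The edit-distance term in the suppressor penalizes the two swapped positions and pulls the combined score down to about 0.833, which illustrates why an order-sensitive measure is paired with cosine similarity.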