Sebeom Park, S. Lee, Youngtaek Kim, Hyeon Jeon, Seokweon Jung, Jinwook Bok, Jinwook Seo
VANT: A Visual Analytics System for Refining Parallel Corpora in Neural Machine Translation
2022 IEEE 15th Pacific Visualization Symposium (PacificVis), published 2022-04-01
DOI: 10.1109/PacificVis53943.2022.00029 (https://doi.org/10.1109/PacificVis53943.2022.00029)
Citations: 2
Abstract
The quality of the parallel corpora used to train a Neural Machine Translation (NMT) model can critically influence the model's performance. Various approaches for refining parallel corpora have been introduced, but there is still much room for improvement, particularly in the efficiency and quality of refinement. We introduce VANT, a novel visual analytics system for refining parallel corpora used in training an NMT model. Our system helps users readily detect and filter noisy parallel corpora by (1) supporting quality estimation of individual sentence pairs through diverse quality metrics (e.g., cosine similarity, BLEU, length ratio) and (2) letting users visually examine and manage the corpora based on the pre-computed metric scores. We demonstrate the system's effectiveness and usefulness through a qualitative user study on real-world datasets with eight participants, four of whom were domain experts.
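To make the per-pair quality metrics named in the abstract concrete, the following is a minimal sketch of two of them (length ratio and cosine similarity) plus a simple threshold filter. It is not VANT's implementation: the function names, the whitespace tokenization, the shorter-over-longer definition of length ratio, and the 0.5 threshold are all assumptions made for illustration, and the embedding vectors passed to `cosine_similarity` are assumed to come from some external sentence encoder.

```python
import math
from typing import List, Sequence, Tuple


def length_ratio(src: str, tgt: str) -> float:
    """Ratio of the shorter to the longer sentence, counted in tokens.

    Assumption: tokens are whitespace-separated, which is a crude
    approximation for languages written without spaces.
    """
    n_src, n_tgt = len(src.split()), len(tgt.split())
    if n_src == 0 or n_tgt == 0:
        return 0.0
    return min(n_src, n_tgt) / max(n_src, n_tgt)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_pairs(
    pairs: List[Tuple[str, str]], min_length_ratio: float = 0.5
) -> List[Tuple[str, str]]:
    """Keep sentence pairs whose length ratio meets the (hypothetical) threshold."""
    return [(s, t) for s, t in pairs if length_ratio(s, t) >= min_length_ratio]


if __name__ == "__main__":
    corpus = [
        ("hello world", "bonjour le monde"),
        ("hi", "ceci est une phrase très longue"),
    ]
    # The second pair is dropped because its lengths are badly mismatched.
    print(filter_pairs(corpus))
```

In an interactive setting such as the one the abstract describes, scores like these would be pre-computed once per sentence pair so that thresholds can be adjusted without recomputation.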