A Visualization Method for Training Data Comparison
Karen Kosaka, T. Itoh
2021 25th International Conference Information Visualisation (IV), July 2021
DOI: 10.1109/IV53921.2021.00040
Abstract
With the diversification of machine learning applications, verifying and comparing the quality of training data has become an important process. For example, in transfer learning, verifying the difference in quality between the source and target data can prevent the model's accuracy from deteriorating. However, training datasets for deep learning are growing ever larger, and analyzing such datasets is not always easy. To address this problem, we are developing a visualization for training data validation. In this study, we apply dimensionality reduction to the training datasets and display them as scatterplots, enabling a visual analysis in which differences in quality can be easily detected. Our current implementation draws, for each label, the regions where points are concentrated as semitransparent polygons in the scatterplot. The implementation also provides a slider that sets a threshold for interactively adjusting polygon generation. This allows us to observe differences in the distribution of labels among training datasets.
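The abstract does not say how the dense regions are found or turned into polygons. The following is only a minimal sketch under stated assumptions: the points are already projected to 2D by some dimensionality reduction, each label's points are binned into a square grid, cells whose count reaches `threshold` times that label's densest cell are kept, and the kept points are wrapped in a convex hull (Andrew's monotone chain). The names `dense_region_polygons`, `cell_size`, and `threshold` are hypothetical; `threshold` stands in for the slider value.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, float]


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # > 0 means o -> a -> b turns counter-clockwise
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point; it repeats the other chain's first point
    return lower[:-1] + upper[:-1]


def dense_region_polygons(
    points: Sequence[Point],
    labels: Sequence[Hashable],
    cell_size: float,
    threshold: float,
) -> Dict[Hashable, List[Point]]:
    """Hypothetical per-label polygon generation.

    A grid cell is "dense" if its point count is at least
    threshold * (count of that label's densest cell), with threshold in [0, 1].
    Returns one convex polygon per label around the points in dense cells.
    """
    by_label: Dict[Hashable, List[Point]] = defaultdict(list)
    for p, lab in zip(points, labels):
        by_label[lab].append(p)

    polygons: Dict[Hashable, List[Point]] = {}
    for lab, pts in by_label.items():
        cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
        for x, y in pts:
            cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append((x, y))
        peak = max(len(c) for c in cells.values())
        dense = [p for c in cells.values() if len(c) >= threshold * peak for p in c]
        polygons[lab] = convex_hull(dense)
    return polygons


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.3, 0.2), (0.4, 0.4), (5.0, 5.0),
           (2.1, 2.1), (2.5, 2.2), (2.2, 2.8), (8.0, 8.0)]
    labs = ["a"] * 5 + ["b"] * 4
    # A higher threshold drops the isolated points (5, 5) and (8, 8)
    print(dense_region_polygons(pts, labs, cell_size=1.0, threshold=0.5))
```

Lowering `threshold` toward 0 pulls sparse outliers back into each polygon, which mimics moving the slider. One convex hull per label also merges separate clusters of the same label into a single region; a concave hull or one hull per connected group of dense cells would follow the point concentrations more closely.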