A Visualization Method for Training Data Comparison

Karen Kosaka, T. Itoh
Published in: 2021 25th International Conference Information Visualisation (IV), 2021-07-01
DOI: 10.1109/IV53921.2021.00040 (https://doi.org/10.1109/IV53921.2021.00040)

Abstract

With the diversification of machine learning applications, verifying and comparing the quality of training data has become an important process. For example, when performing transfer learning, verifying the difference in quality between the source and target data can prevent the accuracy of the model from deteriorating. However, training datasets for deep learning are growing ever larger, and analyzing them is not always easy. As a solution to this problem, we are working on visualization for training data validation. In this study, we apply dimensionality reduction to the training datasets and display them as scatterplots, enabling a visual analysis in which differences in quality can be easily detected. Our current implementation draws the regions where points are concentrated as semitransparent polygons for each label in the scatterplot. The implementation also provides a slider that sets a threshold for interactively adjusting polygon generation. This allows us to observe differences in the distribution of labels among the training data.
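The abstract does not say how the polygons are generated or which dimensionality reduction method is used, so the following is only a minimal sketch of one plausible approach, not the authors' implementation. It assumes the data has already been reduced to 2D points, counts points per label in a uniform grid, keeps the points that fall in cells holding at least `threshold` points, and wraps them in a convex hull. The names `dense_region_polygons`, `convex_hull`, and `cell_size`, as well as the sample points and labels, are hypothetical.

```python
from collections import Counter


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def dense_region_polygons(points, labels, threshold, cell_size=1.0):
    """Return {label: polygon} covering the points in densely populated grid cells.

    Assumption: "concentrated" means a grid cell holds at least `threshold`
    points of the same label. The source does not define this criterion.
    """
    def cell_of(p):
        return (int(p[0] // cell_size), int(p[1] // cell_size))

    by_label = {}
    for p, lab in zip(points, labels):
        by_label.setdefault(lab, []).append(p)

    polygons = {}
    for lab, pts in by_label.items():
        counts = Counter(cell_of(p) for p in pts)
        dense = [p for p in pts if counts[cell_of(p)] >= threshold]
        polygons[lab] = convex_hull(dense)
    return polygons


# Hypothetical 2D points, as if produced by a dimensionality reduction step.
points = [(0.1, 0.1), (0.9, 0.1), (0.5, 0.8), (0.4, 0.3), (5.5, 5.5),
          (3.1, 3.2), (3.8, 3.1), (3.5, 3.9)]
labels = ["a", "a", "a", "a", "a", "b", "b", "b"]

# The threshold plays the role of the slider value described in the abstract.
polys = dense_region_polygons(points, labels, threshold=3)
```

Raising `threshold` shrinks or removes polygons, while lowering it lets isolated points such as `(5.5, 5.5)` pull the polygon outward, which is the kind of interactive adjustment a slider would expose. Each returned vertex list could then be drawn as a semitransparent filled polygon over the scatterplot with any plotting library.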