Disentangled Representation of Data Distributions in Scatterplots

Jaemin Jo, Jinwook Seo
2019 IEEE Visualization Conference (VIS), published 2019-10-01. DOI: 10.1109/VISUAL.2019.8933670 (https://doi.org/10.1109/VISUAL.2019.8933670)
Citation count: 10

Abstract

We present a data-driven approach to obtaining a disentangled and interpretable representation that characterizes the bivariate data distributions shown in scatterplots. We first collect tabular datasets from the Web and build a training corpus of over one million scatterplot images. We then train a state-of-the-art disentangling model, the β-variational autoencoder, to derive a disentangled representation of the scatterplot images. The main output of this work is a list of 32 representative features that capture the underlying structures of bivariate data distributions. Through latent traversals, we look for the high-level semantics of the features and compare them with earlier human-derived concepts such as scagnostics measures. Finally, using the 32 features as input, we build a simple neural network to predict the perceptual distances between scatterplots that were previously scored by human annotators. The Pearson correlation coefficient between the predicted and perceptual distances was above 0.75, which suggests that the representation is effective for quantitatively characterizing scatterplots.
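The β-variational autoencoder objective mentioned above can be sketched as follows. This is a minimal NumPy illustration of the standard β-VAE loss (a reconstruction term plus a β-weighted KL divergence to a standard normal prior), not the authors' implementation. The function names, the Bernoulli reconstruction term, the flattened-image input, and the default `beta=4.0` are assumptions made for illustration; the abstract states only that the representation has 32 features.

```python
import numpy as np

LATENT_DIM = 32  # the abstract reports 32 features; everything else here is assumed


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar, axis=-1)


def bernoulli_recon_loss(x, x_hat, eps=1e-7):
    """Per-sample binary cross-entropy between flattened images x and x_hat."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat), axis=-1)


def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """Batch-mean beta-VAE loss: reconstruction + beta * KL.

    beta=4.0 is a placeholder, not a value taken from the paper.
    """
    recon = bernoulli_recon_loss(x, x_hat)
    kl = kl_to_standard_normal(mu, logvar)
    return float(np.mean(recon + beta * kl))
```

With `beta` greater than 1, the KL term is weighted more heavily than in a plain variational autoencoder, which is the pressure that encourages each latent dimension to encode a separate factor of variation. In a full model, `mu` and `logvar` would come from an encoder network and `x_hat` from a decoder; both are omitted here.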