What you see is not what you get!: Detecting Simpson's Paradoxes during Data Exploration

Yue (Sophie) Guo, Carsten Binnig, Tim Kraska
Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics (Chicago, Ill., 2017). Published 2017-05-14. DOI: 10.1145/3077257.3077266 (https://doi.org/10.1145/3077257.3077266)
Citations: 23

Abstract

Visual data exploration tools, such as Vizdom or Tableau, significantly simplify data exploration for domain experts and, more importantly, novice users. These tools allow users to discover complex correlations and to test hypotheses and differences between various populations in an entirely visual manner with just a few clicks; unfortunately, they often ignore even the most basic statistical rules. For example, there are many statistical pitfalls that a user can "tap" into when exploring data sets. As a result of this experience, we started to build QUDE [1], the first system to Quantify the Uncertainty in Data Exploration, which is part of Brown's Interactive Data Exploration Stack (called IDES). The goal of QUDE is to automatically warn users about and, if possible, protect them from common mistakes during the data exploration process. In this paper, we focus on a different type of error, Simpson's Paradox, in which a high-level aggregate or visualization leads to the wrong conclusion because a trend reverses when the visualized data set is split into multiple subgroups (i.e., when executing a drill-down).
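To make the reversal on drill-down concrete, the following is a minimal sketch in Python. It is not the detection method used by QUDE, which the abstract does not describe. The counts are illustrative (they follow the commonly quoted kidney-stone textbook example) and are not taken from the paper, and the names `RECORDS`, `success_rates`, and `detect_reversal` are hypothetical.

```python
from collections import defaultdict

# Illustrative records, not from the paper: (treatment, subgroup, successes, trials).
RECORDS = [
    ("A", "small", 81, 87),
    ("A", "large", 192, 263),
    ("B", "small", 234, 270),
    ("B", "large", 55, 80),
]


def success_rates(records, by_subgroup):
    """Return {(subgroup, treatment): rate}; subgroup is None for the aggregate view."""
    totals = defaultdict(lambda: [0, 0])
    for treatment, subgroup, successes, trials in records:
        key = (subgroup if by_subgroup else None, treatment)
        totals[key][0] += successes
        totals[key][1] += trials
    return {key: s / n for key, (s, n) in totals.items()}


def detect_reversal(records, first="A", second="B"):
    """True if every subgroup orders the two treatments opposite to the aggregate."""
    overall = success_rates(records, by_subgroup=False)
    drilled = success_rates(records, by_subgroup=True)
    aggregate_favors_first = overall[(None, first)] > overall[(None, second)]
    subgroups = {subgroup for subgroup, _ in drilled}
    return all(
        (drilled[(g, first)] > drilled[(g, second)]) != aggregate_favors_first
        for g in subgroups
    )


if __name__ == "__main__":
    print("Aggregate view:", success_rates(RECORDS, by_subgroup=False))
    print("Drill-down view:", success_rates(RECORDS, by_subgroup=True))
    print("Reversal on drill-down:", detect_reversal(RECORDS))
```

In the aggregate view treatment B appears better, yet treatment A has the higher success rate in both the "small" and the "large" subgroup. This sketch only compares raw rates; a practical check would presumably also have to account for statistical significance and for the number of candidate drill-down attributes.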