{"title":"What you see is not what you get!: Detecting Simpson's Paradoxes during Data Exploration","authors":"Yue (Sophie) Guo, Carsten Binnig, Tim Kraska","doi":"10.1145/3077257.3077266","DOIUrl":null,"url":null,"abstract":"Visual data exploration tools, such as Vizdom or Tableau, significantly simplify data exploration for domain experts and, more importantly, novice users. These tools allow to discover complex correlations and to test hypotheses and differences between various populations in an entirely visual manner with just a few clicks, unfortunately, often ignoring even the most basic statistical rules. For example, there are many statistical pitfalls that a user can \"tap\" into when exploring data sets. As a result of this experience, we started to build QUDE [1], the first system to Quantifying the Uncertainty in Data Exploration, which is part of Brown's Interactive Data Exploration Stack (called IDES). The goal of QUDE is to automatically warn and, if possible, protect users from common mistakes during the data exploration process. In this paper, we focus on a different type of error, the Simpson's Paradox, which is a special type of error in which a high-level aggregate/visualization leads to the wrong conclusion since a trend reverts when splitting the visualized data set into multiple subgroups (i.e., when executing a drill-down)..","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"11 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3077257.3077266","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 23
Abstract
Visual data exploration tools, such as Vizdom or Tableau, significantly simplify data exploration for domain experts and, more importantly, novice users. These tools allow users to discover complex correlations and to test hypotheses and differences between various populations in an entirely visual manner with just a few clicks; unfortunately, they often ignore even the most basic statistical rules. For example, there are many statistical pitfalls that a user can fall into when exploring data sets. As a result of this experience, we started to build QUDE [1], the first system for Quantifying the Uncertainty in Data Exploration, which is part of Brown's Interactive Data Exploration Stack (IDES). The goal of QUDE is to automatically warn and, if possible, protect users from common mistakes during the data exploration process. In this paper, we focus on a different type of error, Simpson's Paradox: a special type of error in which a high-level aggregate or visualization leads to the wrong conclusion because a trend reverses when the visualized data set is split into multiple subgroups (i.e., when executing a drill-down).
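
To make the reversal concrete, the following is a minimal sketch (not the QUDE implementation) of how such a drill-down reversal could be flagged: compare the sign of the trend in the aggregate view against the signs of the trends within each subgroup. The column names, the toy data set, and the choice of a simple correlation sign as the "trend" measure are illustrative assumptions, not details from the paper.

```python
# Minimal sketch (assumed, not the QUDE algorithm): flag a candidate
# Simpson's Paradox by comparing the sign of the aggregate trend against
# the signs of the per-subgroup trends after a drill-down.
import pandas as pd


def trend_sign(df: pd.DataFrame, x: str, y: str) -> int:
    """Sign of the linear association between x and y (+1, -1, or 0)."""
    corr = df[x].corr(df[y])
    if pd.isna(corr) or corr == 0:
        return 0
    return 1 if corr > 0 else -1


def has_simpson_reversal(df: pd.DataFrame, x: str, y: str, group: str) -> bool:
    """True if every subgroup trend contradicts the aggregate trend."""
    overall = trend_sign(df, x, y)
    subgroup_signs = [
        trend_sign(sub, x, y)
        for _, sub in df.groupby(group)
        if len(sub) > 1
    ]
    return overall != 0 and all(s == -overall for s in subgroup_signs if s != 0)


# Toy data: within each group the trend is negative, but pooling the
# groups produces a positive aggregate trend (a classic reversal).
data = pd.DataFrame({
    "dose":    [1, 2, 3, 4, 11, 12, 13, 14],
    "outcome": [10, 9, 8, 7, 20, 19, 18, 17],
    "group":   ["A", "A", "A", "A", "B", "B", "B", "B"],
})
print(has_simpson_reversal(data, "dose", "outcome", "group"))  # True
```

A practical checker would likely also test the statistical significance and effect size of each subgroup trend before raising a warning; the sign comparison above is only the simplest possible criterion.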