Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.): Latest Articles

What Users Don't Expect about Exploratory Data Analysis on Approximate Query Processing Systems
Dominik Moritz, Danyel Fisher
DOI: https://doi.org/10.1145/3077257.3077258 | Published: 2017-05-14
Abstract: Pangloss implements "Optimistic Visualization", a method that gives analysts confidence to use approximate results for exploratory data analysis. In this paper, we outline how analysts' experience with an approximate visualization system did not match their intuitions. These observations have implications for the design of future data exploration systems that expose uncertainty. We also describe requirements for approximate query engines to enable the next generation of exploratory visualization systems.
Citations: 7
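The abstract describes Optimistic Visualization only at a high level. As a rough illustration of the underlying idea (show an approximate answer with its uncertainty now, check it against the exact answer later), the following Python sketch estimates a mean from a uniform random sample with a normal-approximation 95% interval. The helper names `approximate_mean` and `verify`, the uniform sampling, and the 1.96 multiplier are assumptions for illustration, not Pangloss's implementation.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a uniform random sample (hypothetical helper).

    Returns the estimate and a ~95% normal-approximation confidence interval.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # uniform sampling without replacement
    n, total = len(sample), len(values)
    mean = statistics.fmean(sample)
    # Finite-population correction shrinks the error as the sample approaches the full data.
    fpc = math.sqrt((total - n) / (total - 1)) if total > 1 else 0.0
    se = statistics.stdev(sample) / math.sqrt(n) * fpc
    return mean, (mean - 1.96 * se, mean + 1.96 * se)


def verify(exact, interval):
    """Once the exact answer is available, check whether it fell inside the interval."""
    low, high = interval
    return low <= exact <= high
```

Passing the full data set as the sample collapses the interval to the exact mean, which makes a quick sanity check of the helper.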
Observation-Level Interaction with Clustering and Dimension Reduction Algorithms
John E. Wenskovitch, Chris North
DOI: https://doi.org/10.1145/3077257.3077259 | Published: 2017-05-14
Abstract: Observation-Level Interaction (OLI) is a sensemaking technique relying upon the interactive semantic exploration of data. By manipulating data items within a visualization, users provide feedback to an underlying mathematical model that projects multidimensional data into a meaningful two-dimensional representation. In this work, we propose, implement, and evaluate an OLI model which explicitly defines clusters within this data projection. These clusters provide targets against which data values can be manipulated. The result is a cooperative framework in which the layout of the data affects the clusters, while user-driven interactions with the clusters affect the layout of the data points. Additionally, this model addresses the OLI "with respect to what" problem by providing a clear set of clusters against which interaction targets are judged and computed.
Citations: 38
Observing the Data Scientist: Using Manual Corrections As Implicit Feedback
Nurzety A. Azuan, Suzanne M. Embury, N. Paton
DOI: https://doi.org/10.1145/3077257.3077272 | Published: 2017-05-14
Abstract: Dataspaces aim to remove the up-front costs of information integration by gathering the needed domain information through targeted interactions with the end-user throughout the lifetime of the integration. State-of-the-art tools are used to rapidly construct an initial (incorrect) integration, which is then refined in a pay-as-you-go manner by asking end-users to supply feedback on the resulting data. The idea is that end-users will choose to put effort into providing feedback on the areas of the integration where the quality is important to them, while other less well-used areas will receive a smaller share of user attention. This approach is promising but open problems remain. One issue is that the end-user loses control over the process. Their contribution is to specify their query requirements and to provide feedback on the results, as directed by the dataspace. But what feedback should the user supply to get the data they want? We propose a new approach to data integration in which the end-user and the dataspace work as equal partners to meet the integration goal. Both are able to perform data integration tasks directly, and both request and provide feedback on the results. In addition, the dataspace observes the actions of the end-user when carrying out integration, with the aim of automating that part of the work in future integration tasks. In this paper, we explore this idea by examining how a dataspace can observe an end-user at work, correcting errors in query results, to gather feedback needed to refine the mappings used for integration. We propose an algorithm for converting manual corrections to feedback, and present the results of a preliminary evaluation comparing this approach with seeking explicit feedback from end-users.
Citations: 8
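A minimal sketch of turning manual corrections into feedback, assuming each result tuple records the mapping that produced it (its provenance). The function names and the rule that untouched tuples count as correct are hypothetical simplifications, not the algorithm proposed in the paper.

```python
from collections import defaultdict


def corrections_to_feedback(results, deleted_ids, edited_ids):
    """Turn manual corrections on a query result into per-tuple feedback (hypothetical).

    results: list of (tuple_id, mapping_id) pairs, i.e. each result tuple with
    the mapping that produced it (assumed provenance).
    Deleted or edited tuples count as false positives; untouched tuples are
    assumed correct, which is a simplifying assumption.
    """
    wrong = set(deleted_ids) | set(edited_ids)
    return [(tid, mid, tid not in wrong) for tid, mid in results]


def mapping_precision(feedback):
    """Estimate each mapping's precision from the derived feedback."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for _tid, mid, ok in feedback:
        total[mid] += 1
        correct[mid] += ok  # bool adds as 0 or 1
    return {mid: correct[mid] / total[mid] for mid in total}
```

Mappings with low estimated precision would then be candidates for refinement, which is the role the abstract assigns to the gathered feedback.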
Human-in-the-Loop Challenges for Entity Matching: A Midterm Report
A. Doan, A. Ardalan, Jeffrey R. Ballard, Sanjib Das, Yash Govind, Pradap Konda, Han Li, Sidharth Mudgal, Erik Paulson, Paul Suganthan G. C., Haojun Zhang
DOI: https://doi.org/10.1145/3077257.3077268 | Published: 2017-05-14
Abstract: Entity matching (EM) has been a long-standing challenge in data management. In the past few years we have started two major projects on EM (Magellan and Corleone/Falcon). These projects have raised many human-in-the-loop (HIL) challenges. In this paper we discuss these challenges. In particular, we show how these challenges forced us to revise our solution architecture, from a typical RDBMS-style architecture to a very human-centric one, in which human users are first-class objects driving the EM process, using tools at pain-point places. We discuss how such solution architectures can be viewed as combining "tools in the loop" with "human in the loop". Finally, we discuss lessons learned which can potentially be applied to other problem settings. We also hope that more researchers will investigate EM, as it can be a rich "playground" for HIL research.
Citations: 26
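Entity matching workflows commonly split into a blocking step, which prunes the candidate pairs, and a matching step, which decides which pairs refer to the same entity. The sketch below shows that generic two-step shape with a first-token blocking key and a token-level Jaccard threshold; both choices stand in for the decisions a human in the loop would tune. It is not Magellan's API, and every name in it is hypothetical.

```python
def tokens(s):
    """Lower-cased whitespace tokens of a record string."""
    return set(s.lower().split())


def jaccard(a, b):
    """Token-set Jaccard similarity between two record strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def first_token(s):
    """Hypothetical blocking key: the first lower-cased token."""
    return s.lower().split()[0]


def block(left, right, key):
    """Blocking: only pair records whose cheap key agrees."""
    index = {}
    for r in right:
        index.setdefault(key(r), []).append(r)
    return [(l, r) for l in left for r in index.get(key(l), [])]


def match(pairs, threshold=0.5):
    """Matching: keep candidate pairs whose similarity clears the threshold."""
    return [(l, r) for l, r in pairs if jaccard(l, r) >= threshold]
```

For example, blocking `"Apple iPhone 7 32GB"` against `"apple iphone 7 black 32gb"` yields one candidate pair with Jaccard similarity 4/5 = 0.8, which `match` keeps at the default threshold.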
A Game-theoretic Approach to Data Interaction: A Progress Report
Ben McCamish, Arash Termehchy, B. Touri
DOI: https://doi.org/10.1145/3077257.3077270 | Published: 2017-05-14
Abstract: As most database users cannot precisely express their information needs in the form of database queries, it is challenging for database query interfaces to understand and satisfy their intents. Database systems usually improve their understanding of users' intents by collecting their feedback on the answers to the users' imprecise and ill-specified queries. Users may also learn to express their queries precisely during their interactions with the database system. In this paper, we report our progress on developing a formal framework for representing and understanding information needs in database querying and exploration. Our framework considers querying as a collaboration between the user and the database system to establish a mutual language for representing information needs. We formalize this collaboration as a signaling game between two potentially rational agents: the user and the database system. We believe that this framework naturally models the long-term interaction of users and database systems.
Citations: 1
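The abstract frames querying as a signaling game: the user signals an intent through a query, and the database system chooses an interpretation. A toy simulation, assuming a Roth-Erev-style reinforcement rule that the abstract does not specify (both sides reinforce their choice whenever the answer matches the intent), could look like this. The function `play` and its parameters are hypothetical.

```python
import random


def play(n_intents=3, n_queries=3, rounds=5000, seed=1):
    """Simulate a user/database signaling game with simple reinforcement (toy model)."""
    rng = random.Random(seed)
    # Propensity tables: the user maps intent -> query, the database maps query -> interpretation.
    user = [[1.0] * n_queries for _ in range(n_intents)]
    db = [[1.0] * n_intents for _ in range(n_queries)]
    successes = 0
    for _ in range(rounds):
        intent = rng.randrange(n_intents)
        query = rng.choices(range(n_queries), weights=user[intent])[0]
        answer = rng.choices(range(n_intents), weights=db[query])[0]
        if answer == intent:  # the answer satisfied the user's information need
            user[intent][query] += 1.0
            db[query][answer] += 1.0
            successes += 1
    return successes / rounds, user, db
```

Over many rounds both tables drift toward a shared "language" in which particular queries reliably signal particular intents, which is the mutual-language idea the abstract describes.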
What you see is not what you get!: Detecting Simpson's Paradoxes during Data Exploration
Yue (Sophie) Guo, Carsten Binnig, Tim Kraska
DOI: https://doi.org/10.1145/3077257.3077266 | Published: 2017-05-14
Abstract: Visual data exploration tools, such as Vizdom or Tableau, significantly simplify data exploration for domain experts and, more importantly, novice users. These tools allow users to discover complex correlations and to test hypotheses and differences between various populations in an entirely visual manner with just a few clicks. Unfortunately, they often ignore even the most basic statistical rules. For example, there are many statistical pitfalls that a user can "tap" into when exploring data sets. As a result of this experience, we started to build QUDE [1], the first system for Quantifying the Uncertainty in Data Exploration, which is part of Brown's Interactive Data Exploration Stack (called IDES). The goal of QUDE is to automatically warn and, if possible, protect users from common mistakes during the data exploration process. In this paper, we focus on a different type of error, Simpson's paradox, a special type of error in which a high-level aggregate or visualization leads to the wrong conclusion because a trend reverses when the visualized data set is split into multiple subgroups (i.e., when executing a drill-down).
Citations: 23
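A minimal Simpson's paradox check over a single drill-down, assuming the data are already aggregated into (subgroup, option, successes, trials) rows. It illustrates the reversal the abstract describes; it is not QUDE's detection method, which the abstract does not detail, and the function names are hypothetical.

```python
from collections import defaultdict


def rates(rows, key):
    """Success rate per key(row), from rows of (subgroup, option, successes, trials)."""
    succ, tot = defaultdict(int), defaultdict(int)
    for row in rows:
        k = key(row)
        succ[k] += row[2]
        tot[k] += row[3]
    return {k: succ[k] / tot[k] for k in tot}


def simpson_reversal(rows, a, b):
    """True if the aggregate comparison of a vs b flips in every subgroup after drilling down."""
    overall = rates(rows, key=lambda r: r[1])
    by_group = rates(rows, key=lambda r: (r[0], r[1]))
    groups = {r[0] for r in rows}
    a_wins_overall = overall[a] > overall[b]
    a_wins_in_group = [by_group[(g, a)] > by_group[(g, b)] for g in groups]
    return all(w != a_wins_overall for w in a_wins_in_group)
```

With the widely cited kidney-stone figures (treatment A: 81/87 small, 192/263 large; treatment B: 234/270 small, 55/80 large), A has the higher success rate in both subgroups but the lower one overall (273/350 vs. 289/350), so `simpson_reversal(rows, "A", "B")` returns `True`.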
An Overreaction to the Broken Machine Learning Abstraction: The ease.ml Vision
Ce Zhang, Wentao Wu, Tian Li
DOI: https://doi.org/10.1145/3077257.3077265 | Published: 2017-05-14
Abstract: After hours of teaching astrophysicists TensorFlow and then seeing them, nevertheless, continue to struggle in the most creative way possible, we asked: What is the point of all of these efforts? It was a warm winter afternoon; Zurich was not gloomy at all, Seattle was sunny as usual, and Beijing's air was crystal clear. One of the authors stormed out of a marathon meeting with biologists, and our journey of overreaction began. We asked: Can we build a system that gets domain experts completely out of the machine learning loop? Can this system have exactly the same interface as linear regression, the bare minimum requirement of a scientist? We started a process of trial and error and discussions with domain experts, all of whom not only have a great sense of humor but also generously offered to be our "guinea pigs." After months of exploration, the architecture of our system, ease.ml, started to take shape. It is not as general as TensorFlow but not completely useless; in fact, many applications we are supporting can be built completely with ease.ml, and many others just need some syntactic sugar. During development, we found that building ease.ml in the right way raises a series of technical challenges. In this paper, we describe our ease.ml vision, discuss each of these technical challenges, and map out our research agenda for the months and years to come.
Citations: 8
Assisting Discovery in Public Health
Yannis Katsis, N. Koulouris, Y. Papakonstantinou, K. Patrick
DOI: https://doi.org/10.1145/3077257.3077269 | Published: 2017-05-14
Abstract: Several public health (PH) researchers have lately been arguing that big data can play a profound role in scientific discovery. Leveraging the vast amount of population-level data collected by public agencies and other organizations could lead to important discoveries that were not necessarily suspected to be true. However, they also warn about the pitfalls of data-driven discovery: the large amount of data can easily lead to information overload for the researchers. Additionally, data-driven studies that make a lot of tests in the search for important discoveries have the potential to lead to discoveries that seem important but are in fact random. We show that data-driven studies can be effective and yet avoid the potential pitfalls by keeping the researchers in the loop of the discovery process. To this end, we propose PHD, an interactive visual discovery system that allows public health researchers to gain interesting insights from large datasets. PHD generalizes the current workflow of PH researchers by facilitating the major analytics tasks involved in PH discovery, such as calculating important associations based on the standard notions of odds ratios and confidence intervals, controlling for the effect of other variables, and discovering interesting compounding effects. More importantly, however, it leverages user interaction and the semantics of the domain to make sure that this workflow scales to large datasets, while avoiding information overload and random discoveries.
Citations: 3
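The abstract mentions odds ratios, confidence intervals, and controlling for other variables. Standard textbook tools for these are the 2x2 odds ratio with a Wald (Woolf) interval on the log scale and the Mantel-Haenszel odds ratio pooled across strata of a controlling variable; the sketch below implements those, assuming no zero cells. Whether PHD uses exactly these estimators is not stated in the abstract.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table with a Wald (Woolf) confidence interval.

    a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    Assumes all four cells are non-zero.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    log_or = math.log(or_)
    return or_, (math.exp(log_or - z * se), math.exp(log_or + z * se))


def mantel_haenszel(tables):
    """Odds ratio pooled across strata, controlling for the stratifying variable."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den
```

For a table with a=20, b=80, c=10, d=90, the odds ratio is (20 * 90) / (80 * 10) = 2.25, and the interval brackets that value on the multiplicative scale.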
SpeakQL: Towards Speech-driven Multi-modal Querying
Dharmil Chandarana, Vraj Shah, Arun Kumar, L. Saul
DOI: https://doi.org/10.1145/3077257.3077264 | Published: 2017-05-14
Abstract: Natural language and touch-based interfaces are making data querying significantly easier. But typed SQL remains the gold standard for query sophistication, although it is painful in many querying environments. Recent advancements in automatic speech recognition raise the tantalizing possibility of bridging this gap by enabling spoken SQL queries. In this work, we outline our vision of one such new query interface and system for regular SQL that is primarily speech-driven. We propose an end-to-end architecture for making spoken SQL querying effective and efficient and present initial empirical results to understand the feasibility of such an approach. We identify several open research questions and propose alternative solutions that we plan to explore.
Citations: 9
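One small piece of a spoken-SQL pipeline is repairing SQL keywords that automatic speech recognition gets slightly wrong. The sketch below snaps each transcribed token to the nearest keyword by Levenshtein distance when it is close enough; the keyword list, the threshold, and the approach itself are illustrative assumptions, not SpeakQL's architecture.

```python
# Hypothetical, deliberately small keyword vocabulary.
SQL_KEYWORDS = ["SELECT", "FROM", "WHERE", "AND", "OR", "GROUP", "BY", "ORDER", "COUNT", "AVG"]


def edit_distance(s, t):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


def snap_keywords(transcript, max_dist=1):
    """Replace tokens within max_dist edits of a SQL keyword by that keyword."""
    out = []
    for tok in transcript.split():
        best = min(SQL_KEYWORDS, key=lambda k: edit_distance(tok.upper(), k))
        out.append(best if edit_distance(tok.upper(), best) <= max_dist else tok)
    return " ".join(out)
```

Here `snap_keywords("selec name from employees were salary")` yields `"SELECT name FROM employees WHERE salary"`. A token-by-token rule like this cannot tell a keyword from a column name that happens to sound like one, so it would need grammar-aware context to be robust.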
Flipper: A Systematic Approach to Debugging Training Sets
P. Varma, Dan Iter, Christopher De Sa, C. Ré
DOI: https://doi.org/10.1145/3077257.3077263 | Published: 2017-05-14
Abstract: As machine learning methods gain popularity across different fields, acquiring labeled training datasets has become the primary bottleneck in the machine learning pipeline. Recently, generative models have been used to create and label large amounts of training data, albeit noisily. The output of these generative models is then used to train a discriminative model of choice, such as logistic regression or a complex neural network. However, any errors in the generative model can propagate to the subsequent model being trained. Unfortunately, these generative models are not easily interpretable and are therefore difficult for users to debug. To address this, we present our vision for Flipper, a framework that presents users with high-level information about why their training set is inaccurate and informs their decisions as they improve their generative model manually. We present potential tools within the Flipper framework, inspired by observing biomedical experts working with generative models, which allow users to analyze the errors in their training data in a systematic fashion. Finally, we discuss a prototype of Flipper and report results of a user study where users create a training set for a classification task and improve the discriminative model's accuracy by 2.4 points in less than an hour with feedback from Flipper.
Citations: 24
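Assuming the training labels come from several weak labeling sources (for example, labeling functions) whose votes feed the generative model, one basic debugging view is each source's coverage and its accuracy on a small hand-labeled development set. The function below computes that; its name and input format are hypothetical and not part of Flipper.

```python
def lf_report(votes, gold):
    """Summarize each labeling source against a small hand-labeled dev set (hypothetical).

    votes: dict mapping source name -> list of labels aligned with gold,
           where None means the source abstained on that item.
    Returns: source name -> (coverage, accuracy on the items it labeled).
    """
    report = {}
    for name, labels in votes.items():
        covered = [(v, g) for v, g in zip(labels, gold) if v is not None]
        coverage = len(covered) / len(gold)
        accuracy = sum(v == g for v, g in covered) / len(covered) if covered else float("nan")
        report[name] = (coverage, accuracy)
    return report
```

Sources with high coverage but low accuracy are natural first suspects when the resulting training set is inaccurate.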