Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.): Latest Articles, Page 6

What Users Don't Expect about Exploratory Data Analysis on Approximate Query Processing Systems

Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.) Pub Date: 2017-05-14 DOI: 10.1145/3077257.3077258

Dominik Moritz, Danyel Fisher

Citations: 7

Observation-Level Interaction with Clustering and Dimension Reduction Algorithms

Pub Date: 2017-05-14 DOI: 10.1145/3077257.3077259

John E. Wenskovitch, Chris North

Citations: 38

Observing the Data Scientist: Using Manual Corrections As Implicit Feedback

Pub Date: 2017-05-14 DOI: 10.1145/3077257.3077272

Nurzety A. Azuan, Suzanne M. Embury, N. Paton

Abstract: Dataspaces aim to remove the up-front costs of information integration by gathering the needed domain information through targeted interactions with the end-user throughout the lifetime of the integration. State-of-the-art tools are used to rapidly construct an initial (incorrect) integration, which is then refined in a pay-as-you-go manner by asking end-users to supply feedback on the resulting data. The idea is that end-users will choose to put effort into providing feedback on the areas of the integration where the quality is important to them, while other less well-used areas will receive a smaller share of user attention. This approach is promising, but open problems remain. One issue is that the end-user loses control over the process. Their contribution is to specify their query requirements and to provide feedback on the results, as directed by the dataspace. But what feedback should the user supply to get the data they want? We propose a new approach to data integration in which the end-user and the dataspace work as equal partners to meet the integration goal. Both are able to perform data integration tasks directly, and both request and provide feedback on the results. In addition, the dataspace observes the actions of the end-user when carrying out integration, with the aim of automating that part of the work in future integration tasks. In this paper, we explore this idea by examining how a dataspace can observe an end-user at work, correcting errors in query results, to gather the feedback needed to refine the mappings used for integration. We propose an algorithm for converting manual corrections to feedback, and present the results of a preliminary evaluation comparing this approach with seeking explicit feedback from end-users.

Citations: 8

Human-in-the-Loop Challenges for Entity Matching: A Midterm Report

Pub Date: 2017-05-14 DOI: 10.1145/3077257.3077268

A. Doan, A. Ardalan, Jeffrey R. Ballard, Sanjib Das, Yash Govind, Pradap Konda, Han Li, Sidharth Mudgal, Erik Paulson, Paul Suganthan G. C., Haojun Zhang

Citations: 26

A Game-theoretic Approach to Data Interaction: A Progress Report

Pub Date: 2017-05-14 DOI: 10.1145/3077257.3077270

Ben McCamish, Arash Termehchy, B. Touri

Citations: 1

What you see is not what you get!: Detecting Simpson's Paradoxes during Data Exploration

Pub Date: 2017-05-14 DOI: 10.1145/3077257.3077266

Yue (Sophie) Guo, Carsten Binnig, Tim Kraska

Abstract: Visual data exploration tools, such as Vizdom or Tableau, significantly simplify data exploration for domain experts and, more importantly, novice users. These tools allow users to discover complex correlations and to test hypotheses and differences between various populations in an entirely visual manner with just a few clicks, unfortunately often ignoring even the most basic statistical rules. For example, there are many statistical pitfalls that a user can "tap" into when exploring data sets. As a result of this experience, we started to build QUDE [1], the first system for Quantifying the Uncertainty in Data Exploration, which is part of Brown's Interactive Data Exploration Stack (called IDES). The goal of QUDE is to automatically warn and, if possible, protect users from common mistakes during the data exploration process. In this paper, we focus on a different type of error, the Simpson's Paradox, which is a special type of error in which a high-level aggregate/visualization leads to the wrong conclusion since a trend reverses when splitting the visualized data set into multiple subgroups (i.e., when executing a drill-down).
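
The reversal that this abstract describes can be illustrated with a small check. The sketch below is not QUDE's detection method, which the abstract does not specify; it is a minimal illustration that assumes each row holds an option, a subgroup, a success count, and a trial count, and that every subgroup contains both options. The function name `is_simpson_reversal` and the counts (the familiar kidney-stone textbook illustration) are chosen here for demonstration only.

```python
def success_rate(rows):
    """Pooled success rate over rows of (option, subgroup, successes, trials)."""
    successes = sum(r[2] for r in rows)
    trials = sum(r[3] for r in rows)
    return successes / trials


def sign(x, y):
    """Return +1 if x > y, -1 if x < y, and 0 if they are equal."""
    return (x > y) - (x < y)


def is_simpson_reversal(rows, a, b):
    """True if the overall comparison of a vs. b is the opposite of the
    comparison inside every subgroup (hypothetical helper, for illustration)."""
    overall = sign(
        success_rate([r for r in rows if r[0] == a]),
        success_rate([r for r in rows if r[0] == b]),
    )
    subgroups = sorted({r[1] for r in rows})
    per_group = [
        sign(
            success_rate([r for r in rows if r[0] == a and r[1] == g]),
            success_rate([r for r in rows if r[0] == b and r[1] == g]),
        )
        for g in subgroups
    ]
    # A reversal needs a strict overall trend that every subgroup contradicts.
    return overall != 0 and all(s == -overall for s in per_group)


# Illustrative counts: option A wins inside each subgroup but loses overall.
ROWS = [
    ("A", "small", 81, 87),
    ("A", "large", 192, 263),
    ("B", "small", 234, 270),
    ("B", "large", 55, 80),
]

print(is_simpson_reversal(ROWS, "A", "B"))  # True
```

Here the aggregate view favors B (289/350 against 273/350), while the drill-down by subgroup favors A in both groups, which is exactly the situation in which a single high-level chart would mislead.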

Citations: 23

An Overreaction to the Broken Machine Learning Abstraction: The ease.ml Vision

Pub Date: 2017-05-14 DOI: 10.1145/3077257.3077265

Ce Zhang, Wentao Wu, Tian Li

Abstract: After hours of teaching astrophysicists TensorFlow and then seeing them, nevertheless, continue to struggle in the most creative way possible, we asked, What is the point of all of these efforts? It was a warm winter afternoon; Zurich was not gloomy at all, while Seattle was sunny as usual, and Beijing's air was crystal clear. One of the authors stormed out of a marathon meeting with biologists, and our journey of overreaction began. We ask, Can we build a system that gets domain experts completely out of the machine learning loop? Can this system have exactly the same interface as linear regression, the bare minimum requirement of a scientist? We started trial and error and discussions with domain experts, all of whom not only have a great sense of humor but also generously offered to be our "guinea pigs." After months of exploration, the architecture of our system, ease.ml, is starting to take shape: it is not as general as TensorFlow but not completely useless; in fact, many applications we are supporting can be built completely with ease.ml, and many others just need some syntactic sugar. During development, we find that building ease.ml in the right way raises a series of technical challenges. In this paper, we describe our ease.ml vision, discuss each of these technical challenges, and map out our research agenda for the months and years to come.

Citations: 8

Assisting Discovery in Public Health

Pub Date: 2017-05-14 DOI: 10.1145/3077257.3077269

Yannis Katsis, N. Koulouris, Y. Papakonstantinou, K. Patrick

Abstract: Several public health (PH) researchers have lately been arguing that big data can play a profound role in scientific discovery. Leveraging the vast amount of population-level data collected by public agencies and other organizations could lead to important discoveries that were not necessarily suspected to be true. However, they also warn about the pitfalls of data-driven discovery: the large amount of data can easily lead to information overload for the researchers. Additionally, data-driven studies that make a lot of tests in the search for important discoveries have the potential to lead to discoveries that seem important but are in fact random. We show that data-driven studies can be effective and yet avoid the potential pitfalls by keeping the researchers in the loop of the discovery process. To this end, we propose PHD, an interactive visual discovery system that allows public health researchers to gain interesting insights from large datasets. PHD generalizes the current workflow of PH researchers by facilitating the major analytics tasks involved in PH discovery, such as calculating important associations based on the standard notions of odds ratios and confidence intervals, controlling for the effect of other variables, and discovering interesting compounding effects. More importantly, however, it leverages user interaction and the semantics of the domain to make sure that this workflow scales to large datasets, while avoiding information overload and random discoveries.
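
The odds ratios and confidence intervals mentioned in this abstract can be made concrete with a short worked example. This is a sketch of the standard textbook calculation (the log-odds interval, often called Woolf's method), not a description of how PHD computes it, which the abstract does not state. The function name `odds_ratio_ci`, the 2x2 cell names `a`, `b`, `c`, `d`, and the counts are assumptions made here for illustration, and all four cells are assumed to be non-zero.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Illustrative counts only: 20 of 100 exposed and 10 of 100 unexposed have the outcome.
or_value, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_value:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# OR is 2.25; the interval runs from roughly 0.99 to 5.09 and so includes 1.
```

With these counts the point estimate looks large, yet the interval still contains 1, so the association would not count as significant at the 5% level. This is the kind of result that repeated testing across many variable pairs can make look important when it is in fact random.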

Citations: 3

SpeakQL: Towards Speech-driven Multi-modal Querying

Pub Date: 2017-05-14 DOI: 10.1145/3077257.3077264

Dharmil Chandarana, Vraj Shah, Arun Kumar, L. Saul

Citations: 9

Flipper: A Systematic Approach to Debugging Training Sets

Pub Date: 2017-05-14 DOI: 10.1145/3077257.3077263

P. Varma, Dan Iter, Christopher De Sa, C. Ré

Abstract: As machine learning methods gain popularity across different fields, acquiring labeled training datasets has become the primary bottleneck in the machine learning pipeline. Recently, generative models have been used to create and label large amounts of training data, albeit noisily. The output of these generative models is then used to train a discriminative model of choice, such as logistic regression or a complex neural network. However, any errors in the generative model can propagate to the subsequent model being trained. Unfortunately, these generative models are not easily interpretable and are therefore difficult for users to debug. To address this, we present our vision for Flipper, a framework that presents users with high-level information about why their training set is inaccurate and informs their decisions as they improve their generative model manually. We present potential tools within the Flipper framework, inspired by observing biomedical experts working with generative models, which allow users to analyze the errors in their training data in a systematic fashion. Finally, we discuss a prototype of Flipper and report results of a user study where users create a training set for a classification task and improve the discriminative model's accuracy by 2.4 points in less than an hour with feedback from Flipper.

Citations: 24