Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.): latest articles

Introducing quest: a query-driven framework to explain classification models on tabular data
Nadja Geisler, Carsten Binnig
Abstract: Machine learning models are everywhere now, but only a few of them are transparent in how they work. To remedy this, local explanations aim to show users how and why learned models produce a certain output for a given input (data sample). However, most existing approaches are oriented around image or text data and thus cannot leverage the structure and properties of tabular data. Therefore, we present Quest, a new framework for generating explanations that are a better fit for tabular data. The main idea is to create explanations in the form of relational predicates (called queries hereafter) that approximate the behavior of a classifier around the given sample. In an initial evaluation, we show anecdotally how Quest can be used on a tabular data set compared to existing approaches that can be applied to tabular data.
DOI: https://doi.org/10.1145/3546930.3547497
Published: 2022-06-12
Citations: 0
Another way to implement complex computations: functional-style SQL UDF
C. Duta
Abstract: Whenever data-intensive computation gets so complex that it requires the use of iteration or recursion, SQL developers turn towards recursive common table expressions (CTEs). We present the results of a user study that shows how developers struggle with the unusual fixpoint semantics and awkward monolithic syntactic structure of CTEs. The study suggests that recursive user-defined functions (UDFs), written in a style much like regular functional programs, are less prone to errors, significantly more readable, and can be authored more quickly. Since such recursive UDFs can be automatically compiled into efficiently executable CTEs, we put functional-style UDFs forward as another promising pillar to express complex computation close to the data.
DOI: https://doi.org/10.1145/3546930.3547508
Published: 2022-06-12
Citations: 1
Exploratory training: when trainers learn
Omeed Habibelahian, R. Shrestha, Arash Termehchy, Paolo Papotti
Abstract: Data systems often present examples and solicit labels from users to learn a target concept in supervised to semi-supervised learning. This selection of examples could even be done in an active fashion, i.e., active learning. Current systems assume that users always provide correct labels, with potentially a fixed and small chance of mistake. In several settings, users may have to explore and learn about the underlying data to label examples correctly, particularly for complex target concepts and models. For example, to provide accurate labels for a model that detects noisy or abnormal values, users might need to investigate the underlying data to understand typical and clean values in the data. As users gradually learn about the target concept and data, they may revise their labeling strategies. Due to the significance and non-stationarity of errors in this setting, current systems may use incorrect labels and learn inaccurate models from the users. We report preliminary results for a user study over real-world datasets on modeling human learning while training the system and lay out the next steps in this investigation.
DOI: https://doi.org/10.1145/3546930.3547500
Published: 2022-06-12
Citations: 0
Context sight: model understanding and debugging via interpretable context
Jun Yuan, E. Bertini
Abstract: Model interpretation is increasingly important for successful model development and deployment. In recent years, many explanation methods have been introduced to help humans understand how a machine learning model makes a decision on a specific instance. Recent studies show that contextualizing an individual model decision within a set of relevant examples can improve model understanding. However, there is a lack of systematic study on what factors are considered when generating and using context examples to explain model predictions, and how context examples help with model understanding and debugging in practice. In this work, we first identify a taxonomy of context generation and summarization through a literature review. We then present Context Sight, a visual analytics system that integrates customized context generation and multiple-level context summarization to assist context exploration and interpretation. We evaluate the usefulness of the system through a detailed use case. This work is an initial step toward systematic research on how contextualization can help data scientists and practitioners understand and diagnose model behaviors, based on which we will gain a better understanding of the usage of context.
DOI: https://doi.org/10.1145/3546930.3547502
Published: 2022-06-12
Citations: 1
Learning to Validate the Predictions of Black Box Machine Learning Models on Unseen Data
S. Redyuk, Sebastian Schelter, Tammo Rukat, V. Markl, F. Biessmann
Abstract: When end users apply a machine learning (ML) model on new unlabeled data, it is difficult for them to decide whether they can trust its predictions. Errors or shifts in the target data can lead to hard-to-detect drops in the predictive quality of the model. We therefore propose an approach to assist non-ML experts working with pretrained ML models. Our approach estimates the change in prediction performance of a model on unseen target data. It does not require explicit distributional assumptions on the dataset shift between the training and target data. Instead, a domain expert can declaratively specify typical cases of dataset shift that she expects to observe in real-world data. Based on this information, we learn a performance predictor for pretrained black box models, which can be combined with the model and automatically warns end users in case of unexpected performance drops. We demonstrate the effectiveness of our approach on two models, logistic regression and a neural network, applied to several real-world datasets.
DOI: https://doi.org/10.1145/3328519.3329126
Published: 2019-07-05
Citations: 12
A Collaborative Framework for Structure Identification over Print Documents
Maeda F. Hanafi, M. Mannino, A. Abouzeid
Abstract: We describe Texture, a framework for data extraction over print documents that allows end-users to construct data extraction rules over an inferred document structure. To effectively infer this structure, we enable developers to contribute multiple heuristics that identify different structures in English print documents, crowd-workers and annotators to manually label these structures, and end-users to search and decide which heuristics to apply and how to boost their performance with the help of ground-truth data collected from crowd-workers and annotators. Texture's design supports each of these different user groups through a suite of tools. We demonstrate that even with a handful of student-developed heuristics, we can achieve reasonable precision and recall when identifying structures across different document collections.
DOI: https://doi.org/10.1145/3328519.3329131
Published: 2019-07-05
Citations: 0
Knowledge Graph Programming with a Human-in-the-Loop: Preliminary Results
Yuze Lou, Mahfus Uddin, Noam Brown, Michael J. Cafarella
Abstract: In this paper we introduce knowledge graph programming, a new method for writing extremely succinct programs. This method allows programmers to save work by writing programs that are brief but also underspecified and underconstrained; a human-in-the-loop "data compiler" then automatically fills in missing values without the programmer's explicit help. It uses modern data quality mechanisms such as information extraction, data integration, and crowdsourcing. The language encourages users to mention knowledge graph entities in their programs, thus enabling the data compiler to exploit the extensive factual and type structure present in modern KGs. We describe the knowledge graph programming user experience, explain its conceptual steps and data model, describe our prototype KGP system, and present some preliminary experimental results.
DOI: https://doi.org/10.1145/3328519.3329132
Published: 2019-07-05
Citations: 2
Effective and Efficient Data Cleaning for Entity Matching
J. Ao, Rada Y. Chirkova
Abstract: As a key data-integration step, entity matching (EM) identifies tuples referring to the same real-world entities in disparate data sources. In many cases, the EM quality can be improved by repairing incorrect values in the data; at the same time, it is well known that the time costs of data cleaning by human experts could be prohibitive. In this paper, we focus on the time-consuming human-in-the-loop data-cleaning problem for relational EM, by recommending to human experts a time-efficient order in which values of attributes could be cleaned in the given data. Our proposed domain-independent cleaning framework aims to save human users' time, by guiding them in cleaning the EM inputs in an attribute order that is as conducive to maximizing EM accuracy as possible within a given constraint on the time they spend on cleaning. In guiding the cleaning process, our attribute-recommendation methods discover and take advantage of information provided by the data, and also use feedback from the EM engine. Our preliminary experimental results suggest that the proposed approach leads to measurable speedup, for a variety of time constraints, in the improvement of EM accuracy over the baseline approach, in which domain experts choose the sequence in which to clean the attributes of the inputs.
DOI: https://doi.org/10.1145/3328519.3329127
Published: 2019-07-05
Citations: 0
Session details: Data Cleaning and Entity Resolution
Thibault Sellam
DOI: https://doi.org/10.1145/3359610
Published: 2019-07-05
Citations: 0
Interactive Summarization of Large Document Collections
Benjamin Hättasch, Christian M. Meyer, Carsten Binnig
Abstract: We present a new system for custom summarizations of large text corpora at interactive speed. The task of producing textual summaries is an important step to understand large collections of topic-related documents and has many real-world applications in journalism, medicine, and many other fields. Key to our system is that the summarization model is refined by user feedback and called multiple times to improve the quality of the summaries iteratively. To that end, the human is brought into the loop to gather feedback in every iteration about which aspects of the intermediate summaries satisfy their individual information needs. Our system consists of a sampling component and a learned model to produce a textual summary. As we show in our evaluation, our system can provide a similar quality level as existing summarization models that are working on the full corpus and hence cannot provide interactive speeds.
DOI: https://doi.org/10.1145/3328519.3329129
Published: 2019-07-05
Citations: 1