Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.): Latest Publications

Towards an End-to-End Human-Centric Data Cleaning Framework
E. Rezig, M. Ouzzani, A. Elmagarmid, Walid G. Aref, M. Stonebraker
{"title":"Towards an End-to-End Human-Centric Data Cleaning Framework","authors":"E. Rezig, M. Ouzzani, A. Elmagarmid, Walid G. Aref, M. Stonebraker","doi":"10.1145/3328519.3329133","DOIUrl":"https://doi.org/10.1145/3328519.3329133","url":null,"abstract":"Data Cleaning refers to the process of detecting and fixing errors in the data. Human involvement is instrumental at several stages of this process such as providing rules or validating computed repairs. There is a plethora of data cleaning algorithms addressing a wide range of data errors (e.g., detecting duplicates, violations of integrity constraints, and missing values). Many of these algorithms involve a human in the loop, however, this latter is usually coupled to the underlying cleaning algorithms. In a real data cleaning pipeline, several data cleaning operations are performed using different tools. A high-level reasoning on these tools, when combined to repair the data, has the potential to unlock useful use cases to involve humans in the cleaning process. Additionally, we believe there is an opportunity to benefit from recent advances in active learning methods to minimize the effort humans have to spend to verify data items produced by tools or humans. There is currently no end-to-end data cleaning framework that systematically involves humans in the cleaning pipeline regardless of the underlying cleaning algorithms. In this paper, we present opportunities that this framework could offer, and highlight key challenges that need to be addressed to realize this vision. We present a design vision and discuss scenarios that motivate the need for this framework to judiciously assist humans in the cleaning process.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75913661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
UserDEV: A Mixed-Initiative System for User Group Analytics
Behrooz Omidvar-Tehrani, S. Amer-Yahia, Eric Simon, Fabian Colque Zegarra, J. Comba, V. Moreira
{"title":"UserDEV: A Mixed-Initiative System for User Group Analytics","authors":"Behrooz Omidvar-Tehrani, S. Amer-Yahia, Eric Simon, Fabian Colque Zegarra, J. Comba, V. Moreira","doi":"10.1145/3328519.3329128","DOIUrl":"https://doi.org/10.1145/3328519.3329128","url":null,"abstract":"The increasing availability of user data constitutes new opportunities in various applications ranging from behavioral analytics to recommendations. A common way of analyzing user data is through \"user group analytics\" whose purpose is to break down users into groups to gain a more focused understanding of their collective behavior. The process consists of group discovery, group exploration, and group visualization. To date, user group analytics is done using separate tools which makes it fragmented and burdensome for analysts. In this paper, we describe UserDEV, a full-fledged user group analytics pipeline which combines discovery, exploration, and visualization of user groups, in a fully-connected fashion. UserDEV contributes a star-like architecture as well as a common data exchange model to tighten connections between the analytics components. We provide a realistic use case to show how UserDEV helps analysts perform analytical tasks on user groups. While we report a preliminary user study, we also discuss opportunities for an end-to-end evaluation of a group analytics framework.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"2018 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78245209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Session details: Human-in-the-loop Learning
R. Lenz
{"title":"Session details: Human-in-the-loop Learning","authors":"R. Lenz","doi":"10.1145/3359611","DOIUrl":"https://doi.org/10.1145/3359611","url":null,"abstract":"","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"51 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80249523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Explaining Entity Resolution Predictions: Where are we and What needs to be done?
Saravanan Thirumuruganathan, M. Ouzzani, N. Tang
{"title":"Explaining Entity Resolution Predictions: Where are we and What needs to be done?","authors":"Saravanan Thirumuruganathan, M. Ouzzani, N. Tang","doi":"10.1145/3328519.3329130","DOIUrl":"https://doi.org/10.1145/3328519.3329130","url":null,"abstract":"Entity resolution (ER) seeks to identify the set of tuples in a dataset that refer to the same real-world entity. It is one of the fundamental and well studied problems in data integration with applications in diverse domains such as banking, insurance, e-commerce, and so on. Machine Learning and Deep Learning based methods provide the state-of-the-art results. For practitioners, it is often challenging to understand why the classifier made a particular prediction. While there has been extensive work in the ML community on explaining classifier predictions, we found that a direct application of those techniques is not appropriate for ER. There is a huge gap between the needs of lay ER practitioners and the explanation community. In this paper, we provide a comprehensive taxonomy of these challenges, discuss research opportunities and propose preliminary solutions.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84872984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Session details: Text, Graphs, and Groups
Carsten Binnig
{"title":"Session details: Text, Graphs, and Groups","authors":"Carsten Binnig","doi":"10.1145/3359612","DOIUrl":"https://doi.org/10.1145/3359612","url":null,"abstract":"","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"146 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77655045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Visus: An Interactive System for Automatic Machine Learning Model Building and Curation
Aécio Santos, Sonia Castelo, Cristian Felix, Jorge Piazentin Ono, Bowen Yu, S. Hong, Cláudio T. Silva, E. Bertini, J. Freire
{"title":"Visus: An Interactive System for Automatic Machine Learning Model Building and Curation","authors":"Aécio Santos, Sonia Castelo, Cristian Felix, Jorge Piazentin Ono, Bowen Yu, S. Hong, Cláudio T. Silva, E. Bertini, J. Freire","doi":"10.1145/3328519.3329134","DOIUrl":"https://doi.org/10.1145/3328519.3329134","url":null,"abstract":"While the demand for machine learning (ML) applications is booming, there is a scarcity of data scientists capable of building such models. Automatic machine learning (AutoML) approaches have been proposed that help with this problem by synthesizing end-to-end ML data processing pipelines. However, these follow a best-effort approach and a user in the loop is necessary to curate and refine the derived pipelines. Since domain experts often have little or no expertise in machine learning, easy-to-use interactive interfaces that guide them throughout the model building process are necessary. In this paper, we present Visus, a system designed to support the model building process and curation of ML data processing pipelines generated by AutoML systems. We describe the framework used to ground our design choices and a usage scenario enabled by Visus. Finally, we discuss the feedback received in user testing sessions with domain experts.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86989002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
Towards a Unified Representation of Insight in Human-in-the-Loop Analytics: A User Study
Eser Kandogan, U. Engelke
{"title":"Towards a Unified Representation of Insight in Human-in-the-Loop Analytics: A User Study","authors":"Eser Kandogan, U. Engelke","doi":"10.1145/3209900.3209912","DOIUrl":"https://doi.org/10.1145/3209900.3209912","url":null,"abstract":"Understanding what insights people draw from data visualizations is critical for human-in-the-loop analytics systems to facilitate mixed-initiative analysis. In this paper we present results from a large user study on insights extracted from commonly used charts. We report several patterns of insights we observed and analyze their semantic structure to identify key considerations towards a unified formal representation of insight, human or computer generated. We also present a model of insight generation process, where humans and computers work cooperatively, building on each other's knowledge, where a common representation acts as the currency of interaction. While not going as far as proposing a formalism, we point to a few potential directions for representing insight. We believe our findings could also inform the design of novel human-in-the-loop analytics systems.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"64 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85025503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Source Selection Languages: A Usability Evaluation
I. Galpin, Edward Abel, N. Paton
{"title":"Source Selection Languages: A Usability Evaluation","authors":"I. Galpin, Edward Abel, N. Paton","doi":"10.1145/3209900.3209906","DOIUrl":"https://doi.org/10.1145/3209900.3209906","url":null,"abstract":"When looking to obtain insights from data, and given numerous possible data sources, there are certain quality criteria that retrieved data from selected sources should exhibit so as to be most fit-for-purpose. An effective source selection algorithm can only provide good results in practice if the requirements of the user have been suitably captured, and therefore, an important consideration is how users can effectively express their requirements. In this paper, we carry out an experiment to compare user performance in two different languages for expressing user requirements in terms of data quality characteristics, pairwise comparison of criteria values, and single objective constrained optimization. We employ crowdsourcing to evaluate, for a set of tasks, user ability to choose effective formulations in each language. The results of this initial study show that users were able to determine more effective formulations for the tasks using pairwise comparisons. Furthermore, it was found that users tend to express a preference for one language over the other, although it was not necessarily the language that they performed best in.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86744481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Beaver
Zhongjun (Mark) Jin, Christopher Baik, Michael J. Cafarella, H. Jagadish
{"title":"Beaver","authors":"Zhongjun (Mark) Jin, Christopher Baik, Michael J. Cafarella, H. Jagadish","doi":"10.1145/3209900.3209902","DOIUrl":"https://doi.org/10.1145/3209900.3209902","url":null,"abstract":"Schema mapping is used to transform data to a desired schema from data sources with different schemas. Manually writing complete schema mapping specifications requires a deep understanding of the source and target schemas, which can be burdensome for the user. Programming By Example (PBE) schema mapping methods allow the user to describe the schema mapping using data records. However, real data records are still harder to specify compared to other useful insights about the desired schema mapping the user might have. In this project, we develop a new schema mapping technique, Beaver, that enables an interaction model that gives the user more flexibility in describing the desired schema mapping. The end user is not limited to providing exact and complete target schema data examples but may also provide incomplete or ambiguous examples. Moreover, the user can provide other types of descriptions, like data type or value range, about the target schema. We design an explore-and-verify search-based algorithm to efficiently discover all satisfying schema mapping specifications. We implemented a prototype of our schema mapping technique and experimentally evaluated the efficiency of the system in handling traditional PBE schema mapping test cases, as well as our newly-proposed declarative schema mapping test cases. The experiment results show that the declarative queries, which we believe are easier for non-expert users to input, often cost around zero to five seconds more than the traditional PBE queries. This suggests we retain a system efficiency comparable to traditional PBE schema mapping systems.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73841786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
ViDeTTe Interactive Notebooks
Konstantinos Zarifis, Y. Papakonstantinou
{"title":"ViDeTTe Interactive Notebooks","authors":"Konstantinos Zarifis, Y. Papakonstantinou","doi":"10.1145/3209900.3209907","DOIUrl":"https://doi.org/10.1145/3209900.3209907","url":null,"abstract":"Interactive notebooks allow the use of popular languages, such as Python, for composing data analytics projects. The interface they provide enables data scientists to import data, analyze them and compose the results into easily readable report-like web pages that can contain re-runnable code, visualizations and textual description of the entire process, all in one place. Scientists can then share such pages with other users in order to present their findings, collaborate and further explore the underlying data. However, as we show in this work, interactive notebooks lack in interactivity for the reader of the resulting notebook. Users can rerun or extend the code included in a notebook but cannot directly interact with the generated visualizations in order to trigger additional computation and further explore the underlying data. This means that only code-literate readers can further interact with and extend such notebooks, while the rest can only passively read the provided report. This comes in stark contrast to OLAP data cube interfaces, which utilize user interaction to trigger additional data exploratory capabilities. Adding OLAP-like reactive functionality in notebooks further increases the required technical expertise as event-driven logic has to be added by the data analyst. To address these issues, we propose ViDeTTe, an engine that enhances notebooks with capabilities that benefit both data scientists and non-technical notebook readers. ViDeTTe uses a declarative language that simplifies data retrieval and data visualization for analysts. The generated visualizations are capable of collecting the reader's input and reacting to it. As the user interacts with the visualizations, ViDeTTe identifies subsequent parts of the notebook that depend on the user's input, causes reevaluation of the affected computations and propagates changes to the visualization units. By doing this, ViDeTTe offers enhanced data exploratory capabilities to readers, without requiring any coding skills, while at the same time lowering the technical expertise needed for the development of reactive notebooks.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90547862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2