Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.): latest articles

Towards an End-to-End Human-Centric Data Cleaning Framework

Pub Date: 2019-07-05 DOI: 10.1145/3328519.3329133

E. Rezig, M. Ouzzani, A. Elmagarmid, Walid G. Aref, M. Stonebraker

Abstract: Data cleaning refers to the process of detecting and fixing errors in data. Human involvement is instrumental at several stages of this process, such as providing rules or validating computed repairs. There is a plethora of data cleaning algorithms addressing a wide range of data errors (e.g., detecting duplicates, violations of integrity constraints, and missing values). Many of these algorithms involve a human in the loop; however, that involvement is usually coupled to the underlying cleaning algorithm. In a real data cleaning pipeline, several data cleaning operations are performed using different tools. High-level reasoning about these tools, when combined to repair the data, has the potential to unlock useful use cases for involving humans in the cleaning process. Additionally, we believe there is an opportunity to benefit from recent advances in active learning methods to minimize the effort humans have to spend verifying data items produced by tools or humans. There is currently no end-to-end data cleaning framework that systematically involves humans in the cleaning pipeline regardless of the underlying cleaning algorithms. In this paper, we present opportunities that such a framework could offer and highlight key challenges that need to be addressed to realize this vision. We present a design vision and discuss scenarios that motivate the need for this framework to judiciously assist humans in the cleaning process.

Citations: 11

UserDEV: A Mixed-Initiative System for User Group Analytics

Pub Date: 2019-07-05 DOI: 10.1145/3328519.3329128

Behrooz Omidvar-Tehrani, S. Amer-Yahia, Eric Simon, Fabian Colque Zegarra, J. Comba, V. Moreira

Abstract: The increasing availability of user data creates new opportunities in applications ranging from behavioral analytics to recommendations. A common way of analyzing user data is through "user group analytics", whose purpose is to break users down into groups to gain a more focused understanding of their collective behavior. The process consists of group discovery, group exploration, and group visualization. To date, user group analytics is done using separate tools, which makes it fragmented and burdensome for analysts. In this paper, we describe UserDEV, a full-fledged user group analytics pipeline that combines discovery, exploration, and visualization of user groups in a fully connected fashion. UserDEV contributes a star-like architecture as well as a common data exchange model to tighten connections between the analytics components. We provide a realistic use case to show how UserDEV helps analysts perform analytical tasks on user groups. While we report a preliminary user study, we also discuss opportunities for an end-to-end evaluation of a group analytics framework.

Citations: 2

Session details: Human-in-the-loop Learning

Pub Date: 2019-07-05 DOI: 10.1145/3359611

R. Lenz

Citations: 0

Explaining Entity Resolution Predictions: Where are we and What needs to be done?

Pub Date: 2019-07-05 DOI: 10.1145/3328519.3329130

Saravanan Thirumuruganathan, M. Ouzzani, N. Tang

Citations: 16

Session details: Text, Graphs, and Groups

Pub Date: 2019-07-05 DOI: 10.1145/3359612

Carsten Binnig

Citations: 0

Visus: An Interactive System for Automatic Machine Learning Model Building and Curation

Pub Date: 2019-07-05 DOI: 10.1145/3328519.3329134

Aécio Santos, Sonia Castelo, Cristian Felix, Jorge Piazentin Ono, Bowen Yu, S. Hong, Cláudio T. Silva, E. Bertini, J. Freire

Citations: 24

Towards a Unified Representation of Insight in Human-in-the-Loop Analytics: A User Study

Pub Date: 2018-06-10 DOI: 10.1145/3209900.3209912

Eser Kandogan, U. Engelke

Citations: 4

Source Selection Languages: A Usability Evaluation

Pub Date: 2018-06-10 DOI: 10.1145/3209900.3209906

I. Galpin, Edward Abel, N. Paton

Abstract: When looking to obtain insights from data, and given numerous possible data sources, there are certain quality criteria that data retrieved from the selected sources should exhibit so as to be most fit for purpose. An effective source selection algorithm can only provide good results in practice if the requirements of the user have been suitably captured; an important consideration, therefore, is how users can effectively express their requirements. In this paper, we carry out an experiment to compare user performance in two different languages for expressing user requirements in terms of data quality characteristics: pairwise comparison of criteria values and single-objective constrained optimization. We employ crowdsourcing to evaluate, for a set of tasks, users' ability to choose effective formulations in each language. The results of this initial study show that users were able to determine more effective formulations for the tasks using pairwise comparisons. Furthermore, it was found that users tend to express a preference for one language over the other, although it was not necessarily the language that they performed best in.

Citations: 5

Beaver

Pub Date: 2018-06-10 DOI: 10.1145/3209900.3209902

Zhongjun (Mark) Jin, Christopher Baik, Michael J. Cafarella, H. Jagadish

Abstract: Schema mapping is used to transform data to a desired schema from data sources with different schemas. Manually writing complete schema mapping specifications requires a deep understanding of the source and target schemas, which can be burdensome for the user. Programming By Example (PBE) schema mapping methods allow the user to describe the schema mapping using data records. However, real data records are still harder to specify than other useful insights the user might have about the desired schema mapping. In this project, we develop a new schema mapping technique, Beaver, that enables an interaction model giving the user more flexibility in describing the desired schema mapping. The end user is not limited to providing exact and complete target schema data examples but may also provide incomplete or ambiguous examples. Moreover, the user can provide other types of descriptions about the target schema, such as data type or value range. We design an explore-and-verify search-based algorithm to efficiently discover all satisfying schema mapping specifications. We implemented a prototype of our schema mapping technique and experimentally evaluated the efficiency of the system in handling traditional PBE schema mapping test cases, as well as our newly proposed declarative schema mapping test cases. The experimental results show that the declarative queries, which we believe are easier for non-expert users to enter, often cost around zero to five seconds more than the traditional PBE queries. This suggests that we retain a system efficiency comparable to traditional PBE schema mapping systems.

Citations: 20

ViDeTTe Interactive Notebooks

Pub Date: 2018-06-10 DOI: 10.1145/3209900.3209907

Konstantinos Zarifis, Y. Papakonstantinou

Abstract: Interactive notebooks allow the use of popular languages, such as Python, for composing data analytics projects. The interface they provide enables data scientists to import data, analyze it, and compose the results into easily readable, report-like web pages that can contain re-runnable code, visualizations, and a textual description of the entire process, all in one place. Scientists can then share such pages with other users in order to present their findings, collaborate, and further explore the underlying data. However, as we show in this work, interactive notebooks lack interactivity for the reader of the resulting notebook. Users can rerun or extend the code included in a notebook but cannot directly interact with the generated visualizations in order to trigger additional computation and further explore the underlying data. This means that only code-literate readers can further interact with and extend such notebooks, while the rest can only passively read the provided report. This stands in stark contrast to OLAP data cube interfaces, which utilize user interaction to trigger additional data exploration capabilities. Adding OLAP-like reactive functionality to notebooks further increases the required technical expertise, as event-driven logic has to be added by the data analyst. To address these issues, we propose ViDeTTe, an engine that enhances notebooks with capabilities that benefit both data scientists and non-technical notebook readers. ViDeTTe uses a declarative language that simplifies data retrieval and data visualization for analysts. The generated visualizations are capable of collecting the reader's input and reacting to it. As the user interacts with the visualizations, ViDeTTe identifies subsequent parts of the notebook that depend on the user's input, causes reevaluation of the affected computations, and propagates changes to the visualization units. By doing this, ViDeTTe offers enhanced data exploration capabilities to readers without requiring any coding skills, while at the same time lowering the technical expertise needed for the development of reactive notebooks.

Citations: 2