{"title":"ProvDB: Lifecycle Management of Collaborative Analysis Workflows","authors":"Hui Miao, Amit Chavan, A. Deshpande","doi":"10.1145/3077257.3077267","DOIUrl":"https://doi.org/10.1145/3077257.3077267","url":null,"abstract":"As data-driven methods are becoming pervasive in a wide variety of disciplines, there is an urgent need to develop scalable and sustainable tools to simplify the process of data science, to make it easier for the users to keep track of the analyses being performed and datasets being generated, and to enable the users to understand and analyze the workflows. In this paper, we describe our vision of a unified provenance and metadata management system to support lifecycle management of complex collaborative data science workflows. We argue that the information about the analysis processes and data artifacts can, and should be, captured in a semi-passive manner; and we show that querying and analyzing this information can not only simplify bookkeeping and debugging tasks but also enable a rich new set of capabilities like identifying flaws in the data science process itself. It can also significantly reduce the user time spent in fixing post-deployment problems through automated analysis and monitoring. We have implemented a prototype system, PROVDB, on top of git and Neo4j, and we describe its key features and capabilities.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73863980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PALM: Machine Learning Explanations For Iterative Debugging","authors":"S. Krishnan, Eugene Wu","doi":"10.1145/3077257.3077271","DOIUrl":"https://doi.org/10.1145/3077257.3077271","url":null,"abstract":"When a Deep Neural Network makes a misprediction, it can be challenging for a developer to understand why. While there are many models for interpretability in terms of predictive features, it may be more natural to isolate a small set of training examples that have the greatest influence on the prediction. However, it is often the case that every training example contributes to a prediction in some way but with varying degrees of responsibility. We present Partition Aware Local Model (PALM), which is a tool that learns and summarizes this responsibility structure to aide machine learning debugging. PALM approximates a complex model (e.g., a deep neural network) using a two-part surrogate model: a meta-model that partitions the training data, and a set of sub-models that approximate the patterns within each partition. These sub-models can be arbitrarily complex to capture intricate local patterns. However, the meta-model is constrained to be a decision tree. This way the user can examine the structure of the meta-model, determine whether the rules match intuition, and link problematic test examples to responsible training data efficiently. Queries to PALM are nearly 30x faster than nearest neighbor queries for identifying relevant data, which is a key property for interactive applications.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85479335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Precision Interfaces","authors":"Haoci Zhang, Thibault Sellam, Eugene Wu","doi":"10.1145/3077257.3077261","DOIUrl":"https://doi.org/10.1145/3077257.3077261","url":null,"abstract":"Building interactive tools to support data analysis is hard because it is not always clear what to build and how to build it. To address this problem, we present Precision Interfaces, a semi-automatic system to generate task-specific data analytics interfaces. Precision Interface can turn a log of executed programs into an interface, by identifying micro-variations between the programs and mapping them to interface components. This paper focuses on SQL query logs, but we can generalize the approach to other languages. Our system operates in two steps: it first build an interaction graph, which describes how the queries can be transformed into each other. Then, it finds a set of UI components that covers a maximal number of transformations. To restrict the domain of changes to be detected, our system uses a domain-specific language, PILang. We give a full description of Precision Interface's components, showcase an early prototype on real program logs and discuss future research opportunities.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"123 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72612724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexandr A Kalinin, Selvam Palanimalai, Ivo D Dinov
{"title":"SOCRAT Platform Design: A Web Architecture for Interactive Visual Analytics Applications.","authors":"Alexandr A Kalinin, Selvam Palanimalai, Ivo D Dinov","doi":"10.1145/3077257.3077262","DOIUrl":"https://doi.org/10.1145/3077257.3077262","url":null,"abstract":"<p><p>The modern web is a successful platform for large scale interactive web applications, including visualizations. However, there are no established design principles for building complex visual analytics (VA) web applications that could efficiently integrate visualizations with data management, computational transformation, hypothesis testing, and knowledge discovery. This imposes a time-consuming design and development process on many researchers and developers. To address these challenges, we consider the design requirements for the development of a module-based VA system architecture, adopting existing practices of large scale web application development. We present the preliminary design and implementation of an open-source platform for Statistics Online Computational Resource Analytical Toolbox (SOCRAT). This platform defines: (1) a specification for an architecture for building VA applications with multi-level modularity, and (2) methods for optimizing module interaction, re-usage, and extension. To demonstrate how this platform can be used to integrate a number of data management, interactive visualization, and analysis tools, we implement an example application for simple VA tasks including raw data input and representation, interactive visualization and analysis.</p>","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"2017 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3077257.3077262","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35987683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics","authors":"","doi":"10.1145/3077257","DOIUrl":"https://doi.org/10.1145/3077257","url":null,"abstract":"","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87035863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the Workshop on Human-In-the-Loop Data Analytics","authors":"Carsten Binnig, A. Fekete, Arnab Nandi","doi":"10.1145/3209900","DOIUrl":"https://doi.org/10.1145/3209900","url":null,"abstract":"We are delighted to present the papers from the first HILDA Workshop on Human-in-the-Loop Data Analytics, which took place on 26 June, 2016 co-located with the ACM SIGMOD conference in San Francisco, California, USA. \u0000 \u0000An often overlooked component when designing a data management system is the set of users interacting with it { the human-in-the-loop. A major bottleneck in data analytics today is to efficiently leverage the human capabilities to formulate questions and understand answers of data analytics systems. In the database community the first vision papers towards that direction discussed topics such as interactive data exploration or querying data management systems with more intuitive user interfaces. Recent technology trends (such as touch screens, motion detection, and voice recognition) are widening the possibilities for users to interact with data, and data-driven industries are shifting to personalized processing to better target their services to the users' needs. \u0000 \u0000We created HILDA to broaden the scope of how data analytics can be done with the awareness of people. A major goal of HILDA was to bring together researchers and practitioners from the database community with various other communities such as the HCI community or the Information Visualization community. We sought papers that propose innovations to improve the way people can work with data management systems, and also work that studies how humans work with existing systems. We explicitly asked for submissions that present initial ideas and visions, just as much as reports on early results, or reflections on completed projects.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91121649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the Workshop on Human-In-the-Loop Data Analytics","authors":"Carsten Binnig, A. Fekete, Arnab Nandi","doi":"10.1145/2939502","DOIUrl":"https://doi.org/10.1145/2939502","url":null,"abstract":"","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87896049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}