Flipper: A Systematic Approach to Debugging Training Sets

Paroma Varma, Dan Iter, Christopher De Sa, Christopher Ré
{"title":"Flipper: A Systematic Approach to Debugging Training Sets","authors":"P. Varma, Dan Iter, Christopher De Sa, C. Ré","doi":"10.1145/3077257.3077263","DOIUrl":null,"url":null,"abstract":"As machine learning methods gain popularity across different fields, acquiring labeled training datasets has become the primary bottleneck in the machine learning pipeline. Recently generative models have been used to create and label large amounts of training data, albeit noisily. The output of these generative models is then used to train a discriminative model of choice, such as logistic regression or a complex neural network. However, any errors in the generative model can propagate to the subsequent model being trained. Unfortunately, these generative models are not easily interpretable and are therefore difficult to debug for users. To address this, we present our vision for Flipper, a framework that presents users with high-level information about why their training set is inaccurate and informs their decisions as they improve their generative model manually. We present potential tools within the Flipper framework, inspired by observing biomedical experts working with generative models, which allow users to analyze the errors in their training data in a systematic fashion. Finally, we discuss a prototype of Flipper and report results of a user study where users create a training set for a classification task and improve the discriminative model's accuracy by 2.4 points in less than an hour with feedback from Flipper.","PeriodicalId":92279,"journal":{"name":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","volume":"29 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. Workshop on Human-In-the-Loop Data Analytics (2nd : 2017 : Chicago, Ill.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3077257.3077263","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24

Abstract

As machine learning methods gain popularity across different fields, acquiring labeled training datasets has become the primary bottleneck in the machine learning pipeline. Recently, generative models have been used to create and label large amounts of training data, albeit noisily. The output of these generative models is then used to train a discriminative model of choice, such as logistic regression or a complex neural network. However, any errors in the generative model can propagate to the subsequent model being trained. Unfortunately, these generative models are not easily interpretable and are therefore difficult for users to debug. To address this, we present our vision for Flipper, a framework that presents users with high-level information about why their training set is inaccurate and informs their decisions as they improve their generative model manually. We present potential tools within the Flipper framework, inspired by observing biomedical experts working with generative models, which allow users to analyze the errors in their training data in a systematic fashion. Finally, we discuss a prototype of Flipper and report results of a user study in which users create a training set for a classification task and improve the discriminative model's accuracy by 2.4 points in less than an hour with feedback from Flipper.
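The abstract describes a pipeline in which noisy, programmatically generated labels are used to train a downstream discriminative model. The sketch below is a minimal illustration of that setup, not the authors' implementation: the toy documents and keyword labeling functions are hypothetical, a simple majority vote stands in for the paper's learned generative label model, and scikit-learn's LogisticRegression plays the role of the discriminative model trained on the resulting noisy labels.

```python
# Minimal sketch (illustrative assumptions, not the Flipper implementation):
# heuristic labeling functions produce noisy votes, a majority vote stands in
# for the generative label model, and a discriminative model is trained on
# the resulting noisy labels.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy unlabeled corpus for a binary "biomedical vs. other" task (hypothetical data).
docs = [
    "the drug reduced tumor growth in mice",
    "stock prices rose sharply after the earnings call",
    "patients showed improved survival after treatment",
    "the committee will meet next quarter to review budgets",
]

# Labeling functions: cheap heuristics that vote +1, -1, or 0 (abstain).
def lf_mentions_treatment(d):
    return 1 if any(w in d for w in ("drug", "treatment", "patients")) else 0

def lf_mentions_finance(d):
    return -1 if any(w in d for w in ("stock", "earnings", "budgets")) else 0

lfs = [lf_mentions_treatment, lf_mentions_finance]
votes = np.array([[lf(d) for lf in lfs] for d in docs])

# Aggregate votes into noisy training labels; here a majority vote replaces
# the generative model, with ties defaulting to the negative class.
noisy_labels = np.where(votes.sum(axis=1) > 0, 1, 0)

# Train the discriminative model on the noisy labels.
X = CountVectorizer().fit_transform(docs)
clf = LogisticRegression().fit(X, noisy_labels)
print(clf.predict(X))
```

Errors introduced by the labeling heuristics (or, in the paper, by the generative model) flow directly into `noisy_labels`, which is why debugging that intermediate training set is the focus of Flipper.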