SURF: improving classifiers in production by learning from busy and noisy end users

Proceedings of the First ACM International Conference on AI in Finance
Pub Date: 2020-10-12
DOI: 10.1145/3383455.3422547

J. Lockhart, Samuel A. Assefa, Ayham Alajdad, Andrew Alexander, T. Balch, M. Veloso

Citations: 563

Abstract

Supervised learning classifiers inevitably make mistakes in production, perhaps mis-labeling an email, or flagging an otherwise routine transaction as fraudulent. It is vital that the end users of such a system are provided with a means of relabeling data points that they deem to have been mislabeled. The classifier can then be retrained on the relabeled data points in the hope of performance improvement. To reduce noise in this feedback data, well known algorithms from the crowdsourcing literature can be employed. However, the feedback setting provides a new challenge: how do we know what to do in the case of user non-response? If a user provides us with no feedback on a label then it can be dangerous to assume they implicitly agree: a user can be busy, lazy, or no longer a user of the system! We show that conventional crowdsourcing algorithms struggle in this user feedback setting, and present a new algorithm, SURF, that can cope with this non-response ambiguity.
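
The non-response ambiguity described above can be illustrated with a small sketch. This is not the SURF algorithm, which the abstract does not specify; it uses a plain majority vote as a stand-in for a conventional crowdsourcing aggregator. The function name `aggregate`, the flag `silence_is_agreement`, and the example labels are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def aggregate(
    classifier_label: str,
    feedback: Sequence[Optional[str]],
    silence_is_agreement: bool,
) -> str:
    """Majority vote over user feedback for a single data point.

    `feedback` holds one entry per user: a label string if the user
    responded, or None if the user gave no response.
    This is an illustrative baseline, not the SURF algorithm.
    """
    votes: Counter = Counter()
    for response in feedback:
        if response is not None:
            # An explicit response always counts as one vote.
            votes[response] += 1
        elif silence_is_agreement:
            # Assumption under test: a silent user agrees with the classifier.
            votes[classifier_label] += 1

    if not votes:
        # Nobody voted, so there is no evidence against the classifier.
        return classifier_label

    top_count = max(votes.values())
    winners = [label for label, count in votes.items() if count == top_count]
    # On a tie, keep the classifier's label if it is among the winners.
    if classifier_label in winners:
        return classifier_label
    return sorted(winners)[0]


# Hypothetical example: the classifier flagged a transaction as "fraud";
# two users relabeled it "routine" and three users never responded.
feedback = ["routine", "routine", None, None, None]
with_silence = aggregate("fraud", feedback, silence_is_agreement=True)
without_silence = aggregate("fraud", feedback, silence_is_agreement=False)
```

In this toy case the two treatments of silence yield different aggregated labels from identical feedback: counting the three silent users as agreement outvotes the two explicit relabels. Those silent users may simply have been busy or inactive, which is the ambiguity the abstract highlights.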
