机器学习中的Oracle问题和在哪里找到它们

Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops Pub Date : 2020-06-27 DOI:10.1145/3387940.3391490

Cynthia C. S. Liem, Annibale Panichella

{"title":"机器学习中的Oracle问题和在哪里找到它们","authors":"Cynthia C. S. Liem, Annibale Panichella","doi":"10.1145/3387940.3391490","DOIUrl":null,"url":null,"abstract":"The rise in popularity of machine learning (ML), and deep learning in particular, has both led to optimism about achievements of artificial intelligence, as well as concerns about possible weaknesses and vulnerabilities of ML pipelines. Within the software engineering community, this has led to a considerable body of work on ML testing techniques, including white- and black-box testing for ML models. This means the oracle problem needs to be addressed. For supervised ML applications, oracle information is indeed available in the form of dataset 'ground truth', that encodes input data with corresponding desired output labels. However, while ground truth forms a gold standard, there still is no guarantee it is truly correct. Indeed, syntactic, semantic, and conceptual framing issues in the oracle may negatively affect the ML system's integrity. While syntactic issues may automatically be verified and corrected, the higher-level issues traditionally need human judgment and manual analysis. In this paper, we employ two heuristics based on information entropy and semantic analysis on well-known computer vision models and benchmark data from ImageNet. The heuristics are used to semi-automatically uncover potential higher-level issues in (i) the label taxonomy used to define the ground truth oracle (labels), and (ii) data encoding and representation. In doing this, beyond existing ML testing efforts, we illustrate the need for software engineering strategies that especially target and assess the oracle.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Oracle Issues in Machine Learning and Where to Find Them\",\"authors\":\"Cynthia C. S. Liem, Annibale Panichella\",\"doi\":\"10.1145/3387940.3391490\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The rise in popularity of machine learning (ML), and deep learning in particular, has both led to optimism about achievements of artificial intelligence, as well as concerns about possible weaknesses and vulnerabilities of ML pipelines. Within the software engineering community, this has led to a considerable body of work on ML testing techniques, including white- and black-box testing for ML models. This means the oracle problem needs to be addressed. For supervised ML applications, oracle information is indeed available in the form of dataset 'ground truth', that encodes input data with corresponding desired output labels. However, while ground truth forms a gold standard, there still is no guarantee it is truly correct. Indeed, syntactic, semantic, and conceptual framing issues in the oracle may negatively affect the ML system's integrity. While syntactic issues may automatically be verified and corrected, the higher-level issues traditionally need human judgment and manual analysis. In this paper, we employ two heuristics based on information entropy and semantic analysis on well-known computer vision models and benchmark data from ImageNet. The heuristics are used to semi-automatically uncover potential higher-level issues in (i) the label taxonomy used to define the ground truth oracle (labels), and (ii) data encoding and representation. In doing this, beyond existing ML testing efforts, we illustrate the need for software engineering strategies that especially target and assess the oracle.\",\"PeriodicalId\":309659,\"journal\":{\"name\":\"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-06-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3387940.3391490\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3387940.3391490","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

机器学习(ML)的普及，特别是深度学习的普及，既导致了对人工智能成就的乐观，也导致了对ML管道可能存在的弱点和漏洞的担忧。在软件工程社区中，这导致了ML测试技术的大量工作，包括ML模型的白盒和黑盒测试。这意味着需要解决oracle问题。对于有监督的ML应用程序，oracle信息确实以数据集“ground truth”的形式可用，它用相应的期望输出标签编码输入数据。然而，虽然事实形成了黄金标准，但仍不能保证它是真正正确的。实际上，oracle中的语法、语义和概念框架问题可能会对ML系统的完整性产生负面影响。虽然语法问题可以自动验证和纠正，但更高层次的问题传统上需要人工判断和人工分析。在本文中，我们采用基于信息熵和语义分析的两种启发式方法，对来自ImageNet的知名计算机视觉模型和基准数据进行分析。启发式用于半自动地发现以下方面的潜在高级问题:(i)用于定义基本真理oracle(标签)的标签分类法，以及(ii)数据编码和表示。在此过程中，除了现有的机器学习测试工作之外，我们还说明了对软件工程策略的需求，特别是针对和评估oracle。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Oracle Issues in Machine Learning and Where to Find Them

The rise in popularity of machine learning (ML), and deep learning in particular, has both led to optimism about achievements of artificial intelligence, as well as concerns about possible weaknesses and vulnerabilities of ML pipelines. Within the software engineering community, this has led to a considerable body of work on ML testing techniques, including white- and black-box testing for ML models. This means the oracle problem needs to be addressed. For supervised ML applications, oracle information is indeed available in the form of dataset 'ground truth', that encodes input data with corresponding desired output labels. However, while ground truth forms a gold standard, there still is no guarantee it is truly correct. Indeed, syntactic, semantic, and conceptual framing issues in the oracle may negatively affect the ML system's integrity. While syntactic issues may automatically be verified and corrected, the higher-level issues traditionally need human judgment and manual analysis. In this paper, we employ two heuristics based on information entropy and semantic analysis on well-known computer vision models and benchmark data from ImageNet. The heuristics are used to semi-automatically uncover potential higher-level issues in (i) the label taxonomy used to define the ground truth oracle (labels), and (ii) data encoding and representation. In doing this, beyond existing ML testing efforts, we illustrate the need for software engineering strategies that especially target and assess the oracle.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops

自引率

0.00%

发文量