EAGLE: Creating Equivalent Graphs to Test Deep Learning Libraries
Jiannan Wang, Thibaud Lutellier, Shangshu Qian, H. Pham, Lin Tan
2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE), May 2022
DOI: 10.1145/3510003.3510165 (https://doi.org/10.1145/3510003.3510165)
Citations: 14
Abstract
Testing deep learning (DL) software is crucial and challenging. Recent approaches use differential testing to cross-check pairs of implementations of the same functionality across different libraries. Such approaches require two DL libraries that implement the same functionality, which is often unavailable. In addition, they rely on a high-level library, Keras, that implements missing functionality in all supported DL libraries; maintaining such support is prohibitively expensive, and it is thus no longer maintained. To address this issue, we propose EAGLE, a new technique that applies differential testing in a different dimension, using equivalent graphs to test a single DL implementation (e.g., a single DL library). Equivalent graphs use different Application Programming Interfaces (APIs), data types, or optimizations to achieve the same functionality. The rationale is that two equivalent graphs executed on a single DL implementation should produce identical output given the same input. Specifically, we design 16 new DL equivalence rules and propose a technique, EAGLE, that (1) uses these equivalence rules to build concrete pairs of equivalent graphs and (2) cross-checks the output of these equivalent graphs to detect inconsistency bugs in a DL library. Our evaluation on two widely used DL libraries, i.e., TensorFlow and PyTorch, shows that EAGLE detects 25 bugs (18 in TensorFlow and 7 in PyTorch), including 13 previously unknown bugs.
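To make the equivalent-graph idea concrete, below is a minimal sketch in PyTorch. The specific equivalence rule shown here (a high-level Linear layer versus an explicit matrix multiplication plus bias) is a hypothetical illustration of the approach and is not claimed to be one of EAGLE's 16 rules; the cross-check simply compares the outputs of the two graphs on the same input, up to floating-point tolerance.

```python
# Sketch of cross-checking two equivalent graphs on a single DL library (PyTorch).
# The equivalence rule used here is a hypothetical example for illustration.
import torch

torch.manual_seed(0)
x = torch.randn(8, 16)          # shared input fed to both graphs
layer = torch.nn.Linear(16, 4)  # parameters shared by both graphs

# Graph A: the high-level API.
out_a = layer(x)

# Graph B: an equivalent graph built from lower-level ops.
out_b = x @ layer.weight.T + layer.bias

# Cross-check: equivalent graphs should agree within floating-point tolerance;
# a larger discrepancy would be reported as a potential inconsistency bug.
max_diff = (out_a - out_b).abs().max().item()
assert torch.allclose(out_a, out_b, atol=1e-6), f"Inconsistency detected: {max_diff}"
print(f"Maximum difference between equivalent graphs: {max_diff:.2e}")
```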