EAGLE: Creating Equivalent Graphs to Test Deep Learning Libraries
Jiannan Wang, Thibaud Lutellier, Shangshu Qian, H. Pham, Lin Tan
2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE), May 2022
DOI: 10.1145/3510003.3510165 (https://doi.org/10.1145/3510003.3510165)
Citations: 14
Abstract
Testing deep learning (DL) software is crucial and challenging. Recent approaches use differential testing to cross-check pairs of implementations of the same functionality across different libraries. Such approaches require two DL libraries that implement the same functionality, which is often unavailable. In addition, they rely on a high-level library, Keras, that implements missing functionality in all supported DL libraries; maintaining such support is prohibitively expensive, and it is thus no longer maintained. To address this issue, we propose EAGLE, a new technique that applies differential testing in a different dimension, using equivalent graphs to test a single DL implementation (e.g., a single DL library). Equivalent graphs use different Application Programming Interfaces (APIs), data types, or optimizations to achieve the same functionality. The rationale is that two equivalent graphs executed on a single DL implementation should produce identical output given the same input. Specifically, we design 16 new DL equivalence rules and propose a technique, EAGLE, that (1) uses these equivalence rules to build concrete pairs of equivalent graphs and (2) cross-checks the output of these equivalent graphs to detect inconsistency bugs in a DL library. Our evaluation on two widely used DL libraries, i.e., TensorFlow and PyTorch, shows that EAGLE detects 25 bugs (18 in TensorFlow and 7 in PyTorch), including 13 previously unknown bugs.
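To make the equivalent-graph idea concrete, below is a minimal sketch in PyTorch. The specific equivalence rule shown here (a high-level Linear layer versus an explicit matrix multiplication plus bias) is a hypothetical illustration of the approach and is not claimed to be one of EAGLE's 16 rules; the cross-check simply compares the outputs of the two graphs on the same input, up to floating-point tolerance.

```python
# Sketch of cross-checking two equivalent graphs on a single DL library (PyTorch).
# The equivalence rule used here is a hypothetical example for illustration.
import torch

torch.manual_seed(0)
x = torch.randn(8, 16)          # shared input fed to both graphs
layer = torch.nn.Linear(16, 4)  # parameters shared by both graphs

# Graph A: the high-level API.
out_a = layer(x)

# Graph B: an equivalent graph built from lower-level ops.
out_b = x @ layer.weight.T + layer.bias

# Cross-check: equivalent graphs should agree within floating-point tolerance;
# a larger discrepancy would be reported as a potential inconsistency bug.
max_diff = (out_a - out_b).abs().max().item()
assert torch.allclose(out_a, out_b, atol=1e-6), f"Inconsistency detected: {max_diff}"
print(f"Maximum difference between equivalent graphs: {max_diff:.2e}")
```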