What to Blame? On the Granularity of Fault Localization for Deep Neural Networks
Matias Duran, Xiaoyi Zhang, Paolo Arcaini, F. Ishikawa
2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), October 2021. DOI: 10.1109/ISSRE52982.2021.00037 (https://doi.org/10.1109/ISSRE52982.2021.00037)
Abstract: Validating Deep Neural Networks (DNNs) used for classification is of paramount importance; one approach consists in (i) executing the DNN over the test dataset, (ii) collecting information about the classifications, and (iii) applying fault localization (FL) techniques to identify the neurons responsible for the misclassifications. DNNs can exhibit multiple misclassification types, so the neurons responsible for one type may differ from those responsible for another. However, depending on the granularity of the analyzed dataset, FL may not reveal these differences: failure types that are more frequent in the dataset may mask less frequent ones. We propose a way to perform FL for DNNs that avoids this masking effect by selecting test data at a finer granularity. We conduct an empirical study, using a spectrum-based FL approach for DNNs, to assess how FL results change with the granularity of the analyzed test data. Specifically, we perform FL with test data at two granularities: a state-of-the-art approach that considers all misclassifications for a given class together, and the proposed fine-grained approach. Results show that FL should be done for each misclassification, so that practitioners obtain a more detailed analysis of the DNN faults and can make a more informed decision on what to fix in the DNN.
{"title":"Simplify Array Processing Loops for Efficient Program Verification","authors":"Xiang Du, Liangze Yin, Wei Dong","doi":"10.1109/ISSRE52982.2021.00049","DOIUrl":"https://doi.org/10.1109/ISSRE52982.2021.00049","url":null,"abstract":"Verification of large array programs is a major challenge for current program verification techniques due to large state spaces. Traditional methods such as bounded model checking often run out of time when verifying large array programs. To overcome the state explosion problem in the analysis of array programs, this paper proposes to simplify array processing loops for efficient program verification. For each array processing loop in an array program, we present a static analysis method to obtain a simplified loop for construction of a simplified program. The property checking of the simplified program can be used to approximate the original property. To evaluate the effectiveness and soundness, we implemented a tool based on our approach and tested it on SV-COMP 2019 benchmarks. The experimental results show that our method can successfully verify most program and achieve a high precision and effectiveness.","PeriodicalId":162410,"journal":{"name":"2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122592242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparative Study of Automatic Program Repair Techniques for Security Vulnerabilities","authors":"Eduard Pinconschi, Rui Abreu, P. Adão","doi":"10.1109/ISSRE52982.2021.00031","DOIUrl":"https://doi.org/10.1109/ISSRE52982.2021.00031","url":null,"abstract":"In the past years, research on automatic program repair (APR), in particular on test-suite-based approaches, has significantly attracted the attention of researchers. Despite the advances in the field, it remains unclear how these techniques fare in the context of security—most approaches are evaluated using benchmarks of bugs that do not (only) contain security vulnerabilities. In this paper, we present our observations using 10 state-of-the-art test-suite-based automatic program repair tools on the DARPA Cyber Grand Challenge benchmark of vulnerabilities in C/C++. Our intention is to have a better understanding of the current state of automatic program repair tools when addressing security issues. In particular, our study is guided by the hypothesis that the efficiency of repair tools may not generalize to security vulnerabilities. We found that the 10 analyzed tools can only fix 30 out of 55 vulnerable programs—54.6 % of the considered issues. In particular, we found that APR tools with atomic change operators and brute-force search strategy (AE and GenProg) and brute-force functionality deletion (Kali) overall perform better at repairing security vulnerabilities (considering both efficiency and effectiveness). AE is the tool that individually repairs most programs with 20 out of 55 programs (36.4%). The causes for failing to repair are discussed in the paper, which can help repair tool designers to improve their techniques and tools.","PeriodicalId":162410,"journal":{"name":"2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126232762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characterizing Sensor Leaks in Android Apps
Xiaoyu Sun, Xiao Chen, Kui Liu, Sheng Wen, Li Li, John C. Grundy
2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), October 2021. DOI: 10.1109/ISSRE52982.2021.00058 (https://doi.org/10.1109/ISSRE52982.2021.00058)
Abstract: While extremely valuable for achieving advanced functions, mobile phone sensors can be abused by attackers to implement malicious activities in Android apps, as experimentally demonstrated by many state-of-the-art studies. There is hence a strong need to regulate the usage of mobile sensors so as to keep them from being exploited by malicious attackers. However, although various efforts have been put into related problems such as detecting privacy leaks in Android apps, we have not yet found approaches that automatically detect sensor leaks in Android apps. To fill this gap, we designed and implemented a novel prototype tool, Seeker, that extends the well-known FlowDroid tool to detect sensor-based data leaks in Android apps. Seeker conducts sensor-focused static taint analyses directly on Android apps' bytecode and reports not only sensor-triggered privacy leaks but also the sensor types involved in the leaks. Experimental results on over 40,000 real-world Android apps show that Seeker is effective in detecting sensor leaks, and that malicious apps are more interested in leaking sensor data than benign apps.
{"title":"Black-Box Testing of Deep Neural Networks","authors":"Taejoon Byun, Sanjai Rayadurgam, M. Heimdahl","doi":"10.1109/ISSRE52982.2021.00041","DOIUrl":"https://doi.org/10.1109/ISSRE52982.2021.00041","url":null,"abstract":"Several test adequacy criteria have been developed for quantifying the the coverage of deep neural networks (DNNs) achieved by a test suite. Being dependent on the structure of the DNN, these can be costly to measure and use, especially given the highly iterative nature of the model training workflow. Further, testing provides higher overall assurance when such implementation dependent measures are used along with implementation independent ones. In this paper, we rigorously define a new black-box coverage criterion that is independent of the DNN model under test. We further describe a few desirable properties and associated evaluation metrics for assessing test coverage criteria and use those to empirically compare and contrast the black-box criterion with several DNN structural coverage criteria. Results indicate that the black-box criterion has comparable effectiveness and provides benefits that complement white-box criteria. The results also reveal a few weaknesses of coverage criteria for DNNs.","PeriodicalId":162410,"journal":{"name":"2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133545904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Suraksha: A Framework to Analyze the Safety Implications of Perception Design Choices in AVs
Hengyu Zhao, S. Hari, Timothy Tsai, Michael B. Sullivan, S. Keckler, Jishen Zhao
2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), October 2021. DOI: 10.1109/ISSRE52982.2021.00052 (https://doi.org/10.1109/ISSRE52982.2021.00052)
Abstract: Autonomous vehicles (AVs) employ sophisticated computer systems and algorithms to perceive the surroundings, localize, plan, and control the vehicle. With several available design choices for each system component, making design decisions without analyzing their system-level safety consequences may compromise performance and safety. This paper proposes an automated AV safety evaluation framework called Suraksha to quantify and analyze the sensitivity of different design parameters on AV system safety across a set of driving situations. We employ Suraksha to analyze the safety effects of modulating a set of perception parameters (perception being among the most resource-demanding AV tasks) on an industrial AV system. Results reveal that (a) perception demands vary with driving-scenario difficulty levels; (b) small per-frame inaccuracies and a reduced camera processing rate can be traded off for power savings or diversity; and (c) the tested AV system tolerates up to 10% perception noise and delay even in harder driving scenarios. These results motivate future safety- and performance-aware system optimizations.
{"title":"Nondeterministic Impact of CPU Multithreading on Training Deep Learning Systems","authors":"Guanping Xiao, Jun Liu, Zheng Zheng, Yulei Sui","doi":"10.1109/ISSRE52982.2021.00063","DOIUrl":"https://doi.org/10.1109/ISSRE52982.2021.00063","url":null,"abstract":"With the wide deployment of deep learning (DL) systems, research in reliable and robust DL is not an option but a priority, especially for safety-critical applications. Unfortunately, DL systems are usually nondeterministic. Due to software-level (e.g., randomness) and hardware-level (e.g., GPUs or CPUs) factors, multiple training runs can generate inconsistent models and yield different evaluation results, even with identical settings and training data on the same implementation framework and hardware platform. Existing studies focus on analyzing software-level nondeterminism factors and the nondeterminism introduced by GPUs. However, the nondeterminism impact of CPU multi-threading on training DL systems has rarely been studied. To fill this knowledge gap, we present the first work of studying the variance and robustness of DL systems impacted by CPU multithreading. Our major contributions are fourfold: 1) An experimental framework based on VirtualBox for analyzing the impact of CPU multithreading on training DL systems; 2) Six findings obtained from our experiments and examination on GitHub DL projects; 3) Five implications to DL researchers and practitioners according to our findings; 4) Released the research data (https://github.com/DeterministicDeepLearning).","PeriodicalId":162410,"journal":{"name":"2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126934578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-source Cross Project Defect Prediction with Joint Wasserstein Distance and Ensemble Learning","authors":"Quanyi Zou, Lu Lu, Zhanyu Yang, Hao Xu","doi":"10.1109/ISSRE52982.2021.00019","DOIUrl":"https://doi.org/10.1109/ISSRE52982.2021.00019","url":null,"abstract":"Cross-Project Defect Prediction (CPDP) refers to transferring knowledge from source software projects to a target software project. Previous research has shown that the impacts of knowledge transferred from different source projects differ on the target task. Therefore, one of the fundamental challenges in CPDP is how to measure the amount of knowledge transferred from each source project to the target task. This article proposed a novel CPDP method called Multi-source defect prediction with Joint Wasserstein Distance and Ensemble Learning (MJWDEL) to learn transferred weights for evaluating the importance of each source project to the target task. In particular, first of all, applying the TCA technique and Logistic Regression (LR) train a sub-model for each source project and the target project. Moreover, the article designs joint Wassertein distance to understand the source-target relationship and then uses this as a basis to compute the transferred weights of different sub-models. After that, the transferred weights can be used to reweight these sub-models to determine their importance in knowledge transfer to the target task. We conducted experiments on 19 software projects from PROMISE, NASA and AEEEM datasets. Compared with several state-of-the-art CPDP methods, the proposed method substantially improves CPDP performance in terms of four evaluation indicators (i.e., F-measure, Balance, G-measure and MMC).","PeriodicalId":162410,"journal":{"name":"2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122475190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}