{"title":"Adaptive Test Selection for Deep Neural Networks","authors":"Xinyu Gao, Yang Feng, Yining Yin, Zixi Liu, Zhenyu Chen, Baowen Xu","doi":"10.1145/3510003.3510232","DOIUrl":"https://doi.org/10.1145/3510003.3510232","url":null,"abstract":"Deep neural networks (DNN) have achieved tremendous development in the past decade. While many DNN-driven software applications have been deployed to solve various tasks, they could also produce incorrect behaviors and result in massive losses. To reveal the incorrect behaviors and improve the quality of DNN-driven applications, developers often need rich labeled data for the testing and optimization of DNN models. However, in practice, collecting diverse data from application scenarios and labeling them properly is often a highly expensive and time-consuming task. In this paper, we proposed an adaptive test selection method, namely ATS, for deep neural networks to alleviate this problem. ATS leverages the difference between the model outputs to measure the behavior diversity of DNN test data. And it aims at selecting a subset with diverse tests from a massive unlabelled dataset. We experiment ATS with four well-designed DNN models and four widely-used datasets in comparison with various kinds of neuron coverage (NC). The results demonstrate that ATS can significantly outperform all test selection methods in assessing both fault detection and model improvement capability of test suites. It is promising to save the data labeling and model retraining costs for deep neural networks.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"222 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127167857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inferring and Applying Type Changes","authors":"Ameya Ketkar, O. Smirnov, Nikolaos Tsantalis, Danny Dig, T. Bryksin","doi":"10.1145/3510003.3510115","DOIUrl":"https://doi.org/10.1145/3510003.3510115","url":null,"abstract":"Developers frequently change the type of a program element and update all its references to increase performance, security, or maintainability. Manually performing type changes is tedious, error-prone, and it overwhelms developers. Researchers and tool builders have proposed advanced techniques to assist developers when performing type changes. A major obstacle in using these techniques is that the developer has to manually encode rules for defining the type changes. Handcrafting such rules is difficult and often involves multiple trial-error iterations. Given that open-source repositories contain many examples of type-changes, if we could infer the adaptations, we would eliminate the burden on developers. We introduce TC-Infer, a novel technique that infers rewrite rules that capture the required adaptations from the version histories of open source projects. We then use these rules (expressed in the Comby language) as input to existing type change tools. To evaluate the effectiveness of TC-Infer, we use it to infer 4,931 rules for 605 popular type changes in a corpus of 400K commits. Our results show that TC-Infer deduced rewrite rules for 93% of the most popular type change patterns. Our results also show that the rewrite rules produced by TC-Infer are highly effective at applying type changes (99.2% precision and 93.4% recall). To advance the existing tooling we released IntelliTC, an interactive and configurable refactoring plugin for IntelliJ IDEA to perform type changes.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116017759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning and Programming Challenges of Rust: A Mixed-Methods Study","authors":"Shuofei Zhu, Ziyi Zhang, Boqin Qin, Aiping Xiong, Linhai Song","doi":"10.1145/3510003.3510164","DOIUrl":"https://doi.org/10.1145/3510003.3510164","url":null,"abstract":"Rust is a young systems programming language designed to provide both the safety guarantees of high-level languages and the execution performance of low-level languages. To achieve this design goal, Rust provides a suite of safety rules and checks against those rules at the compile time to eliminate many memory-safety and thread-safety issues. Due to its safety and performance, Rust's popularity has increased significantly in recent years, and it has already been adopted to build many safety-critical software systems. It is critical to understand the learning and programming challenges imposed by Rust's safety rules. For this purpose, we first conducted an empirical study through close, manual inspection of 100 Rust-related Stack Overflow questions. We sought to understand (1) what safety rules are challenging to learn and program with, (2) under which contexts a safety rule becomes more difficult to apply, and (3) whether the Rust compiler is sufficiently helpful in debugging safety-rule violations. We then performed an online survey with 101 Rust programmers to validate the findings of the empirical study. We invited participants to evaluate program variants that differ from each other, either in terms of violated safety rules or the code constructs involved in the violation, and compared the participants' performance on the variants. Our mixed-methods investigation revealed a range of consistent findings that can benefit Rust learners, practitioners, and language designers.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122923945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Use of Test Doubles in Android Testing: An In-Depth Investigation","authors":"M. Fazzini, Chase Choi, Juan Manuel Copia, Gabriel Lee, Yoshiki Kakehi, Alessandra Gorla, A. Orso","doi":"10.1145/3510003.3510175","DOIUrl":"https://doi.org/10.1145/3510003.3510175","url":null,"abstract":"Android apps interact with their environment extensively, which can result in flaky, slow, or hard-to-debug tests. Developers often address these problems using test doubles—developer-defined objects that replace app or library classes during test execution. Although test doubles are widely used, there is limited understanding of how they are used in practice. To bridge this gap, we present an in-depth empirical study that aims to shed light on how developers create and use test doubles in Android apps. In our study, we first analyze a dataset of 1,006 apps with publicly available test suites to identify which frameworks and approaches developers most commonly use to create test doubles. We then investigate several research questions by studying how test doubles defined using these popular frameworks are created and used in the ten apps in the dataset that define the highest number of test doubles using these frameworks. Our results, based on the analysis of 2,365 test doubles that replace a total of 784 classes, provide insight into the types of test doubles used within Android apps and how they are utilized. Our results also show that test doubles used in Android apps and traditional Java test doubles differ in at least some respect. Finally, our results show that test doubles can introduce test smells and even mistakes in the test code. In the paper, we also discuss some implications of our findings that can help researchers and practitioners working in this area and guide future research.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128208915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DescribeCtx: Context-Aware Description Synthesis for Sensitive Behaviors in Mobile Apps","authors":"Shao Yang, Yuehan Wang, Y. Yao, Haoyu Wang, Yanfang Ye, Xusheng Xiao","doi":"10.1145/3510003.3510058","DOIUrl":"https://doi.org/10.1145/3510003.3510058","url":null,"abstract":"While mobile applications (i.e., apps) are becoming capable of handling various needs from users, their increasing access to sensitive data raises privacy concerns. To inform such sensitive behaviors to users, existing techniques propose to automatically identify explanatory sentences from app descriptions; however, many sensitive behaviors are not explained in the corresponding app descriptions. There also exist general techniques that translate code to sentences. However, these techniques lack the vocabulary to explain the uses of sensitive data and fail to consider the context (i.e., the app functionalities) of the sensitive behaviors. To address these limitations, we propose Describectx, a context-aware description synthesis approach that trains a neural machine translation model using a large set of popular apps, and generates app-specific descriptions for sensitive behaviors. Specifically, Describectx encodes three heterogeneous sources as input, i.e., vocabularies provided by privacy policies, behavior summary provided by the call graphs in code, and contextual information provided by GUI texts. Our evaluations on 1,262 Android apps show that, compared with existing baselines, Describectx produces more accurate descriptions (24.96 in BLEU) and achieves higher user ratings with respect to the reference sen-tences manually identified in the app descriptions.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127013415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating and Improving Neural Program-Smoothing-based Fuzzing","authors":"Mingyuan Wu, Lingixao Jiang, Jiahong Xiang, Yuqun Zhang, Guowei Yang, Huixin Ma, Sen Nie, Shi Wu, Heming Cui, Lingming Zhang","doi":"10.1145/3510003.3510089","DOIUrl":"https://doi.org/10.1145/3510003.3510089","url":null,"abstract":"Fuzzing nowadays has been commonly modeled as an optimization problem, e.g., maximizing code coverage under a given time budget via typical search-based solutions such as evolutionary algorithms. However, such solutions are widely argued to cause inefficient computing resource usage, i.e., inefficient mutations. To address this issue, two neural program-smoothing-based fuzzers, Neuzz and MTFuzz, have been recently proposed to approximate program branching behaviors via neural network models, which input byte sequences of a seed and output vectors representing program branching behaviors. Moreover, assuming that mutating the bytes with larger gradients can better explore branching behaviors, they develop strategies to mutate such bytes for generating new seeds as test cases. Meanwhile, although they have been shown to be effective in the original papers, they were only evaluated upon a limited dataset. In addition, it is still unclear how their key technical components and whether other factors can impact fuzzing performance. To further investigate neural program-smoothing-based fuzzing, we first construct a large-scale benchmark suite with a total of 28 popular open-source projects. Then, we extensively evaluate Neuzz and MTFuzz on such benchmarks. The evaluation results suggest that their edge coverage performance can be unstable. Moreover, neither neural network models nor mutation strategies can be consistently effective, and the power of their gradient-guidance mechanisms have been compromised. Inspired by such findings, we propose a simplistic technique, PreFuzz, which improves neural program-smoothing-based fuzzers with a resource-efficient edge selection mechanism to enhance their gradient guidance and a probabilistic byte selection mechanism to further boost mutation effectiveness. Our evaluation results indicate that PreFuzz can significantly increase the edge coverage of Neuzz/MTFuzz, and also reveal multiple practical guidelines to advance future research on neural program-smoothing-based fuzzing.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121188857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge-Based Environment Dependency Inference for Python Programs","authors":"Hongjie Ye, Wei Chen, Wensheng Dou, Guoquan Wu, Jun Wei","doi":"10.1145/3510003.3510127","DOIUrl":"https://doi.org/10.1145/3510003.3510127","url":null,"abstract":"Besides third-party packages, the Python interpreter and system libraries are also critical dependencies of a Python program. In our empirical study, 34% programs are only compatible with specific Python interpreter versions, and 24% programs require specific system libraries. However, existing techniques mainly focus on inferring third-party package dependencies. Therefore, they can lack other necessary dependencies and violate version constraints, thus resulting in program build failures and runtime errors. This paper proposes a knowledge-based technique named PyEGo, which can automatically infer dependencies of third-party packages, the Python interpreter, and system libraries at compatible versions for Python programs. We first construct the dependency knowl-edge graph PyKG, which can portray the relations and constraints among third-party packages, the Python interpreter, and system libraries. Then, by querying PyKG with extracted program features, PyEGo constructs a program-related sub-graph with dependency candidates of the three types. It finally outputs the latest compatible dependency versions by solving constraints in the sub-graph. We evaluate PyEGo on 2,891 single-file Python gists, 100 open-source Python projects and 4,836 jupyter notebooks. The experimental re-sults show that PyEGo achieves better accuracy, 0.2x to 3.5x higher than the state-of-the-art approaches.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114320437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Imperative versus Declarative Collection Processing: An RCT on the Understandability of Traditional Loops versus the Stream API in Java","authors":"N. Mehlhorn, Stefan Hanenberg","doi":"10.1145/3510003.3519016","DOIUrl":"https://doi.org/10.1145/3510003.3519016","url":null,"abstract":"Java introduced in version 8 with the Stream API means to operate on collections using lambda expressions. Since then, this API is an alternative way to handle collections in a more declarative manner instead of the traditional, imperative style using loops. However, whether the Stream API is beneficial in comparison to loops in terms of usability is unclear. The present paper introduces a randomized control trial (RCT) on the understandability of collection operations performed on 20 participants with the dependent variables response time and correctness. As tasks, subjects had to determine the results for collection operations (either defined with the Stream API or with loops). The results indicate that the Stream API has a significant $(mathrm{p} <. 001)$ and large $(eta_{p}^{2}=.695;frac{M_{loop}}{M_{stream}} sim 178%)$ positive effect on the response times. Furthermore, the usage of the Stream API caused significantly less errors. And finally, the participants perceived their speed with the Stream API higher compared to the loop-based code and the participants considered the code based on the Stream API as more readable. Hence, while existing studies found a negative effect of declarative constructs (in terms of lambda expressions) on the usability of a main stream programming language, the present study found the opposite: the present study gives evidence that declarative code on collections using the Stream API based on lambda expressions has a large, positive effect in comparison to traditional loops.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125694690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Exploratory Study of Deep learning Supply Chain","authors":"Xin Tan, Kai Gao, Minghui Zhou, Li Zhang","doi":"10.1145/3510003.3510199","DOIUrl":"https://doi.org/10.1145/3510003.3510199","url":null,"abstract":"Deep learning becomes the driving force behind many contemporary technologies and has been successfully applied in many fields. Through software dependencies, a multi-layer supply chain (SC) with a deep learning framework as the core and substantial down-stream projects as the periphery has gradually formed and is constantly developing. However, basic knowledge about the structure and characteristics of the SC is lacking, which hinders effective support for its sustainable development. Previous studies on software SC usually focus on the packages in different registries without paying attention to the SCs derived from a single project. We present an empirical study on two deep learning SCs: TensorFlow and PyTorch SCs. By constructing and analyzing their SCs, we aim to understand their structure, application domains, and evolutionary factors. We find that both SCs exhibit a short and sparse hierarchy structure. Overall, the relative growth of new projects increases month by month. Projects have a tendency to attract downstream projects shortly after the release of their packages, later the growth becomes faster and tends to stabilize. We propose three criteria to identify vulnerabilities and identify 51 types of packages and 26 types of projects involved in the two SCs. A comparison reveals their similarities and differences, e.g., TensorFlow SC provides a wealth of packages in experiment result analysis, while PyTorch SC contains more specific framework packages. By fitting the GAM model, we find that the number of dependent packages is significantly negatively associated with the number of downstream projects, but the relationship with the number of authors is nonlinear. Our findings can help further open the “black box” of deep learning SCs and provide insights for their healthy and sustainable development.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127901019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nessie: Automatically Testing JavaScript APIs with Asynchronous Callbacks","authors":"Ellen Arteca, Sebastian Harner, Michael Pradel, F. Tip","doi":"10.1145/3510003.3510106","DOIUrl":"https://doi.org/10.1145/3510003.3510106","url":null,"abstract":"Previous algorithms for feedback-directed unit test generation iteratively create sequences of API calls by executing partial tests and by adding new API calls at the end of the test. These algorithms are challenged by a popular class of APIs: higher-order functions that receive callback arguments, which often are invoked asyn-chronously. Existing test generators cannot effectively test such APIs because they only sequence API calls, but do not nest one call into the callback function of another. This paper presents Nessie, the first feedback-directed unit test generator that supports nesting of API calls and that tests asynchronous callbacks. Nesting API calls enables a test to use values produced by an API that are available only once a callback has been invoked, and is often necessary to ensure that methods are invoked in a specific order. The core contributions of our approach are a tree-based representation of unit tests with callbacks and a novel algorithm to iteratively generate such tests in a feedback-directed manner. We evaluate our approach on ten popular JavaScript libraries with both asynchronous and synchronous callbacks. The results show that, in a comparison with LambdaTester, a state of the art test generation technique that only considers sequencing of method calls, Nessie finds more behavioral differences and achieves slightly higher coverage. Notably, Nessie needs to generate significantly fewer tests to achieve and exceed the coverage achieved by the state of the art.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127929002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}