E. Soremekun, Lukas Kirschner, Marcel Böhme, Mike Papadakis
{"title":"Evaluating the Impact of Experimental Assumptions in Automated Fault Localization","authors":"E. Soremekun, Lukas Kirschner, Marcel Böhme, Mike Papadakis","doi":"10.1109/ICSE48619.2023.00025","DOIUrl":"https://doi.org/10.1109/ICSE48619.2023.00025","url":null,"abstract":"Much research on automated program debugging often assumes that bug fix location(s) indicate the faults' root causes and that root causes of faults lie within single code elements (statements). It is also often assumed that the number of statements a developer would need to inspect before finding the first faulty statement reflects debugging effort. Although intuitive, these three assumptions are typically used (55% of experiments in surveyed publications make at least one of these three assumptions) without any consideration of their effects on the debugger's effectiveness and potential impact on developers in practice. To deal with this issue, we perform controlled experimentation, split testing in particular, using 352 bugs from 46 open-source C programs, 19 Automated Fault Localization (AFL) techniques (18 statistical debugging formulas and dynamic slicing), two (2) state-of-the-art automated program repair (APR) techniques (GenProg and Angelix) and 76 professional developers. Our results show that these assumptions conceal the difficulty of debugging. They make AFL techniques appear to be (up to 38%) more effective, and make APR tools appear to be (2X) less effective. We also find that most developers (83%) consider these assumptions to be unsuitable for debuggers and, perhaps worse, that they may inhibit development productivity. The majority (66%) of developers prefer debugging diagnoses without these assumptions twice as much as with the assumptions. Our findings motivate the need to assess debuggers conservatively, i.e., without these assumptions.","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117132436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shengyi Pan, Lingfeng Bao, Xin Xia, David Lo, Shanping Li
{"title":"Fine-grained Commit-level Vulnerability Type Prediction by CWE Tree Structure","authors":"Shengyi Pan, Lingfeng Bao, Xin Xia, David Lo, Shanping Li","doi":"10.1109/ICSE48619.2023.00088","DOIUrl":"https://doi.org/10.1109/ICSE48619.2023.00088","url":null,"abstract":"Identifying security patches via code commits to allow early warnings and timely fixes for Open Source Software (OSS) has received increasing attention. However, the existing detection methods can only identify the presence of a patch (i.e., a binary classification) but fail to pinpoint the vulnerability type. In this work, we take the first step to categorize the security patches into fine-grained vulnerability types. Specifically, we use the Common Weakness Enumeration (CWE) as the label and perform fine-grained classification using categories at the third level of the CWE tree. We first formulate the task as a Hierarchical Multi-label Classification (HMC) problem, i.e., inferring a path (a sequence of CWE nodes) from the root of the CWE tree to the node at the target depth. We then propose an approach named TreeVul with a hierarchical and chained architecture, which manages to utilize the structure information of the CWE tree as prior knowledge of the classification task. We further propose a tree structure aware and beam search based inference algorithm for retrieving the optimal path with the highest merged probability. We collect a large security patch dataset from NVD, consisting of 6,541 commits from 1,560 GitHub OSS repositories. Experimental results show that Tree-vulsignificantly outperforms the best performing baselines, with improvements of 5.9%, 25.0%, and 7.7% in terms of weighted F1-score, macro F1-score, and MCC, respectively. We further conduct a user study and a case study to verify the practical value of TreeVul in enriching the binary patch detection results and improving the data quality of NVD, respectively.","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121673991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated Summarization of Stack Overflow Posts","authors":"Bonan Kou, Muhao Chen, Tianyi Zhang","doi":"10.1109/ICSE48619.2023.00158","DOIUrl":"https://doi.org/10.1109/ICSE48619.2023.00158","url":null,"abstract":"Software developers often resort to Stack Overflow (SO) to fill their programming needs. Given the abundance of relevant posts, navigating them and comparing different solutions is tedious and time-consuming. Recent work has proposed to automatically summarize SO posts to concise text to facilitate the navigation of SO posts. However, these techniques rely only on information retrieval methods or heuristics for text summarization, which is insufficient to handle the ambiguity and sophistication of natural language. This paper presents a deep learning based framework called Assortfor SO post summarization. Assortincludes two complementary learning methods, $mathbf{Assort}_{S}$ and $mathbf{Assort}_{IS}$, to address the lack of labeled training data for SO post summarization. $mathbf{Assort}_{S}$ is designed to directly train a novel ensemble learning model with BERT embeddings and domain-specific features to account for the unique characteristics of SO posts. By contrast, $mathbf{Assort}_{IS}$ is designed to reuse pre-trained models while addressing the domain shift challenge when no training data is present (i.e., zero-shot learning). Both $mathbf{Assort}_{S}$ and $mathbf{Assort}_{IS}$ outperform six existing techniques by at least 13% and 7% respectively in terms of the F1 score. Furthermore, a human study shows that participants significantly preferred summaries generated by $mathbf{Assort}_{S}$ and $mathbf{Assort}_{IS}$ over the best baseline, while the preference difference between $mathbf{Assort}_{S}$ and $mathbf{Assort}_{IS}$ was small.","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114836950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"JITfuzz: Coverage-guided Fuzzing for JVM Just-in-Time Compilers","authors":"Mingyuan Wu, Minghai Lu, Heming Cui, Junjie Chen, Yuqun Zhang, Lingming Zhang","doi":"10.1109/ICSE48619.2023.00017","DOIUrl":"https://doi.org/10.1109/ICSE48619.2023.00017","url":null,"abstract":"As a widely-used platform to support various Java-bytecode-based applications, Java Virtual Machine (JVM) incurs severe performance loss caused by its real-time program interpretation mechanism. To tackle this issue, the Just-in- Time compiler (JIT) has been widely adopted to strengthen the efficacy of JVM. Therefore, how to effectively and efficiently detect JIT bugs becomes critical to ensure the correctness of JVM. In this paper, we propose a coverage-guided fuzzing framework, namely JITfuzz, to automatically detect JIT bugs. In particular, JITfuzz adopts a set of optimization-activating mutators to trigger the usage of typical JIT optimizations, e.g., function inlining and simplification. Meanwhile, given JIT optimizations are closely coupled with program control flows, JITfuzz also adopts mutators to enrich the control flows of target programs. Moreover, JITfuzz also proposes a mutator scheduler which iteratively schedules mutators according to the coverage updates to maximize the code coverage of JIT. To evaluate the effectiveness of JITfuzz, we conduct a set of experiments based on a benchmark suite with 16 popular JVM-based projects from GitHub. The experimental results suggest that JITfuzz outperforms the state-of-the-art mutation-based and generation-based JVM fuzzers by 27.9 % and 18.6 % respectively in terms of edge coverage on average. Furthermore, JITfuzz also successfully detects 36 previously unknown bugs (including 23 JIT bugs) and 27 bugs (including 18 JIT bugs) have been confirmed by the developers.","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"246 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121879111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MirrorTaint: Practical Non-intrusive Dynamic Taint Tracking for JVM-based Microservice Systems","authors":"Yicheng Ouyang, Kailai Shao, Kunqiu Chen, Ruobing Shen, Chao Chen, Mingze Xu, Yuqun Zhang, Lingming Zhang","doi":"10.1109/ICSE48619.2023.00210","DOIUrl":"https://doi.org/10.1109/ICSE48619.2023.00210","url":null,"abstract":"Taint analysis, i.e., labeling data and propagating the labels through data flows, has been widely used for analyzing program information flows and ensuring system/data security. Due to its important applications, various taint analysis techniques have been proposed, including static and dynamic taint analysis. However, existing taint analysis techniques can be hardly applied to the rising microservice systems for industrial applications. To address such a problem, in this paper, we proposed the first practical non-intrusive dynamic taint analysis technique MirrorTaint for extensively supporting microservice systems on JVMs. In particular, by instrumenting the microservice systems, MirrorTaint constructs a set of data structures with their respective policies for labeling/propagating taints in its mirrored space. Such data structures are essentially non-intrusive, i.e., modifying no program meta-data or runtime system. Then, during program execution, MirrorTaint replicates the stack-based JVM instruction execution in its mirrored space on-the-fly for dynamic taint tracking. We have evaluated MirrorTaint against state-of-the-art dynamic and static taint analysis systems on various popular open-source microservice systems. The results demonstrate that MirrorTaint can achieve better compatibility, quite close precision and higher recall (97.9%/100.0%) than state-of-the-art Phosphor (100.0%/9.9%) and FlowDroid (100%/28.2%). Also, MirrorTaint incurs lower runtime overhead than Phosphor (although both are dynamic techniques). Moreover, we have performed a case study in Ant Group, a global billion-user FinTech company, to compare MirrorTaint and their mature developer-experience-based data checking system for automatically generated fund documents. The result shows that the developer experience can be incomplete, causing the data checking system to only cover 84.0% total data relations, while MirrorTaint can automatically find 99.0% relations with 100.0% precision. Lastly, we also applied MirrorTaint to successfully detect a recently wide-spread Log4j2 security vulnerability.","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131049006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Future Software for Life in Trusted Futures","authors":"S. Pink","doi":"10.1109/icse48619.2023.00010","DOIUrl":"https://doi.org/10.1109/icse48619.2023.00010","url":null,"abstract":"How will people, other species, software and hardware live together in as yet unknown futures? How can we work towards trusted and safe futures where human values and the environment are supported by emerging technologies? Research demonstrates that human values and everyday life priorities, ethics, routines and activities will shape our possible futures. I will draw on ethnographic research to outline how people anticipate and imagine everyday life futures with emerging technologies in their homes and neighbourhoods, and how technology workers envisage futures in their professional lives. If, as social science research shows, technologies cannot solve human and societal problems, what roles should they play in future life? What are the implications for future software? What values should underpin its design? Where should it be developed? By and in collaboration with whom? What role can software play in generating the circumstances for trusted futures?","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129603082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hao Guan, Ying Xiao, Jiaying Li, Yepang Liu, Guangdong Bai
{"title":"A Comprehensive Study of Real-World Bugs in Machine Learning Model Optimization","authors":"Hao Guan, Ying Xiao, Jiaying Li, Yepang Liu, Guangdong Bai","doi":"10.1109/ICSE48619.2023.00024","DOIUrl":"https://doi.org/10.1109/ICSE48619.2023.00024","url":null,"abstract":"Due to the great advance in machine learning (ML) techniques, numerous ML models are expanding their application domains in recent years. To adapt for resource-constrained platforms such as mobile and Internet of Things (IoT) devices, pre-trained models are often processed to enhance their efficiency and compactness, using optimization techniques such as pruning and quantization. Similar to the optimization process in other complex systems, e.g., program compilers and databases, optimizations for ML models can contain bugs, leading to severe consequences such as system crashes and financial loss. While bugs in training, compiling and deployment stages have been extensively studied, there is still a lack of systematic understanding and characterization of model optimization bugs (MOBs). In this work, we conduct the first empirical study to identify and characterize MOBs. We collect a comprehensive dataset containing 371 MOBs from TensorFlow and PyTorch, the most extensively used open-source ML frameworks, covering the entire development time span of their optimizers (May 2019 to August 2022). We then investigate the collected bugs from various perspectives, including their symptoms, root causes, life cycles, detection and fixes. Our work unveils the status quo of MOBs in the wild, and reveals their features on which future detection techniques can be based. Our findings also serve as a warning to the developers and the users of ML frameworks, and an appeal to our research community to enact dedicated countermeasures.","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121912186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aashish Yadavally, T. Nguyen, Wenbo Wang, Shaohua Wang
{"title":"(Partial) Program Dependence Learning","authors":"Aashish Yadavally, T. Nguyen, Wenbo Wang, Shaohua Wang","doi":"10.1109/ICSE48619.2023.00209","DOIUrl":"https://doi.org/10.1109/ICSE48619.2023.00209","url":null,"abstract":"Code fragments from developer forums often migrate to applications due to the code reuse practice. Owing to the incomplete nature of such programs, analyzing them to early determine the presence of potential vulnerabilities is challenging. In this work, we introduce NeuralPDA, a neural network-based program dependence analysis tool for both complete and partial programs. Our tool efficiently incorporates intra-statement and inter-statement contextual features into statement representations, thereby modeling program dependence analysis as a statement-pair dependence decoding task. In the empirical evaluation, we report that NeuralPDA predicts the CFG and PDG edges in complete Java and C/C++ code with combined F-scores of 94.29% and 92.46%, respectively. The F-score values for partial Java and C/C++ code range from 94.29%-97.17% and 92.46%-96.01%, respectively. We also test the usefulness of the PDGs predicted by NeuralPDA (i.e., PDG*) on the downstream task of method-level vulnerability detection. We discover that the performance of the vulnerability detection tool utilizing PDG* is only 1.1% less than that utilizing the PDGs generated by a program analysis tool. We also report the detection of 14 real-world vulnerable code snippets from StackOverflow by a machine learning-based vulnerability detection tool that employs the PDGs predicted by NeuralPDA for these code snippets.","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127983829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sen Yang, Sen Chen, Lingling Fan, Sihan Xu, Zhan-wei Hui, Song Huang
{"title":"Compatibility Issue Detection for Android Apps Based on Path-Sensitive Semantic Analysis","authors":"Sen Yang, Sen Chen, Lingling Fan, Sihan Xu, Zhan-wei Hui, Song Huang","doi":"10.1109/ICSE48619.2023.00033","DOIUrl":"https://doi.org/10.1109/ICSE48619.2023.00033","url":null,"abstract":"Android API-related compatibility issues have be-come a severe problem and significant challenge for app devel-opers due to the well-known Android fragmentation issues. To address this problem, many effective approaches such as app-based and API lifetime-based methods have been proposed to identify incompatible API usages. However, due to the various implementations of API usages and different API invoking paths, there is still a significant weakness of existing approaches, i.e., introducing a massive number of false positives (FP) and false negatives (FN). To this end, in this paper, we propose PSDroid, an automated compatibility detection approach for Android apps, which aims to reduce FPs and FNs by overcoming several technical bottlenecks. Firstly, we make substantial efforts to carry out a preliminary study to summarize a set of novel API usages with diverse checking implementations. Secondly, we construct a refined API lifetime database by leveraging a semantic resolving analysis on all existing Android SDK frameworks. Based on the above two key phases, we design and implement a novel path-sensitive semantic approach to effectively and automatically detect incompatibility issues. To demonstrate the performance, we compared with five existing approaches (i.e., FicFinder, ACRYL, CIDER, IctAPIFinder, and CID) and the results show that PSDroid outperforms existing tools. We also conducted an in-depth root cause analysis to comprehensively explain the ability of PSDroid in reducing FPs and FNs. Finally, 18/30 reported issues have been confirmed and further fixed by app developers.","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114459483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gal Amram, Dor Ma'ayan, S. Maoz, Or Pistiner, Jan Oliver Ringert
{"title":"Triggers for Reactive Synthesis Specifications","authors":"Gal Amram, Dor Ma'ayan, S. Maoz, Or Pistiner, Jan Oliver Ringert","doi":"10.1109/ICSE48619.2023.00070","DOIUrl":"https://doi.org/10.1109/ICSE48619.2023.00070","url":null,"abstract":"Reactive synthesis is an automated procedure to obtain a correct-by-construction reactive system from its temporal logic specification. Two of the main challenges in bringing reactive synthesis to practice are its very high worst-case complexity and the difficulty of writing declarative specifications using basic LTL operators. To address the first challenge, researchers have suggested the GR(1) fragment of LTL, which has an efficient poly-nomial time symbolic synthesis algorithm. To address the second challenge, specification languages include higher-level constructs that aim at allowing engineers to write succinct and readable specifications. One such construct is the triggers operator, as supported, e.g., in the Property Specification Language (PSL). In this work we introduce triggers into specifications for reactive synthesis. The effectiveness of our contribution relies on a novel encoding of regular expressions using symbolic finite automata (SFA) and on a novel semantics for triggers that, in contrast to PSL triggers, admits an efficient translation into GR(1). We show that our triggers are expressive and succinct, and prove that our encoding is optimal. We have implemented our ideas on top of the Spectra language and synthesizer. We demonstrate the usefulness and effectiveness of using triggers in specifications for synthesis, as well as the challenges involved in using them, via a study of more than 300 triggers written by undergraduate students who participated in a project class on writing specifications for synthesis. To the best of our knowledge, our work is the first to introduce triggers into specifications for reactive synthesis.","PeriodicalId":376379,"journal":{"name":"2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129579594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}