ACM Transactions on Software Engineering and Methodology最新文献_第10页

Test Generation Strategies for Building Failure Models and Explaining Spurious Failures 建立故障模型和解释假故障的测试生成策略

IF 4.4 2区计算机科学

ACM Transactions on Software Engineering and Methodology Pub Date : 2023-12-21 DOI: 10.1145/3638246

Baharin A. Jodat, Abhishek Chandar, Shiva Nejati, Mehrdad Sabetzadeh

{"title":"Test Generation Strategies for Building Failure Models and Explaining Spurious Failures","authors":"Baharin A. Jodat, Abhishek Chandar, Shiva Nejati, Mehrdad Sabetzadeh","doi":"10.1145/3638246","DOIUrl":"https://doi.org/10.1145/3638246","url":null,"abstract":"Test inputs fail not only when the system under test is faulty but also when the inputs are invalid or unrealistic. Failures resulting from invalid or unrealistic test inputs are spurious. Avoiding spurious failures improves the effectiveness of testing in exercising the main functions of a system, particularly for compute-intensive (CI) systems where a single test execution takes significant time. In this paper, we propose to build failure models for inferring interpretable rules on test inputs that cause spurious failures. We examine two alternative strategies for building failure models: (1) machine learning (ML)-guided test generation and (2) surrogate-assisted test generation. ML-guided test generation infers boundary regions that separate passing and failing test inputs and samples test inputs from those regions. Surrogate-assisted test generation relies on surrogate models to predict labels for test inputs instead of exercising all the inputs. We propose a novel surrogate-assisted algorithm that uses multiple surrogate models simultaneously, and dynamically selects the prediction from the most accurate model. We empirically evaluate the accuracy of failure models inferred based on surrogate-assisted and ML-guided test generation algorithms. Using case studies from the domains of cyber-physical systems and networks, we show that our proposed surrogate-assisted approach generates failure models with an average accuracy of 83%, significantly outperforming ML-guided test generation and two baselines. Further, our approach learns failure-inducing rules that identify genuine spurious failures as validated against domain knowledge.","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"81 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139028039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Deceiving Humans and Machines Alike: Search-based Test Input Generation for DNNs using Variational Autoencoders 欺骗人类和机器：使用变异自动编码器为 DNN 生成基于搜索的测试输入

IF 4.4 2区计算机科学

ACM Transactions on Software Engineering and Methodology Pub Date : 2023-12-21 DOI: 10.1145/3635706

Sungmin Kang, Robert Feldt, Shin Yoo

{"title":"Deceiving Humans and Machines Alike: Search-based Test Input Generation for DNNs using Variational Autoencoders","authors":"Sungmin Kang, Robert Feldt, Shin Yoo","doi":"10.1145/3635706","DOIUrl":"https://doi.org/10.1145/3635706","url":null,"abstract":"Due to the rapid adoption of Deep Neural Networks (DNNs) into larger software systems, testing of DNN based systems has received much attention recently. While many different test adequacy criteria have been suggested, we lack effective test input generation techniques. Inputs such as images of real world objects and scenes are not only expensive to collect but also difficult to randomly sample. Consequently, current testing techniques for DNNs tend to apply small local perturbations to existing inputs to generate new inputs. We propose SINVAD, a way to sample from, and navigate over, a space of realistic inputs that resembles the true distribution in the training data. Our input space is constructed using Variational AutoEncoders (VAEs), and navigated through their latent vector space. Our analysis shows that the VAE-based input space is well-aligned with human perception of what constitutes realistic inputs. Further, we show that this space can be effectively searched to achieve various testing scenarios, such as boundary testing of two different DNNs or analyzing class labels that are difficult for the given DNN to distinguish. Guidelines on how to design VAE architectures are presented as well. Our results have the potential to open the field to meaningful exploration through the space of highly structured images.","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"1 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138823838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects 超越准确性：开源深度学习项目中的单元测试实证研究

IF 4.4 2区计算机科学

ACM Transactions on Software Engineering and Methodology Pub Date : 2023-12-20 DOI: 10.1145/3638245

Han Wang, Sijia Yu, Chunyang Chen, Burak Turhan, Xiaodong Zhu

{"title":"Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects","authors":"Han Wang, Sijia Yu, Chunyang Chen, Burak Turhan, Xiaodong Zhu","doi":"10.1145/3638245","DOIUrl":"https://doi.org/10.1145/3638245","url":null,"abstract":"Deep Learning (DL) models have rapidly advanced, focusing on achieving high performance through testing model accuracy and robustness. However, it is unclear whether DL projects, as software systems, are tested thoroughly or functionally correct when there is a need to treat and test them like other software systems. Therefore, we empirically study the unit tests in open-source DL projects, analyzing 9,129 projects from GitHub. We find that: 1) unit tested DL projects have positive correlation with the open-source project metrics and have a higher acceptance rate of pull requests, 2) 68% of the sampled DL projects are not unit tested at all, 3) the layer and utilities (utils) of DL models have the most unit tests. Based on these findings and previous research outcomes, we built a mapping taxonomy between unit tests and faults in DL projects. We discuss the implications of our findings for developers and researchers and highlight the need for unit testing in open-source DL projects to ensure their reliability and stability. The study contributes to this community by raising awareness of the importance of unit testing in DL projects and encouraging further research in this area.","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"1 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138819666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

PACE: A Program Analysis Framework for Continuous Performance Prediction PACE：用于持续性能预测的程序分析框架

IF 4.4 2区计算机科学

ACM Transactions on Software Engineering and Methodology Pub Date : 2023-12-14 DOI: 10.1145/3637230

Chidera Biringa, Gökhan Kul

引用次数: 0

Measuring and Clustering Heterogeneous Chatbot Designs 测量和聚类异构聊天机器人设计

IF 4.4 2区计算机科学

ACM Transactions on Software Engineering and Methodology Pub Date : 2023-12-13 DOI: 10.1145/3637228

Pablo C. Cañizares, Jose María López-Morales, Sara Pérez-Soler, Esther Guerra, Juan de Lara

{"title":"Measuring and Clustering Heterogeneous Chatbot Designs","authors":"Pablo C. Cañizares, Jose María López-Morales, Sara Pérez-Soler, Esther Guerra, Juan de Lara","doi":"10.1145/3637228","DOIUrl":"https://doi.org/10.1145/3637228","url":null,"abstract":"Conversational agents, or chatbots, have become popular to access all kind of software services. They provide an intuitive natural language interface for interaction, available from a wide range of channels including social networks, web pages, intelligent speakers or cars. In response to this demand, many chatbot development platforms and tools have emerged. However, they typically lack support to statically measure properties of the chatbots being built, as indicators of their size, complexity, quality or usability. Similarly, there are hardly any mechanisms to compare and cluster chatbots developed with heterogeneous technologies. To overcome this limitation, we propose a suite of 21 metrics for chatbot designs, as well as two clustering methods that help in grouping chatbots along their conversation topics and design features. Both the metrics and the clustering methods are defined on a neutral chatbot design language, becoming independent of the implementation platform. We provide automatic translations of chatbots defined on some major platforms into this neutral notation to perform the measurement and clustering. The approach is supported by our tool Asymob, which we have used to evaluate the metrics and the clustering methods over a set of 259 Dialogflow and Rasa chatbots from open-source repositories. The results open the door to incorporating the metrics within chatbot development processes for the early detection of quality issues, and to exploit clustering to organise large collections of chatbots into significant groups to ease chatbot comprehension, search and comparison.","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"44 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138628466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Smart Contract Code Repair Recommendation based on Reinforcement Learning and Multi-metric Optimization 基于强化学习和多指标优化的智能合约代码修复建议

IF 4.4 2区计算机科学

ACM Transactions on Software Engineering and Methodology Pub Date : 2023-12-11 DOI: 10.1145/3637229

Hanyang Guo, Yingye Chen, Xiangping Chen, Yuan Huang, Zibin Zheng

{"title":"Smart Contract Code Repair Recommendation based on Reinforcement Learning and Multi-metric Optimization","authors":"Hanyang Guo, Yingye Chen, Xiangping Chen, Yuan Huang, Zibin Zheng","doi":"10.1145/3637229","DOIUrl":"https://doi.org/10.1145/3637229","url":null,"abstract":"A smart contract is a kind of code deployed on the blockchain that executes automatically once an event triggers a clause in the contract. Since smart contracts involve businesses such as asset transfer, they are more vulnerable to attacks, so it is crucial to ensure the security of smart contracts. Because a smart contract cannot be tampered with once deployed on the blockchain, for smart contract developers, it is necessary to fix vulnerabilities before deployment. Compared with many vulnerability detection tools for smart contracts, the amount of automatic fix approaches for smart contracts is relatively limited. These approaches mainly use defined pattern-based methods or heuristic search algorithms for vulnerability repairs. In this paper, we propose RLRep, a reinforcement learning-based approach to provide smart contract repair recommendations for smart contract developers automatically. This approach adopts an agent to provide repair action suggestions based on the vulnerable smart contract without any supervision, which can solve the problem of missing labeled data in machine learning-based repair methods. We evaluate our approach on a dataset containing 853 smart contract programs (programming language: Solidity) with different kinds of vulnerabilities. We split them into training and test set. The result shows that our approach can provide 54.97% correct repair recommendations for smart contracts.","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"15 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138574724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Estimating Uncertainty in Labeled Changes by SZZ Tools on Just-In-Time Defect Prediction 用 SZZ 工具估算及时缺陷预测中标签变化的不确定性

IF 4.4 2区计算机科学

ACM Transactions on Software Engineering and Methodology Pub Date : 2023-12-11 DOI: 10.1145/3637226

Shikai Guo, Dongmin Li, Lin Huang, Sijia Lv, Rong Chen, Hui Li, Xiaochen Li, He Jiang

{"title":"Estimating Uncertainty in Labeled Changes by SZZ Tools on Just-In-Time Defect Prediction","authors":"Shikai Guo, Dongmin Li, Lin Huang, Sijia Lv, Rong Chen, Hui Li, Xiaochen Li, He Jiang","doi":"10.1145/3637226","DOIUrl":"https://doi.org/10.1145/3637226","url":null,"abstract":"The aim of Just-In-Time (JIT) defect prediction is to predict software changes that are prone to defects in a project in a timely manner, thereby improving the efficiency of software development and ensuring software quality. Identifying changes that introduce bugs is a critical task in just-in-time defect prediction, and researchers have introduced the SZZ approach and its variants to label these changes. However, it has been shown that different SZZ algorithms introduce noise to the dataset to a certain extent, which may reduce the predictive performance of the model. To address this limitation, we propose the Confident Learning Imbalance (CLI) model. The model identifies and excludes samples whose labels may be corrupted by estimating the joint distribution of noisy labels and true labels, and mitigates the impact of noisy data on the performance of the prediction model. The CLI consists of two components: identifying noisy data (Confident Learning Component) and generating a predicted probability matrix for imbalanced data (Imbalanced Data Probabilistic Prediction Component). The IDPP component generates precise predicted probabilities for each instance in the training set, while the CL component uses the generated predicted probability matrix and noise labels to clean up the noise and build a classification model. We evaluate the performance of our model through extensive experiments on a total of 126,526 changes from ten Apache open source projects, and the results show that our model outperforms the baseline methods.","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"25 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138577117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Smart Status Based Monitoring Algorithm for the Dynamic Analysis of Memory Safety 基于智能状态的内存安全动态分析监控算法

IF 4.4 2区计算机科学

ACM Transactions on Software Engineering and Methodology Pub Date : 2023-12-11 DOI: 10.1145/3637227

Zhe Chen, Rui Yan, Yingzi Ma, Yulei Sui, Jingling Xue

{"title":"A Smart Status Based Monitoring Algorithm for the Dynamic Analysis of Memory Safety","authors":"Zhe Chen, Rui Yan, Yingzi Ma, Yulei Sui, Jingling Xue","doi":"10.1145/3637227","DOIUrl":"https://doi.org/10.1145/3637227","url":null,"abstract":"C is a dominant programming language for implementing system and low-level embedded software. Unfortunately, the unsafe nature of its low-level control of memory often leads to memory errors. Dynamic analysis has been widely used to detect memory errors at runtime. However, existing monitoring algorithms for dynamic analysis are not yet satisfactory as they cannot deterministically and completely detect some types of errors, e.g., segment confusion errors, sub-object overflows, use-after-frees and memory leaks. We propose a new monitoring algorithm, namely Smatus, short for smart status, that improves memory safety by performing comprehensive dynamic analysis. The key innovation is to maintain at runtime a small status node for each memory object. A status node records the status value and reference count of an object, where the status value denotes the liveness and segment type of this object, and the reference count tracks the number of pointer variables pointing to this object. Smatus maintains at runtime a pointer metadata for each pointer variable, to record not only the base and bound of a pointer’s referent but also the address of the referent’s status node. All the pointers pointing to the same referent share the same status node in their pointer metadata. A status node is smart in the sense that it is automatically deleted when it becomes useless (indicated by its reference count reaching zero). To the best of our knowledge, Smatus represents the most comprehensive approach of its kind. We have evaluated Smatus by using a large set of programs including the NIST Software Assurance Reference Dataset, MSBench, MiBench, SPEC and stress testing benchmarks. In terms of effectiveness (detecting different types of memory errors), Smatus outperforms state-of-the-art tools, Google’s AddressSanitizer, SoftBoundCETS and Valgrind, as it is capable of detecting more errors. In terms of performance (the time and memory overheads), Smatus outperforms SoftBoundCETS and Valgrind in terms of both lower time and memory overheads incurred, and is on par with AddressSanitizer in terms of the time and memory overhead tradeoff made (with much lower memory overheads incurred).","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"13 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138569386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Algorithm Selection for Software Verification using Graph Neural Networks 利用图神经网络为软件验证选择算法

IF 4.4 2区计算机科学

ACM Transactions on Software Engineering and Methodology Pub Date : 2023-12-11 DOI: 10.1145/3637225

Will Leeson, Matthew B. Dwyer

{"title":"Algorithm Selection for Software Verification using Graph Neural Networks","authors":"Will Leeson, Matthew B. Dwyer","doi":"10.1145/3637225","DOIUrl":"https://doi.org/10.1145/3637225","url":null,"abstract":"The field of software verification has produced a wide array of algorithmic techniques that can prove a variety of properties of a given program. It has been demonstrated that the performance of these techniques can vary up to 4 orders of magnitude on the same verification problem. Even for verification experts, it is difficult to decide which tool will perform best on a given problem. For general users, deciding the best tool for their verification problem is effectively impossible. In this work, we present Graves, a selection strategy based on graph neural networks (GNNs). Graves generates a graph representation of a program from which a GNN predicts a score for a verifier that indicates its performance on the program. We evaluate Graves on a set of 10 verification tools and over 8000 verification problems and find that it improves the state-of-the-art in verification algorithm selection by 12%, or 8 percentage points. Further, it is able to verify 9% more problems than any existing verifier on our test set. Through a qualitative study on model interpretability, we find strong evidence that the Graves’ model learns to base its predictions on factors that relate to the unique features of the algorithmic techniques.","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"52 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138569131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Representation Learning for Stack Overflow Posts: How Far are We? Stack Overflow 帖子的表征学习：我们还有多远？

IF 4.4 2区计算机科学

ACM Transactions on Software Engineering and Methodology Pub Date : 2023-12-07 DOI: 10.1145/3635711

Junda He, Xin Zhou, Bowen Xu, Ting Zhang, Kisub Kim, Zhou Yang, Ferdian Thung, Ivana Clairine Irsan, David Lo

{"title":"Representation Learning for Stack Overflow Posts: How Far are We?","authors":"Junda He, Xin Zhou, Bowen Xu, Ting Zhang, Kisub Kim, Zhou Yang, Ferdian Thung, Ivana Clairine Irsan, David Lo","doi":"10.1145/3635711","DOIUrl":"https://doi.org/10.1145/3635711","url":null,"abstract":"The tremendous success of Stack Overflow has accumulated an extensive corpus of software engineering knowledge, thus motivating researchers to propose various solutions for analyzing its content. The performance of such solutions hinges significantly on the selection of representation models for Stack Overflow posts. As the volume of literature on Stack Overflow continues to burgeon, it highlights the need for a powerful Stack Overflow post representation model and drives researchers’ interest in developing specialized representation models that can adeptly capture the intricacies of Stack Overflow posts. The state-of-the-art (SOTA) Stack Overflow post representation models are Post2Vec and BERTOverflow, which are built upon neural networks such as convolutional neural network (CNN) and transformer architecture (e.g., BERT). Despite their promising results, these representation methods have not been evaluated in the same experimental setting. To fill the research gap, we first empirically compare the performance of the representation models designed specifically for Stack Overflow posts (Post2Vec and BERTOverflow) in a wide range of related tasks, i.e., tag recommendation, relatedness prediction, and API recommendation. The results show that Post2Vec cannot further improve the state-of-the-art techniques of the considered downstream tasks, and BERTOverflow shows surprisingly poor performance. To find more suitable representation models for the posts, we further explore a diverse set of transformer-based models, including (1) general domain language models (RoBERTa, Longformer, GPT2) and (2) language models built with software engineering-related textual artifacts (CodeBERT, GraphCodeBERT, seBERT, CodeT5, PLBart, and CodeGen). This exploration shows that models like CodeBERT and RoBERTa are suitable for representing Stack Overflow posts. However, it also illustrates the “No Silver Bullet” concept, as none of the models consistently wins against all the others. Inspired by the findings, we propose SOBERT, which employs a simple yet effective strategy to improve the representation models of Stack Overflow posts by continuing the pre-training phase with the textual artifact from Stack Overflow. The overall experimental results demonstrate that SOBERT can consistently outperform the considered models and increase the state-of-the-art performance significantly for all the downstream tasks.","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"101 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138547269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6