ACM Transactions on Software Engineering and Methodology: Latest Articles, Page 3

Automatic Repair of Quantum Programs via Unitary Operation

IF 4.4, Zone 2, Computer Science

ACM Transactions on Software Engineering and Methodology Pub Date : 2024-05-11 DOI: 10.1145/3664604

Yuechen Li, Hanyu Pei, Linzhi Huang, Beibei Yin, Kai-Yuan Cai

Abstract: With the continuous advancement of quantum computing (QC), the demand for high-quality quantum programs (QPs) is growing. In order to avoid program failure, in software engineering, the technology of automatic program repair (APR) employs appropriate patches to remove potential bugs without the intervention of a human. However, the method tailored for repairing defective QPs is still absent. This paper proposes a new APR method named UnitAR that can repair QPs via unitary operation automatically. Based on the characteristics of superposition and entanglement in QC, the paper constructs an algebraic model and adopts a generate-and-validate approach for the repair procedure. Furthermore, the paper presents two schemes that can respectively promote the efficiency of generating patches and guarantee the effectiveness of applying patches. For the purpose of evaluating the proposed method, the paper selects 29 mutated versions as well as 5 real-world buggy programs as the objects, and introduces two traditional APR approaches GenProg and TBar as baselines. According to the experiments, UnitAR can fix 23 buggy programs, and this method demonstrates the highest efficiency and effectiveness among 3 APR approaches. Besides, the experimental results further manifest the crucial roles of two constituents involved in the framework of UnitAR.

Citations: 0
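
The abstract above does not spell out how a unitary patch is built, so the following is only a minimal sketch of the underlying linear-algebra idea, not UnitAR's algorithm. It assumes a hypothetical single-qubit program that applies a Pauli-X gate where a Hadamard gate was intended, and shows that appending the unitary `intended · buggy†` restores the intended behaviour. All names (`unitary_patch`, `HADAMARD`, `PAULI_X`) are illustrative.

```python
import math
from typing import List

Matrix = List[List[complex]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(m: Matrix) -> Matrix:
    """Conjugate transpose; for a unitary matrix this is its inverse."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def is_close(a: Matrix, b: Matrix, tol: float = 1e-9) -> bool:
    """Entry-wise comparison of two matrices up to a small tolerance."""
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(n) for j in range(n))


S = 1 / math.sqrt(2)
HADAMARD: Matrix = [[S, S], [S, -S]]   # gate the program was meant to apply
PAULI_X: Matrix = [[0, 1], [1, 0]]     # gate the hypothetical buggy program applies
IDENTITY: Matrix = [[1, 0], [0, 1]]


def unitary_patch(intended: Matrix, buggy: Matrix) -> Matrix:
    """Return P such that P applied after `buggy` equals `intended`."""
    return matmul(intended, dagger(buggy))


patch = unitary_patch(HADAMARD, PAULI_X)
# Applying the buggy gate first and the patch second is the product patch * buggy.
repaired = matmul(patch, PAULI_X)
```

Because both gates are unitary, the patch is unitary as well, so it is itself a valid quantum operation. A generate-and-validate repair loop would still have to search for such a patch and check it against tests, which this sketch leaves out.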

Supporting Emotional Intelligence, Productivity and Team Goals while Handling Software Requirements Changes

IF 4.4, Zone 2, Computer Science

ACM Transactions on Software Engineering and Methodology Pub Date : 2024-05-11 DOI: 10.1145/3664600

Kashumi Madampe, Rashina Hoda, John Grundy

Abstract: Background: Research shows that emotional intelligence (EI) should be used alongside cognitive intelligence during requirements change (RC) handling in Software Engineering (SE), especially in agile settings. Objective: We wanted to study the role of EI in-depth during RC handling. Method: We conducted a mixed-methods study (an interview study followed by a survey study) with 124 software practitioners. Findings: We found the causal condition, intervening condition and causes lead to key direct consequences of regulating own emotions, managing relationships, and extended consequences of sustaining productivity, setting and sustaining team goals. We found several strategies of supporting EI during RC handling. Further, we found strong correlations between six strategies and one being aware of own emotions, regulating own emotions, sustaining team productivity, and setting and sustaining team goals. Conclusion: Empathising with others and tracking commitments and decisions as a team are key strategies that have strong correlations between managing emotions, between sustaining team productivity, and between setting and sustaining team goals. To the best of our knowledge, the framework we present in this paper is the first theoretical framework on EI in SE research. We provide recommendations for software practitioners to consider during RC handling.

Citations: 0

Deep Domain Adaptation With Max-Margin Principle for Cross-Project Imbalanced Software Vulnerability Detection

IF 4.4, Zone 2, Computer Science

ACM Transactions on Software Engineering and Methodology Pub Date : 2024-05-09 DOI: 10.1145/3664602

Van Nguyen, Trung Le, Chakkrit Tantithamthavorn, John Grundy, Dinh Phung

Abstract: Software vulnerabilities (SVs) have become a common, serious, and crucial concern due to the ubiquity of computer software. Many AI-based approaches have been proposed to solve the software vulnerability detection (SVD) problem to ensure the security and integrity of software applications (in both the development and testing phases). However, there are still two open and significant issues for SVD in terms of i) learning automatic representations to improve the predictive performance of SVD, and ii) tackling the scarcity of labeled vulnerability datasets that conventionally need laborious labeling effort by experts. In this paper, we propose a novel approach to tackle these two crucial issues. We first exploit the automatic representation learning with deep domain adaptation for SVD. We then propose a novel cross-domain kernel classifier leveraging the max-margin principle to significantly improve the transfer learning process of SVs from imbalanced labeled into imbalanced unlabeled projects. Our approach is the first work that leverages solid body theories of the max-margin principle, kernel methods, and bridging the gap between source and target domains for imbalanced domain adaptation (DA) applied in cross-project SVD. The experimental results on real-world software datasets show the superiority of our proposed method over state-of-the-art baselines. In short, our method obtains a higher performance on F1-measure, one of the most important measures in SVD, from 1.83% to 6.25% compared to the second highest method in the used datasets.

Citations: 0

Fairness Testing of Machine Translation Systems

IF 4.4, Zone 2, Computer Science

ACM Transactions on Software Engineering and Methodology Pub Date : 2024-05-09 DOI: 10.1145/3664608

Zeyu Sun, Zhenpeng Chen, Jie Zhang, Dan Hao

Abstract: Machine translation is integral to international communication and extensively employed in diverse human-related applications. Despite remarkable progress, fairness issues persist within current machine translation systems. In this paper, we propose FairMT, an automated fairness testing approach tailored for machine translation systems. FairMT operates on the assumption that translations of semantically similar sentences, containing protected attributes from distinct demographic groups, should maintain comparable meanings. It comprises three key steps: (1) test input generation, producing inputs covering various demographic groups; (2) test oracle generation, identifying potential unfair translations based on semantic similarity measurements; and (3) regression, discerning genuine fairness issues from those caused by low-quality translation. Leveraging FairMT, we conduct an empirical study on three leading machine translation systems: Google Translate, T5, and Transformer. Our investigation uncovers up to 832, 1,984, and 2,627 unfair translations across the three systems, respectively. Intriguingly, we observe that fair translations tend to exhibit superior translation performance, challenging the conventional wisdom of a fairness-performance trade-off prevalent in the fairness literature.

Citations: 0
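
The first two steps named in the FairMT abstract above (generate inputs that differ only in the demographic term, then flag translation pairs whose similarity is low) can be sketched roughly as follows. This is an illustration under stated assumptions, not FairMT's implementation: `difflib.SequenceMatcher` is a crude character-level stand-in for a semantic similarity measure, `biased_toy_translate` is a made-up translator, the 0.8 threshold is arbitrary, and the third step (the regression that separates fairness issues from low-quality translation) is omitted.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def make_pairs(template: str, group_terms: List[str]) -> List[Tuple[str, str]]:
    """Step 1: fill one template with each group term and pair up the results."""
    sentences = [template.format(term=term) for term in group_terms]
    return [(a, b) for i, a in enumerate(sentences) for b in sentences[i + 1:]]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a placeholder for a semantic metric."""
    return SequenceMatcher(None, a, b).ratio()


def find_suspicious(
    pairs: List[Tuple[str, str]],
    translate: Callable[[str], str],
    threshold: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """Step 2: flag input pairs whose translations diverge more than expected."""
    flagged = []
    for a, b in pairs:
        score = similarity(translate(a), translate(b))
        if score < threshold:
            flagged.append((a, b, score))
    return flagged


def biased_toy_translate(sentence: str) -> str:
    """Hypothetical translator that degrades output for one group term."""
    if "women" in sentence:
        return "SOMETHING UNRELATED"
    return sentence.upper()


pairs = make_pairs("The {term} are good engineers.", ["men", "women"])
flagged_biased = find_suspicious(pairs, biased_toy_translate)
flagged_fair = find_suspicious(pairs, str.upper)
```

With the toy translator the single input pair is flagged, while a translator that treats both sentences alike produces no flags. A real test oracle would replace `similarity` with a measure that compares meaning rather than characters.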

Unveiling Code Pre-Trained Models: Investigating Syntax and Semantics Capacities

IF 4.4, Zone 2, Computer Science

ACM Transactions on Software Engineering and Methodology Pub Date : 2024-05-09 DOI: 10.1145/3664606

Wei Ma, Shangqing Liu, Mengjie Zhao, Xiaofei Xie, Wenhang Wang, Qiang Hu, Jie Zhang, Yang Liu

Abstract: Code models have made significant advancements in code intelligence by encoding knowledge about programming languages. While previous studies have explored the capabilities of these models in learning code syntax, there has been limited investigation on their ability to understand code semantics. Additionally, existing analyses assume the number of edges between nodes at the abstract syntax tree (AST) is related to syntax distance, and also often require transforming the high-dimensional space of deep learning models to a low-dimensional one, which may introduce inaccuracies. To study how code models represent code syntax and semantics, we conduct a comprehensive analysis of 7 code models, including four representative code pre-trained models (CodeBERT, GraphCodeBERT, CodeT5, and UnixCoder) and three large language models (StarCoder, CodeLlama and CodeT5+). We design four probing tasks to assess the models’ capacities in learning both code syntax and semantics. These probing tasks reconstruct code syntax and semantics structures (AST, CDG, DDG and CFG) in the representation space. These structures are core concepts for code understanding. We also investigate the syntax token role in each token representation and the long dependency between the code tokens. Additionally, we analyze the distribution of attention weights related to code semantic structures. Through extensive analysis, our findings highlight the strengths and limitations of different code models in learning code syntax and semantics. The results demonstrate that these models excel in learning code syntax, successfully capturing the syntax relationships between tokens and the syntax roles of individual tokens. However, their performance in encoding code semantics varies. CodeT5 and CodeBERT demonstrate proficiency in capturing control and data dependencies, while UnixCoder shows weaker performance in this aspect. We do not observe LLMs generally performing much better than pre-trained models. The shallow layers of LLMs perform better than their deep layers. The investigation of attention weights reveals that different attention heads play distinct roles in encoding code semantics. Our research findings emphasize the need for further enhancements in code models to better learn code semantics. This study contributes to the understanding of code models’ abilities in syntax and semantics analysis. Our findings provide guidance for future improvements in code models, facilitating their effective application in various code-related tasks.

Citations: 0

Replication in Requirements Engineering: the NLP for RE Case

IF 4.4, Zone 2, Computer Science

ACM Transactions on Software Engineering and Methodology Pub Date : 2024-04-15 DOI: 10.1145/3658669

Sallam Abualhaija, Fatma Başak Aydemir, Fabiano Dalpiaz, Davide Dell’Anna, Alessio Ferrari, Xavier Franch, Davide Fucci

Abstract: [Context] Natural language processing (NLP) techniques have been widely applied in the requirements engineering (RE) field to support tasks such as classification and ambiguity detection. Despite its empirical vocation, RE research has given limited attention to replication of NLP for RE studies. Replication is hampered by several factors, including the context specificity of the studies, the heterogeneity of the tasks involving NLP, the tasks’ inherent hairiness, and, in turn, the heterogeneous reporting structure. [Solution] To address these issues, we propose a new artifact, referred to as ID-Card, whose goal is to provide a structured summary of research papers emphasizing replication-relevant information. We construct the ID-Card through a structured, iterative process based on design science. [Results] In this paper: (i) we report on hands-on experiences of replication, (ii) we review the state-of-the-art and extract replication-relevant information, (iii) we identify, through focus groups, challenges across two typical dimensions of replication: data annotation and tool reconstruction, and (iv) we present the concept and structure of the ID-Card to mitigate the identified challenges. [Contribution] This study aims to create awareness of replication in NLP for RE. We propose an ID-Card that is intended to foster study replication, but can also be used in other contexts, e.g., for educational purposes.

Citations: 0

BatFix: Repairing language model-based transpilation

IF 4.4, Zone 2, Computer Science

ACM Transactions on Software Engineering and Methodology Pub Date : 2024-04-12 DOI: 10.1145/3658668

Daniel Ramos, Inês Lynce, Vasco Manquinho, Ruben Martins, Claire Le Goues

Abstract: To keep up with changes in requirements, frameworks, and coding practices, software organizations might need to migrate code from one language to another. Source-to-source migration, or transpilation, is often a complex, manual process. Transpilation requires expertise both in the source and target language, making it highly laborious and costly. Languages models for code generation and transpilation are becoming increasingly popular. However, despite capturing code-structure well, code generated by language models is often spurious and contains subtle problems. We propose BatFix, a novel approach that augments language models for transpilation by leveraging program repair and synthesis to fix the code generated by these models. BatFix takes as input both the original program, the target program generated by the machine translation model, and a set of test cases and outputs a repaired program that passes all test cases. Experimental results show that our approach is agnostic to language models and programming languages. BatFix can locate bugs spawning multiple lines and synthesize patches for syntax and semantic bugs for programs migrated from Java to C++ and Python to C++ from multiple language models, including, OpenAI’s Codex.

Citations: 0

MR-Scout: Automated Synthesis of Metamorphic Relations from Existing Test Cases

IF 4.4, Zone 2, Computer Science

ACM Transactions on Software Engineering and Methodology Pub Date : 2024-04-09 DOI: 10.1145/3656340

Congying Xu, Valerio Terragni, Hengcheng Zhu, Jiarong Wu, Shing-Chi Cheung

Abstract: Metamorphic Testing (MT) alleviates the oracle problem by defining oracles based on metamorphic relations (MRs), that govern multiple related inputs and their outputs. However, designing MRs is challenging, as it requires domain-specific knowledge. This hinders the widespread adoption of MT. We observe that developer-written test cases can embed domain knowledge that encodes MRs. Such encoded MRs could be synthesized for testing not only their original programs but also other programs that share similar functionalities. In this paper, we propose MR-Scout to automatically synthesize MRs from test cases in open-source software (OSS) projects. MR-Scout first discovers MR-encoded test cases (MTCs), and then synthesizes the encoded MRs into parameterized methods (called codified MRs), and filters out MRs that demonstrate poor quality for new test case generation. MR-Scout discovered over 11,000 MTCs from 701 OSS projects. Experimental results show that over 97% of codified MRs are of high quality for automated test case generation, demonstrating the practical applicability of MR-Scout. Furthermore, codified-MRs-based tests effectively enhance the test adequacy of programs with developer-written tests, leading to 13.52% and 9.42% increases in line coverage and mutation score, respectively. Our qualitative study shows that 55.76% to 76.92% of codified MRs are easily comprehensible for developers.

Citations: 0
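
To make the notion of a codified metamorphic relation in the MR-Scout abstract above concrete, here is a small sketch of an MR written as a parameterized method and then exercised on generated inputs. It is a textbook-style illustration, not output of MR-Scout: the relation (sin(x) equals sin(π − x)), the function names, and the input range are all chosen for the example.

```python
import math
import random
from typing import Callable


def mr_sine_symmetry(x: float) -> bool:
    """Codified MR: for any x, sin(x) and sin(pi - x) should agree."""
    source_output = math.sin(x)
    followup_output = math.sin(math.pi - x)  # follow-up input derived from x
    return math.isclose(source_output, followup_output, abs_tol=1e-9)


def check_relation(
    relation: Callable[[float], bool], trials: int = 1000, seed: int = 0
) -> bool:
    """Run the parameterized relation on randomly generated source inputs."""
    rng = random.Random(seed)
    return all(relation(rng.uniform(-10.0, 10.0)) for _ in range(trials))


relation_holds = check_relation(mr_sine_symmetry)
```

No expected value of sin(x) is needed anywhere, which is how a metamorphic relation sidesteps the oracle problem: the check only relates the outputs of two executions. Parameterizing the relation over `x` is what lets new test inputs be generated automatically.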

A Survey of Source Code Search: A 3-Dimensional Perspective

IF 4.4, Zone 2, Computer Science

ACM Transactions on Software Engineering and Methodology Pub Date : 2024-04-06 DOI: 10.1145/3656341

Weisong Sun, Chunrong Fang, Yifei Ge, Yuling Hu, Yuchen Chen, Quanjun Zhang, Xiuting Ge, Yang Liu, Zhenyu Chen

Abstract: (Source) code search is widely concerned by software engineering researchers because it can improve the productivity and quality of software development. Given a functionality requirement usually described in a natural language sentence, a code search system can retrieve code snippets that satisfy the requirement from a large-scale code corpus, e.g., GitHub. To realize effective and efficient code search, many techniques have been proposed successively. These techniques improve code search performance mainly by optimizing three core components, including query understanding component, code understanding component, and query-code matching component. In this paper, we provide a 3-dimensional perspective survey for code search. Specifically, we categorize existing code search studies into query-end optimization techniques, code-end optimization techniques, and match-end optimization techniques according to the specific components they optimize. These optimization techniques are proposed to enhance the performance of specific components, and thus the overall performance of code search. Considering that each end can be optimized independently and contributes to the code search performance, we treat each end as a dimension. Therefore, this survey is 3-dimensional in nature, and it provides a comprehensive summary of each dimension in detail. To understand the research trends of the three dimensions in existing code search studies, we systematically review 68 relevant literatures. Different from existing code search surveys that only focus on the query end or code end or introduce various aspects shallowly (including codebase, evaluation metrics, modeling technique, etc.), our survey provides a more nuanced analysis and review of the evolution and development of the underlying techniques used in the three ends. Based on a systematic review and summary of existing work, we outline several open challenges and opportunities at the three ends that remain to be addressed in future work.

Citations: 0
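
The three components named in the survey abstract above (query understanding, code understanding, query-code matching) can be pictured with a deliberately simple sketch. It assumes bag-of-words token counts and cosine similarity, which stand in for the learned models the surveyed techniques use; the function names and the two-snippet corpus are made up for the example.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def understand_query(query: str) -> Counter:
    """Query end: turn a natural-language request into lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", query.lower()))


def understand_code(snippet: str) -> Counter:
    """Code end: split camelCase and snake_case identifiers into word counts."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", snippet)
    return Counter(re.findall(r"[a-z]+", spaced.lower()))


def match(query_vec: Counter, code_vec: Counter) -> float:
    """Match end: cosine similarity between the two count vectors."""
    dot = sum(query_vec[token] * code_vec[token] for token in query_vec)
    norm = math.sqrt(sum(v * v for v in query_vec.values())) * math.sqrt(
        sum(v * v for v in code_vec.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, corpus: Dict[str, str], top_k: int = 1) -> List[Tuple[str, float]]:
    """Rank every snippet in the corpus against the query and keep the best."""
    query_vec = understand_query(query)
    scored = [(name, match(query_vec, understand_code(code)))
              for name, code in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


corpus = {
    "read": "def read_file(path): return open(path).read()",
    "sort": "def sortList(items): return sorted(items)",
}
best = search("read a file from a path", corpus)
```

Each function can be swapped out without touching the other two, which mirrors the abstract's point that each end can be optimized independently.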

Help Them Understand: Testing and Improving Voice User Interfaces

IF 4.4, Zone 2, Computer Science

ACM Transactions on Software Engineering and Methodology Pub Date : 2024-04-05 DOI: 10.1145/3654438

Emanuela Guglielmi, Giovanni Rosa, Simone Scalabrino, Gabriele Bavota, Rocco Oliveto

Abstract: Voice-based virtual assistants are becoming increasingly popular. Such systems provide frameworks to developers for building custom apps. End-users can interact with such apps through a Voice User Interface (VUI), which allows the user to use natural language commands to perform actions. Testing such apps is not trivial: The same command can be expressed in different semantically equivalent ways. In this paper, we introduce VUI-UPSET, an approach that adapts chatbot-testing approaches to VUI-testing. We conducted an empirical study to understand how VUI-UPSET compares to two state-of-the-art approaches (i.e., a chatbot testing technique and ChatGPT) in terms of (i) correctness of the generated paraphrases, and (ii) capability of revealing bugs. To this aim, we analyzed 14,898 generated paraphrases for 40 Alexa Skills. Our results show that VUI-UPSET generates more bug-revealing paraphrases than the two baselines with, however, ChatGPT being the approach generating the highest percentage of correct paraphrases. We also tried to use the generated paraphrases to improve the skills. We tried to include in the voice interaction models of the skills (i) only the bug-revealing paraphrases, (ii) all the valid paraphrases. We observed that including only bug-revealing paraphrases is sometimes not sufficient to make all the tests pass.

Citations: 0