Automated Software Engineering: Latest Publications

Exploring the potential of general purpose LLMs in automated software refactoring: an empirical study
IF 2.0 | Q2 | Computer Science
Automated Software Engineering | Pub Date: 2025-03-01 | DOI: 10.1007/s10515-025-00500-0
Bo Liu, Yanjie Jiang, Yuxia Zhang, Nan Niu, Guangjie Li, Hui Liu
Abstract: Software refactoring is an essential activity for improving the readability, maintainability, and reusability of software projects. To this end, a large number of automated or semi-automated approaches/tools have been proposed to locate poorly designed code, recommend refactoring solutions, and conduct specified refactorings. However, even equipped with such tools, it remains challenging for developers to decide where and what kind of refactorings should be applied. Recent advances in deep learning techniques, especially in large language models (LLMs), make it potentially feasible to automatically refactor source code with LLMs. However, it remains unclear how well LLMs perform compared to human experts in conducting refactorings automatically and accurately. To fill this gap, in this paper we conduct an empirical study to investigate the potential of LLMs in automated software refactoring, focusing on the identification of refactoring opportunities and the recommendation of refactoring solutions. We first construct a high-quality refactoring dataset comprising 180 real-world refactorings from 20 projects, and conduct the empirical study on the dataset. With the to-be-refactored Java documents as input, ChatGPT and Gemini identified only 28 and 7, respectively, of the 180 refactoring opportunities. The evaluation results suggest that the performance of LLMs in identifying refactoring opportunities is generally low and remains an open problem. However, explaining the expected refactoring subcategories and narrowing the search space in the prompts substantially increased the success rate of ChatGPT from 15.6% to 86.7%. Concerning the recommendation of refactoring solutions, ChatGPT recommended 176 refactoring solutions for the 180 refactorings, and 63.6% of the recommended solutions were comparable to (or even better than) those constructed by human experts. However, 13 of the 176 solutions suggested by ChatGPT and 9 of the 137 solutions suggested by Gemini were unsafe in that they either changed the functionality of the source code or introduced syntax errors, which indicates the risk of LLM-based refactoring.
Citations: 0

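The abstract reports that naming the expected refactoring subcategory and narrowing the search space in the prompt raised ChatGPT's success rate from 15.6% to 86.7%. The sketch below shows one way such a prompt could be assembled; the subcategory list, template wording, and function names are illustrative assumptions, not the prompts used in the study.

```python
from typing import Optional

# Hypothetical subcategory list, used only for illustration.
REFACTORING_SUBCATEGORIES = [
    "Extract Method",
    "Rename Variable",
    "Move Method",
    "Inline Method",
]

def build_refactoring_prompt(java_source: str, subcategory: str,
                             focus_method: Optional[str] = None) -> str:
    """Compose a prompt that names the expected refactoring subcategory and,
    optionally, narrows the search space to a single method."""
    scope = (f"Only consider the method {focus_method}."
             if focus_method else "Consider the whole compilation unit.")
    return (
        "You are reviewing Java code for refactoring opportunities.\n"
        f"Expected refactoring type: {subcategory} "
        f"(one of: {', '.join(REFACTORING_SUBCATEGORIES)}).\n"
        f"{scope}\n"
        "Report the exact code range to refactor and the refactored code.\n\n"
        f"Java source:\n{java_source}"
    )

if __name__ == "__main__":
    snippet = "public class Order {\n    // ... a long method would go here ...\n}"
    print(build_refactoring_prompt(snippet, "Extract Method", focus_method="computeTotal"))
```
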
CodeDoctor: multi-category code review comment generation
IF 2.0 | Q2 | Computer Science
Automated Software Engineering | Pub Date: 2025-02-27 | DOI: 10.1007/s10515-025-00491-y
Yingling Li, Yuhan Wu, Zi’ao Wang, Lei Huang, Junjie Wang, Jianping Li, Minying Huang
Abstract: Code review is an effective software quality assurance activity. However, the process is labor-intensive and time-consuming, requiring reviewers to carefully examine code under various categories (e.g., function, refactoring, documentation) to produce review comments. Several approaches have been proposed for automatic review comment generation; although they can generate review comments, they hardly cover all manually written ones, because most of them simply use the submitted code and review comments without fully modeling the features of code review (i.e., they ignore the review category and the association between issue snippets and review comments). In this paper, we propose CodeDoctor, an automatic review comment generator with data augmentation and a category-aware encoder-decoder that generates multi-category review comments. It consists of three main phases: (1) a data augmentation phase, which classifies review comments and builds review exemplars (i.e., pairs of an issue snippet and its comment) to augment review data using a large language model (LLM) with prompt engineering and feedback loops; (2) an encoder phase, which encodes the inputs (i.e., review category, diff code, and review exemplar) into semantic and token representations; and (3) a decoder phase, which uses a category-focused decoder to capture the information most relevant to the given category for multi-category review comment generation. Evaluations against five commonly used, state-of-the-art baselines on two datasets show that CodeDoctor outperforms all baselines, with 1770% higher average BLEU-4, 111% higher average ROUGE-L, and 49% higher average F1 than the best baseline. A human evaluation further confirms the significant potential of applying CodeDoctor in practice. Our approach can relieve the burden on reviewers by automatically generating multi-category review comments and helps developers detect code issues as early as possible, thereby facilitating software development.
Citations: 0

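CodeDoctor's encoder takes a review category, diff code, and a retrieved review exemplar (an issue snippet paired with its comment). The following is a minimal sketch of how such a category-conditioned input could be assembled for a sequence encoder; the field names and separator tokens are assumptions, not CodeDoctor's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ReviewExemplar:
    issue_snippet: str   # code lines the exemplar comment refers to
    comment: str         # human review comment paired with that snippet

def build_model_input(category: str, diff_code: str, exemplar: ReviewExemplar) -> str:
    """Concatenate category, diff, and exemplar with explicit section markers so a
    sequence encoder can tell the parts apart."""
    return (
        f"<category> {category} "
        f"<diff> {diff_code} "
        f"<exemplar_code> {exemplar.issue_snippet} "
        f"<exemplar_comment> {exemplar.comment}"
    )

if __name__ == "__main__":
    ex = ReviewExemplar("if (user == null) return;",
                        "Consider raising an explicit error instead of silently returning.")
    print(build_model_input("documentation",
                            "+ // TODO add javadoc\n+ public void save(User u) {",
                            ex))
```
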
Bmco-o: a smart code smell detection method based on co-occurrences
IF 2.0 | Q2 | Computer Science
Automated Software Engineering | Pub Date: 2025-02-21 | DOI: 10.1007/s10515-025-00486-9
Feiqiao Mao, Kaihang Zhong, Long Cheng
Abstract: Code smell detection is a task aimed at identifying sub-optimal programming structures within code entities that may indicate problems requiring attention. It plays a crucial role in improving software quality. Numerous automatic or semi-automatic methods for code smell detection have been proposed. However, these methods are constrained by the manual setting of detection rules and thresholds, leading to subjective determinations, or they require large-scale labeled datasets for model training. In addition, they exhibit poor detection performance across different projects. Related studies have revealed the existence of co-occurrences among different types of code smells. Therefore, we propose a smart code smell detection method based on code smell co-occurrences, termed BMCo-O. The key insight is that code smell co-occurrences can assist in improving code smell detection. We introduce and utilize a code smell co-occurrence impact factor set, a code smell pre-filter mechanism, and a possibility mechanism, which enable BMCo-O to demonstrate outstanding detection performance. To reduce manual intervention, we propose an adaptive detection mechanism that automatically adjusts parameters to detect different types of code smell in various software projects. As an initial attempt, we applied the proposed method to seven classical high-criticality code smells: Message Chain, Feature Envy, Spaghetti Code, Large Class, Complex Class, Refused Bequest, and Long Method. The evaluation results on benchmarks composed of open source software projects demonstrated that BMCo-O significantly outperforms the well-known and widely used methods in detecting these seven classical code smells, especially in F1, with improvements of 137%, 155%, 23%, 195%, 364%, 552% and 35%, respectively. To further verify its effectiveness in actual detection across different software projects, we also implemented a prototype of a new code smell detector using BMCo-O.
Citations: 0

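The core idea of BMCo-O is that already-detected co-occurring smells can raise the estimated possibility of another smell. A minimal sketch of that intuition follows; the impact factors, threshold-free boosting rule, and scores are made-up illustrative values, not the factor set the paper derives.

```python
# Hypothetical co-occurrence impact factors: how much an already-detected smell
# raises the possibility score of the target smell.
CO_OCCURRENCE_IMPACT = {
    ("Large Class", "Long Method"): 0.25,
    ("Complex Class", "Long Method"): 0.20,
    ("Feature Envy", "Message Chain"): 0.15,
}

def adjusted_possibility(base_score: float, target_smell: str,
                         detected_smells: set) -> float:
    """Boost the base detector score for target_smell using co-occurrence impact factors."""
    boost = sum(
        factor
        for (detected, target), factor in CO_OCCURRENCE_IMPACT.items()
        if target == target_smell and detected in detected_smells
    )
    return min(1.0, base_score + boost)

if __name__ == "__main__":
    # A method with a mediocre standalone score, but its class already exhibits
    # Large Class and Complex Class, which co-occur with Long Method.
    print(adjusted_possibility(0.45, "Long Method", {"Large Class", "Complex Class"}))  # 0.90
```
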
Unveiling functional aspects in google play education app titles and descriptions influencing app success
IF 2.0 | Q2 | Computer Science
Automated Software Engineering | Pub Date: 2025-02-21 | DOI: 10.1007/s10515-025-00497-6
Ahmad Bilal, Hamid Turab Mirza, Adnan Ahmad, Ibrar Hussain, Ahmad Salman Khan
Abstract: Users search for applications on the online application store by inputting functional terms, such as “automated assignment solver”, “English translator” and “free VPN”. In response, the application store recommends a list of applications whose titles and descriptions closely match the user’s search terms. Acknowledging this, application developers incorporate trending and frequently searched functional terms into their application titles and descriptions to make them compelling and to enhance the visibility of their products in user searches, thereby increasing the likelihood of application success. However, literature analyzing mobile application titles and descriptions to determine their impact on application success is scarce and often lacks data-analytical approaches. Moreover, the definition of application success in existing literature may be flawed, as it relies solely on higher downloads or positive numeric ratings, neglecting the crucial factor of time. This research proposes a Machine Learning-inspired framework to extract functional themes (aspects) from the titles and descriptions of Google Play Education applications that influence their success. It also formulates an enhanced definition of application success that considers downloads and ratings over a specific time period and integrates user sentiment. According to the findings of this research, the themes of Math and Homework Support, Learning and Practice, Live Assistance and Tutoring, and Instant Solutions and Tools are highly correlated with success within the Education category of the Google Play store. Developers can enhance the visibility and appeal of their applications in user search results by incorporating these themes into their application titles and descriptions, ultimately leading to a higher likelihood of success.
Citations: 0

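The paper describes a machine-learning-inspired framework for extracting functional themes from app titles and descriptions. Below is a generic, hedged sketch of one way to do this with TF-IDF and NMF topic modelling; the sample texts and pipeline are assumptions for illustration and do not reproduce the paper's actual framework or theme labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Hypothetical title+description strings for a handful of Education apps.
app_texts = [
    "automated assignment solver with step by step math homework help",
    "english translator and vocabulary practice for learners",
    "live tutoring and instant homework assistance",
    "math practice drills and learning games for students",
]

# Vectorize the texts and factorize into a small number of latent themes.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(app_texts)
nmf = NMF(n_components=2, random_state=0)
nmf.fit(tfidf)

# Print the top terms of each extracted theme.
terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(nmf.components_):
    top_terms = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"theme {topic_idx}: {', '.join(top_terms)}")
```
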
EmoReflex: an AI-powered emotion-centric developer insights platform
IF 2.0 | Q2 | Computer Science
Automated Software Engineering | Pub Date: 2025-02-20 | DOI: 10.1007/s10515-025-00488-7
Kashumi Madampe, John Grundy, Minh Nguyen, Ellen Welstead-Cloud, Vinh Tuan Huynh, Linh Doan, William Lay, Sayed Hashim
Abstract: There has been great interest in better understanding software engineer emotions during development. But how can this be done? We built a prototype AI-powered emotion-centric developer insights platform, EmoReflex, to support developers in reporting and reflecting on how they feel when working on various tasks across different metrics. It also helps their managers gain insight into their team’s emotional health and provides recommendations to guide them in handling the team’s emotional wellbeing. We present our tool prototype and the evaluation results of a user study conducted with two user groups: twenty developers and twenty managers. We also present design implications derived from our user study that can inform design decisions in emotion-centric software development tools.
Open Access PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00488-7.pdf
Citations: 0

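EmoReflex lets developers report how they feel on various tasks and gives managers an aggregated view of team emotional health. The sketch below is a purely illustrative guess at such a report record and a simple team-level aggregation; the field names and the intensity scale are assumptions, not EmoReflex's actual data model.

```python
from dataclasses import dataclass
from collections import Counter

@dataclass
class EmotionReport:
    developer: str
    task_id: str
    emotion: str      # e.g. "frustrated", "satisfied", "neutral"
    intensity: int    # assumed 1-5 scale

def team_emotion_summary(reports):
    """Average reported intensity per emotion across the team (a manager-facing view)."""
    totals, counts = Counter(), Counter()
    for r in reports:
        totals[r.emotion] += r.intensity
        counts[r.emotion] += 1
    return {emotion: totals[emotion] / counts[emotion] for emotion in totals}

if __name__ == "__main__":
    reports = [
        EmotionReport("dev1", "TASK-12", "frustrated", 4),
        EmotionReport("dev2", "TASK-12", "frustrated", 3),
        EmotionReport("dev1", "TASK-15", "satisfied", 5),
    ]
    print(team_emotion_summary(reports))
```
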
MP: motion program synthesis with machine learning interpretability and knowledge graph analogy
IF 2.0 | Q2 | Computer Science
Automated Software Engineering | Pub Date: 2025-02-18 | DOI: 10.1007/s10515-025-00495-8
Cheng-Hao Cai
Abstract: The advancement of physics-based engines has led to the popularity of virtual reality. To achieve a more realistic and immersive user experience, the behaviours of objects in virtual scenes are expected to conform accurately to real-world physical laws. This increases the workload and development time for developers. To facilitate development on physics-based engines, this paper proposes MP, a motion program synthesis approach based on machine learning and analogical reasoning. MP follows the paradigm of test-driven development, where programs are generated to fit test cases of motions subject to multiple environmental factors such as gravity and airflows. To reduce the search space of code generation, regression models are used to find variables that significantly influence motions, while analogical reasoning on knowledge graphs is used to find operators that work on the identified variables. In addition, constraint solving is used to probabilistically estimate the values of constants in motion programs. Experimental results demonstrate that MP is efficient in various motion program generation tasks, with random forest regressors achieving low data and time requirements.
Open Access PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00495-8.pdf
Citations: 0

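One step the abstract describes is using regression models (random forest regressors) to find the variables that significantly influence a motion, which narrows the code-generation search space. Below is a minimal sketch of that step on synthetic data; the variable names, the synthetic relationship, and the importance threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 200
gravity = rng.uniform(8.0, 11.0, n)
wind_x = rng.uniform(-5.0, 5.0, n)
friction = rng.uniform(0.0, 1.0, n)

# Synthetic motion outcome: depends on gravity and wind, but not on friction.
landing_x = 0.8 * wind_x - 0.3 * gravity + rng.normal(0.0, 0.1, n)

X = np.column_stack([gravity, wind_x, friction])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, landing_x)

# Feature importances suggest which environmental variables matter for this motion;
# friction should receive a near-zero importance and could be dropped from the search.
for name, importance in zip(["gravity", "wind_x", "friction"], model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```
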
LLM-enhanced evolutionary test generation for untyped languages
IF 2.0 | Q2 | Computer Science
Automated Software Engineering | Pub Date: 2025-02-17 | DOI: 10.1007/s10515-025-00496-7
Ruofan Yang, Xianghua Xu, Ran Wang
Abstract: Dynamic programming languages, such as Python, are widely used for their flexibility and support for rapid development. However, the absence of explicit parameter type declarations poses significant challenges in generating automated test cases. This often leads to random assignment of parameter types, increasing the search space and reducing testing efficiency. Current evolutionary algorithms, which rely heavily on random mutations, struggle to handle specific data types and frequently fall into local optima, making it difficult to generate high-quality test cases. Moreover, the resulting test suites often contain errors, preventing immediate usage in real-world applications. To address these challenges, this paper proposes the use of large language models to enhance test case generation for dynamic programming languages. Our method involves three key steps: analyzing parameter types to narrow the search space, introducing meaningful data during mutations to increase test case relevance, and using large language models to automatically repair errors in the generated test suites. Experimental results demonstrate a 16% improvement in test coverage, faster evolutionary cycles, and an increase in the number of executable test suites. These findings highlight the potential of large language models in improving both the efficiency and reliability of test case generation for dynamic programming languages.
Citations: 0

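The method's first two steps are analyzing parameter types to narrow the search space and introducing meaningful data during mutations. The sketch below shows a simple type-guided value pool that a mutation operator could draw from once a parameter's likely type is known; the pools and names are assumptions, and the type is supplied directly here rather than analyzed automatically as in the paper.

```python
import random

# Hypothetical pools of "meaningful" values per inferred parameter type.
VALUE_POOLS = {
    "int": [0, 1, -1, 42, 2**31 - 1],
    "str": ["", "hello", "user@example.com", "a" * 100],
    "list": [[], [1, 2, 3], ["x"]],
}

def mutate_argument(current_value, inferred_type: str):
    """Replace an argument with a value drawn from the pool for its inferred type,
    falling back to the current value when the type is unknown."""
    pool = VALUE_POOLS.get(inferred_type)
    return random.choice(pool) if pool else current_value

if __name__ == "__main__":
    random.seed(0)
    # Suppose type analysis decided the parameter of parse_email(addr) is a str:
    # mutation now draws a string instead of a random value of arbitrary type.
    print(mutate_argument(12345, "str"))
```
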
Context-aware code summarization with multi-relational graph neural network
IF 2.0 | Q2 | Computer Science
Automated Software Engineering | Pub Date: 2025-02-06 | DOI: 10.1007/s10515-025-00490-z
Yanlin Wang, Ensheng Shi, Lun Du, Xiaodi Yang, Yuxuan Hu, Yanli Wang, Daya Guo, Shi Han, Hongyu Zhang, Dongmei Zhang
Abstract: Source code summaries are short natural language descriptions of code snippets that help developers better understand and maintain source code. There has been a surge of work on automatic code summarization to reduce the burden of writing summaries manually. However, contemporary approaches only leverage the information within the boundary of the method being summarized (i.e., local context), and ignore the broader context that could assist with code summarization. This paper explores two global contexts, namely intra-class and inter-class contexts, and proposes CoCoSUM: Context-Aware Code Summarization with Multi-Relational Graph Neural Network. CoCoSUM first incorporates class names as the intra-class context to generate the class semantic embeddings. Then, relevant Unified Modeling Language (UML) class diagrams are extracted as inter-class context and are encoded into the class relational embeddings using a novel Multi-Relational Graph Neural Network (MRGNN). Class semantic embeddings and class relational embeddings, together with the outputs from the code token encoder and AST encoder, are passed to a decoder armed with a two-level attention mechanism to generate high-quality, context-aware code summaries. Experimental results show that CoCoSUM outperforms state-of-the-art methods, and the global contexts adopted in CoCoSUM can also strengthen existing code summarization models. Our replication package is available at https://github.com/DeepSoftwareAnalytics/cocosum.
Citations: 0

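CoCoSUM encodes UML class relations with a Multi-Relational Graph Neural Network. The NumPy sketch below shows the general idea of one relation-specific message-passing layer over a toy class graph; the relation types, dimensions, toy adjacency matrices, and random weights are illustrative, and this is not CoCoSUM's actual MRGNN.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim = 3, 8                   # 3 classes in a toy UML diagram
H = rng.normal(size=(num_classes, dim))   # initial class embeddings

# Adjacency per relation type: edges[r][i, j] = 1 if class j relates to class i via r.
relations = ["inheritance", "association"]
edges = {
    "inheritance": np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]], dtype=float),
    "association": np.array([[0, 0, 1], [1, 0, 0], [0, 0, 0]], dtype=float),
}
W = {r: rng.normal(size=(dim, dim)) * 0.1 for r in relations}  # per-relation weights
W_self = np.eye(dim)                                            # self-loop transform

def mrgnn_layer(H):
    """One layer: each class aggregates neighbour embeddings separately per relation."""
    out = H @ W_self
    for r in relations:
        out += edges[r] @ (H @ W[r])
    return np.maximum(out, 0.0)  # ReLU

print(mrgnn_layer(H).shape)  # (3, 8): updated class relational embeddings
```
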
Enhancing multi-objective test case selection through the mutation operator
IF 2.0 | Q2 | Computer Science
Automated Software Engineering | Pub Date: 2025-01-30 | DOI: 10.1007/s10515-025-00489-6
Miriam Ugarte, Pablo Valle, Miren Illarramendi, Aitor Arrieta
Abstract: Test case selection has been a widely investigated technique to increase the cost-effectiveness of software testing. Because the search space in this problem is huge, search-based approaches have been found effective, where an optimization algorithm (e.g., a genetic algorithm) applies mutation and crossover operators guided by corresponding objective functions with the goal of reducing the test execution cost while maintaining the overall test quality. The de-facto mutation operator is the bit-flip mutation, where a test case is mutated with a probability of 1/N, N being the total number of test cases in the original test suite. This has a core disadvantage: an effective test case and an ineffective one have the same probability of being selected or removed. In this paper, we advocate for a novel mutation operator that promotes selecting cost-effective test cases while removing the ineffective and expensive ones. To this end, instead of applying a probability of 1/N to every single test case in the original test suite, we calculate new selection and removal probabilities. This is carried out based on the adequacy criterion as well as the cost of each test case, determined before executing the algorithm (e.g., based on historical data). We evaluate our approach on 13 case study systems, including 3 industrial case studies, in three different application domains (i.e., Cyber-Physical Systems (CPSs), continuous integration systems, and industrial control systems). Our results suggest that the proposed approach can increase the cost-effectiveness of search-based test case selection methods, especially when the time budget for executing test cases is low.
Citations: 0

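The proposed operator replaces the uniform 1/N bit-flip with selection and removal probabilities derived from each test's adequacy and cost. A minimal sketch of that idea follows; the exact probability formula below is an illustrative choice under stated assumptions, not the authors' definition.

```python
import random

def mutate(solution, adequacy, cost):
    """solution[i]=True means test i is selected; adequacy in [0, 1], cost > 0.
    Cost-effective tests get a higher chance of being added, and expensive,
    ineffective tests get a higher chance of being removed."""
    n = len(solution)
    effectiveness = [adequacy[i] / cost[i] for i in range(n)]
    max_eff = max(effectiveness)
    child = solution[:]
    for i in range(n):
        p_select = (effectiveness[i] / max_eff) / n          # biased toward good tests
        p_remove = (1.0 - effectiveness[i] / max_eff) / n    # biased toward poor tests
        if not child[i] and random.random() < p_select:
            child[i] = True
        elif child[i] and random.random() < p_remove:
            child[i] = False
    return child

if __name__ == "__main__":
    random.seed(1)
    selected = [True, False, True, False]
    adequacy = [0.9, 0.8, 0.1, 0.2]   # e.g. coverage achieved by each test
    cost =     [1.0, 0.5, 4.0, 0.3]   # e.g. execution time
    print(mutate(selected, adequacy, cost))
```
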
BadCodePrompt: backdoor attacks against prompt engineering of large language models for code generation
IF 2.0 | Q2 | Computer Science
Automated Software Engineering | Pub Date: 2025-01-28 | DOI: 10.1007/s10515-024-00485-2
Yubin Qu, Song Huang, Yanzhou Li, Tongtong Bai, Xiang Chen, Xingya Wang, Long Li, Yongming Yao
Abstract: Using few-shot demonstrations in prompts significantly enhances the generation quality of large language models (LLMs), including for code generation. However, adversarial examples injected by malicious service providers via few-shot prompting pose a risk of backdoor attacks on large language models. There is no prior research on backdoor attacks against large language models in the few-shot prompting setting for code generation tasks. In this paper, we propose BadCodePrompt, the first backdoor attack on code generation tasks targeting LLMs in the few-shot prompting scenario, which requires no access to training data or model parameters and has lower computational overhead. BadCodePrompt exploits the insertion of triggers and poisonous code patterns into examples, causing poisonous source code to be output when a backdoor trigger appears in the end user’s query prompt. We demonstrate the effectiveness of BadCodePrompt in conducting backdoor attacks on three LLMs (GPT-4, Claude-3.5-Sonnet, and Gemini Pro-1.5) in code generation tasks without affecting the functionality of the generated code. LLMs with stronger reasoning capabilities are also more vulnerable to BadCodePrompt, with an average attack success rate of up to 98.53% for GPT-4 in two benchmark tasks. Finally, we employ state-of-the-art defenses against backdoor attacks in prompt engineering and show their overall ineffectiveness against BadCodePrompt. Therefore, BadCodePrompt remains a serious threat to LLMs, underscoring the urgency of developing effective defense mechanisms.
Citations: 0