Automated Software Engineering: latest articles, page 8

LLM-enhanced evolutionary test generation for untyped languages

IF 2.0, Zone 2 (Computer Science)

Automated Software Engineering Pub Date : 2025-02-17 DOI: 10.1007/s10515-025-00496-7

Ruofan Yang, Xianghua Xu, Ran Wang

Abstract: Dynamic programming languages, such as Python, are widely used for their flexibility and support for rapid development. However, the absence of explicit parameter type declarations poses significant challenges in generating automated test cases. This often leads to random assignment of parameter types, increasing the search space and reducing testing efficiency. Current evolutionary algorithms, which rely heavily on random mutations, struggle to handle specific data types and frequently fall into local optima, making it difficult to generate high-quality test cases. Moreover, the resulting test suites often contain errors, preventing immediate usage in real-world applications. To address these challenges, this paper proposes the use of large language models to enhance test case generation for dynamic programming languages. Our method involves three key steps: analyzing parameter types to narrow the search space, introducing meaningful data during mutations to increase test case relevance, and using large language models to automatically repair errors in the generated test suites. Experimental results demonstrate a 16% improvement in test coverage, faster evolutionary cycles, and an increase in the number of executable test suites. These findings highlight the potential of large language models in improving both the efficiency and reliability of test case generation for dynamic programming languages.

Citations: 0

Context-aware code summarization with multi-relational graph neural network

IF 2.0, Zone 2 (Computer Science)

Automated Software Engineering Pub Date : 2025-02-06 DOI: 10.1007/s10515-025-00490-z

Yanlin Wang, Ensheng Shi, Lun Du, Xiaodi Yang, Yuxuan Hu, Yanli Wang, Daya Guo, Shi Han, Hongyu Zhang, Dongmei Zhang

Abstract: Source code summaries are short natural language descriptions of code snippets that help developers better understand and maintain source code. There has been a surge of work on automatic code summarization to reduce the burden of writing summaries manually. However, contemporary approaches only leverage the information within the boundary of the method being summarized (i.e., local context), and ignore the broader context that could assist with code summarization. This paper explores two global contexts, namely intra-class and inter-class contexts, and proposes CoCoSUM: Context-Aware Code Summarization with Multi-Relational Graph Neural Network. CoCoSUM first incorporates class names as the intra-class context to generate the class semantic embeddings. Then, relevant Unified Modeling Language (UML) class diagrams are extracted as inter-class context and are encoded into the class relational embeddings using a novel Multi-Relational Graph Neural Network (MRGNN). Class semantic embeddings and class relational embeddings, together with the outputs from code token encoder and AST encoder, are passed to a decoder armed with a two-level attention mechanism to generate high-quality, context-aware code summaries. Experimental results show that CoCoSUM outperforms state-of-the-art methods and the global contexts adopted in CoCoSUM can also strengthen existing code summarization models. Our replication package is anonymously available at https://github.com/DeepSoftwareAnalytics/cocosum.

Citations: 0

Enhancing multi-objective test case selection through the mutation operator

IF 2.0, Zone 2 (Computer Science)

Automated Software Engineering Pub Date : 2025-01-30 DOI: 10.1007/s10515-025-00489-6

Miriam Ugarte, Pablo Valle, Miren Illarramendi, Aitor Arrieta

Abstract: Test case selection has been a widely investigated technique to increase the cost-effectiveness of software testing. Because the search space in this problem is huge, search-based approaches have been found effective, where an optimization algorithm (e.g., a genetic algorithm) applies mutation and crossover operators guided by corresponding objective functions with the goal of reducing the test execution cost while maintaining the overall test quality. The de-facto mutation operator is the bit-flip mutation, where a test case is mutated with a probability of 1/N, N being the total number of test cases in the original test suite. This has a core disadvantage: an effective test case and an ineffective one have the same probability of being selected or removed. In this paper, we advocate for a novel mutation operator that promotes selecting cost-effective test cases while removing the ineffective and expensive ones. To this end, instead of applying a probability of 1/N to every single test case in the original test suite, we calculate new selection and removal probabilities. This is carried out based on the adequacy criterion as well as the cost of each test case, determined before executing the algorithm (e.g., based on historical data). We evaluate our approach in 13 case study systems, including 3 industrial case studies, in three different application domains (i.e., Cyber-Physical Systems (CPSs), continuous integration systems and industrial control systems). Our results suggest that the proposed approach can increase the cost-effectiveness of search-based test case selection methods, especially when the time budget for executing test cases is low.
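The contrast drawn in this abstract between uniform bit-flip mutation and per-test probabilities can be illustrated with a minimal Python sketch. The uniform 1/N rule is the one the abstract states. The weighting in `cost_aware_probabilities` is purely hypothetical: the abstract does not give the paper's formula, so the effectiveness-per-cost ratio, the function names, and the sample values below are assumptions made for illustration only. The sketch assumes a non-empty suite and strictly positive costs.

```python
import random


def bit_flip_probabilities(selection):
    """Standard bit-flip mutation: every test case flips with probability 1/N."""
    n = len(selection)
    return [1.0 / n] * n


def cost_aware_probabilities(selection, effectiveness, cost):
    """Hypothetical weighting (an assumption, NOT the paper's formula).

    Each test gets a score: its effectiveness-per-cost ratio divided by the
    suite average. Good tests are added more readily and removed less readily.
    """
    n = len(selection)
    ratios = [e / c for e, c in zip(effectiveness, cost)]  # cost must be > 0
    mean_ratio = sum(ratios) / n
    if mean_ratio == 0:
        return bit_flip_probabilities(selection)  # no signal: fall back to 1/N
    probs = []
    for selected, ratio in zip(selection, ratios):
        score = ratio / mean_ratio  # > 1 means better than the suite average
        if selected:
            # Removal probability shrinks for good tests, grows for poor ones.
            p = 1.0 if score == 0 else min(1.0, (1.0 / n) / score)
        else:
            # Selection probability grows for good tests, shrinks for poor ones.
            p = min(1.0, (1.0 / n) * score)
        probs.append(p)
    return probs


def mutate(selection, probabilities, rng):
    """Flip each selection bit independently with its own probability."""
    return [(not bit) if rng.random() < p else bit
            for bit, p in zip(selection, probabilities)]


rng = random.Random(0)
selection = [True, True, False, False]  # True = test case is in the subset
probs = cost_aware_probabilities(
    selection, effectiveness=[4, 1, 4, 1], cost=[1, 1, 1, 1])
child = mutate(selection, probs, rng)
print(probs, child)
```

With these sample values, the selected but weak second test receives a removal probability above 1/N, and the unselected but strong third test receives a selection probability above 1/N, which is the qualitative behaviour the abstract describes.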

Citations: 0

BadCodePrompt: backdoor attacks against prompt engineering of large language models for code generation

IF 2.0, Zone 2 (Computer Science)

Automated Software Engineering Pub Date : 2025-01-28 DOI: 10.1007/s10515-024-00485-2

Yubin Qu, Song Huang, Yanzhou Li, Tongtong Bai, Xiang Chen, Xingya Wang, Long Li, Yongming Yao

Abstract: Using few-shot demonstrations in prompts significantly enhances the generation quality of large language models (LLMs), including code generation. However, adversarial examples injected by malicious service providers via few-shot prompting pose a risk of backdoor attacks in large language models. There is no research on backdoor attacks on large language models in the few-shot prompting setting for code generation tasks. In this paper, we propose BadCodePrompt, the first backdoor attack for code generation tasks targeting LLMs in the few-shot prompting scenario, without requiring access to training data or model parameters and with lower computational overhead. BadCodePrompt exploits the insertion of triggers and poisonous code patterns into examples, causing the output of poisonous source code when there is a backdoor trigger in the end user’s query prompt. We demonstrate the effectiveness of BadCodePrompt in conducting backdoor attacks on three LLMs (GPT-4, Claude-3.5-Sonnet, and Gemini Pro-1.5) in code generation tasks without affecting the functionality of the generated code. LLMs with stronger reasoning capabilities are also more vulnerable to BadCodePrompt, with an average attack success rate of up to 98.53% for GPT-4 in two benchmark tasks. Finally, we employ state-of-the-art defenses against backdoor attacks in Prompt Engineering and show their overall ineffectiveness against BadCodePrompt. Therefore, BadCodePrompt remains a serious threat to LLMs, underscoring the urgency of developing effective defense mechanisms.

Citations: 0

RFMC-CS: a representation fusion based multi-view momentum contrastive learning framework for code search

IF 2.0, Zone 2 (Computer Science)

Automated Software Engineering Pub Date : 2025-01-27 DOI: 10.1007/s10515-025-00487-8

Gong Chen, Wenjie Liu, Xiaoyuan Xie

Abstract: Code search is a crucial task in software engineering, aiming to search relevant code from the codebase based on natural language queries. While deep-learning-based code search methods have demonstrated impressive performance, recent advances in contrastive learning have further enhanced the representation learning of these models. Despite these improvements, existing methods still have limitations in the representation learning of multi-modal data. Specifically, these methods suffer from a semantic loss in the representation learning of code and fail to fully explore functionally relevant code pairs during representation learning. To address these limitations, we propose A Representation Fusion based Multi-View Momentum Contrastive Learning Framework for Code Search, named RFMC-CS. RFMC-CS effectively retains the semantic and structural information of code through multi-modal representation and fusion. Through elaborately designed Multi-View Momentum Contrastive Learning, RFMC-CS can further learn the correlations between different modalities of samples and semantically relevant samples. The experimental results on the CodeSearchNet benchmark show that RFMC-CS outperforms seven advanced baselines on MRR and Recall@k metrics. The ablation experiments illustrate the effectiveness of each component. The portability experiments show that RFMC-CS has good portability.
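For readers unfamiliar with the two metrics named in this abstract, the following Python sketch shows how MRR and Recall@k are commonly computed for code search. It assumes each query has exactly one correct snippet and that a ranked result list has already been reduced to the 1-based rank of that snippet (or `None` when it was not retrieved). The function names and the sample ranks are illustrative and do not come from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; a missing result contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def recall_at_k(ranks, k):
    """Fraction of queries whose correct snippet appears in the top k results."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


# Illustrative ranks for four queries; None means the snippet was not retrieved.
ranks = [1, 2, 4, None]
print(mean_reciprocal_rank(ranks))  # (1 + 1/2 + 1/4 + 0) / 4 = 0.4375
print(recall_at_k(ranks, 1))        # 0.25
print(recall_at_k(ranks, 5))        # 0.75
```

MRR rewards placing the correct snippet near the top of the list, while Recall@k only asks whether it appears within the first k results, so the two are usually reported together.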

Citations: 0

Large language model based mutations in genetic improvement

IF 2.0, Zone 2 (Computer Science)

Automated Software Engineering Pub Date : 2025-01-21 DOI: 10.1007/s10515-024-00473-6

Alexander E. I. Brownlee, James Callan, Karine Even-Mendoza, Alina Geiger, Carol Hanna, Justyna Petke, Federica Sarro, Dominik Sobania

Abstract: Ever since the first large language models (LLMs) became available, both academics and practitioners have used them to aid software engineering tasks. However, little research as yet has been done in combining search-based software engineering (SBSE) and LLMs. In this paper, we evaluate the use of LLMs as mutation operators for genetic improvement (GI), an SBSE approach, to improve the GI search process. In a preliminary work, we explored the feasibility of combining the Gin Java GI toolkit with OpenAI LLMs in order to generate an edit for the JCodec tool. Here we extend this investigation involving three LLMs and three types of prompt, and five real-world software projects. We sample the edits at random, as well as using local search. We also conducted a qualitative analysis to understand why LLM-generated code edits break as part of our evaluation. Our results show that, compared with conventional statement GI edits, LLMs produce fewer unique edits, but these compile and pass tests more often, with the OpenAI model finding test-passing edits 77% of the time. The OpenAI and Mistral LLMs are roughly equal in finding the best run-time improvements. Simpler prompts are more successful than those providing more context and examples. The qualitative analysis reveals a wide variety of areas where LLMs typically fail to produce valid edits, commonly including inconsistent formatting, generating non-Java syntax, or refusing to provide a solution.

Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10515-024-00473-6.pdf

Citations: 0

Vulnerability detection with graph enhancement and global dependency representation learning

IF 2.0, Zone 2 (Computer Science)

Automated Software Engineering Pub Date : 2025-01-05 DOI: 10.1007/s10515-024-00484-3

Xuehai Jia, Junwei Du, Minying Fang, Hao Liu, Yuying Li, Feng Jiang

Abstract: Vulnerability detection is essential for protecting software systems from attacks. Graph neural networks (GNNs) have proven effective in capturing semantic features of code and are widely used for this purpose. Existing GNN-based methods typically merge multiple graphs and employ GNNs to learn syntactic and semantic relationships within code graph structures. However, these methods face a significant limitation: current code graph structures inadequately represent parameter dependencies and node type information, which are crucial for capturing vulnerability patterns. This inadequacy hampers the GNNs’ ability to discern and characterize vulnerable code, thereby undermining effective vulnerability detection. Additionally, traditional GNN-based methods may lose long-distance dependency information during aggregation, which is vital for understanding the behavior and occurrence patterns of vulnerable code. Despite achieving state-of-the-art performance, existing GNN-based methods struggle to fully understand vulnerability behaviors and their potential impacts. To address these issues, this paper introduces VulDecgre, a novel vulnerability detection model comprising two components: (1) An enhanced code graph structure that fuses multiple graphs and relational edges to improve code representation. (2) A natural sequence-aware learning module that integrates code execution sequence information to enhance vulnerability detection. Extensive experiments on three public datasets and a self-collected large-scale real-world C/C++ dataset demonstrate that VulDecgre achieves superior performance in vulnerability detection.

Citations: 0

Detecting question relatedness in programming Q&A communities via bimodal feature fusion

IF 2.0, Zone 2 (Computer Science)

Automated Software Engineering Pub Date : 2025-01-04 DOI: 10.1007/s10515-024-00482-5

Qirong Bu, Xiangqiang Guo, Xia Sun, Jingjing Jiang, Xiaodi Zhao, Wang Zou, Xuxin Wang, Jianqiang Yan

Abstract: Programming community-based question-answering websites, represented by Stack Overflow, are popular among programmers. Users post questions and share their knowledge and experience through answering. Nonetheless, the accumulation of a large number of similar questions reduces the efficiency and quality of the community. To tackle this issue, related works utilize the complete textual information in the question posts for detecting question relatedness. However, almost all of them ignore the rich source code information in the posts, which also complements the semantics of the questions. In this paper, we propose a bimodal framework for relatedness detection based on the combination of text features and code features. Question pairs are encoded using a text pre-trained language model (e.g., SOBERT) and a code pre-trained language model (e.g., UniXcoder), respectively. With the powerful semantic modeling capabilities of pre-trained models, we obtain bimodal features that measure the similarity of questions from both text and code perspectives. However, directly concatenating and fusing these features may have a negative impact due to the significant differences between them. To address this, we additionally leverage the cross-attention mechanism to derive supplementary features of these bimodal features for the correct feature fusion. Cross-attention captures semantic understanding from both modalities, integrating their representations. These supplementary features measure the semantic relationship between text-guided and code-guided features, effectively bridging the semantic gap. We conducted extensive experiments on two related datasets from both the English and Chinese domains. The results show that our approach improves significantly over the baseline approaches, achieving advanced performance in the metrics of Macro-Precision, Macro-Recall and Macro-F1.
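The macro-averaged metrics reported in this abstract can be sketched in plain Python as follows. The sketch computes precision, recall and F1 per class and then takes the unweighted mean across classes, so rare relatedness classes count as much as frequent ones. Macro-F1 is taken here as the mean of per-class F1 scores, which is one common convention; the abstract does not say which variant the paper uses. The class labels and predictions are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Return (macro-precision, macro-recall, macro-F1) over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined ratios (empty denominators) are scored as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical relatedness labels for four question pairs.
y_true = ["duplicate", "duplicate", "related", "unrelated"]
y_pred = ["duplicate", "related", "related", "unrelated"]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(round(macro_p, 4), round(macro_r, 4), round(macro_f1, 4))
```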

Citations: 0

Adversarial generation method for smart contract fuzz testing seeds guided by chain-based LLM

IF 2.0, Zone 2 (Computer Science)

Automated Software Engineering Pub Date : 2024-12-31 DOI: 10.1007/s10515-024-00483-4

Jiaze Sun, Zhiqiang Yin, Hengshan Zhang, Xiang Chen, Wei Zheng

Abstract: With the rapid development of smart contract technology and the continuous expansion of blockchain application scenarios, the security issues of smart contracts have garnered significant attention. However, traditional fuzz testing typically relies on randomly generated initial seed sets. This random generation method fails to understand the semantics of smart contracts, resulting in insufficient seed coverage. Additionally, traditional fuzz testing often ignores the syntax and semantic constraints within smart contracts, leading to the generation of seeds that may not conform to the syntactic rules of the contracts and may even include logic that violates contract semantics, thereby reducing the efficiency of fuzz testing. To address these challenges, we propose an adversarial generation method for smart contract fuzz testing seeds guided by Chain-Based LLM, leveraging the deep semantic understanding capabilities of LLMs to assist in seed set generation. Firstly, we propose a method that utilizes Chain-Based prompts to request the LLM to generate fuzz testing seeds, breaking down the LLM tasks into multiple steps to gradually guide the LLM in generating high-coverage seed sets. Secondly, by establishing adversarial roles for the LLM, we guide the LLM to autonomously generate and optimize seed sets, producing high-coverage initial seed sets for the program under test. To evaluate the effectiveness of the proposed method, 2308 smart contracts were crawled from Etherscan for experimental purposes. Results indicate that using Chain-Based prompts to request the LLM to generate fuzz testing seed sets improved instruction coverage by 2.94% compared to single-step requests. The method of generating seed sets by establishing adversarial roles for the LLM reduced the time to reach maximum instruction coverage from 60 s to approximately 30 s compared to single-role methods. Additionally, the seed sets generated by the proposed method can directly trigger simple types of vulnerabilities (e.g., timestamp dependency and block number dependency vulnerabilities), with instruction coverage improvements of 3.8% and 4.1%, respectively.

Citations: 0

Meta network attention-based feature matching for heterogeneous defect prediction

IF 2.0, Zone 2 (Computer Science)

Automated Software Engineering Pub Date : 2024-12-20 DOI: 10.1007/s10515-024-00480-7

Meetesh Nevendra, Pradeep Singh

Abstract: Cross-project defect prediction (CPDP) involves predicting defects in projects without historical data by utilizing information from other projects. This requires uniform metrics across source and target projects (CPDP-CM). However, heterogeneous defect prediction (HDP), which deals with different metric sets, faces challenges such as feature alignment and distribution inequalities. This paper addresses these challenges with a novel method: Meta Network Attention-based Feature Matching (MNAFM) for HDP. Our approach uses a meta-network to identify feature similarities and adjust distillation intensity, enhancing HDP accuracy. Experiments on 30 projects demonstrate that MNAFM significantly outperforms baseline methods, showing improvements in f-measure (11.42–64.12%), g-measure (11.94–30.12%), and MCC (16.58–98.63%). Statistical tests confirm that MNAFM outperforms eight benchmark algorithms. Additionally, an ablation study highlights the contribution of each component of MNAFM, demonstrating the importance of the attention mechanism, data augmentation, symmetrical padding, and the use of a pretrained ResNet model in achieving superior performance. In summary, MNAFM offers a significant advancement in heterogeneous defect prediction by effectively leveraging feature similarities, distillation adjustments, and a robust methodological framework.
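The three evaluation measures named in this abstract can be sketched from a binary confusion matrix as shown below. The f-measure and MCC follow their standard definitions. The g-measure is taken here as the harmonic mean of recall and (1 - false-positive rate), a definition often used in defect prediction studies; the abstract does not state which definition the paper adopts, so that choice is an assumption. The confusion counts in the example are invented for illustration.

```python
import math


def defect_metrics(tp, fp, tn, fn):
    """Return (f_measure, g_measure, mcc) for a binary defect classifier."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0      # probability of detection
    false_alarm = fp / (fp + tn) if fp + tn else 0.0  # false-positive rate

    pr_sum = precision + recall
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0

    # Assumed definition: harmonic mean of recall and (1 - false-positive rate).
    specificity = 1.0 - false_alarm
    g_sum = recall + specificity
    g_measure = 2 * recall * specificity / g_sum if g_sum else 0.0

    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f_measure, g_measure, mcc


# Invented confusion counts: 30 true positives, 10 false positives,
# 50 true negatives, 10 false negatives.
f, g, mcc = defect_metrics(tp=30, fp=10, tn=50, fn=10)
print(round(f, 4), round(g, 4), round(mcc, 4))
```

Unlike the f-measure, both the g-measure (as assumed here) and MCC take true negatives into account, which matters on the class-imbalanced data typical of defect prediction.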

Citations: 0