Automated Software Engineering: Latest Articles

ExtRep: a GUI test repair method for mobile applications based on test-extension
IF 2 · Zone 2 · Computer Science
Automated Software Engineering Pub Date : 2025-04-25 DOI: 10.1007/s10515-025-00513-9
Yonghao Long, Yuanyuan Chen, Chu Zeng, Xiangping Chen, Xing Chen, Xiaocong Zhou, Jingru Yang, Gang Huang, Zibin Zheng
Abstract: GUI testing helps ensure software quality and user experience in fast-changing mobile application development. Test scripts are one of the main means of GUI testing, but they can become obsolete when the GUI changes as the app evolves. Current studies often rely on textual or visual similarity to repair tests, but they may be less effective when the sequence of interacted events changes dramatically. In interaction design, practitioners often provide multiple entry points to the same function for greater openness and flexibility, which suggests that there may be multiple routes to draw on in test repair. To evaluate the feasibility of this idea, we first conducted an exploratory study on 37 tests from 18 apps. The results showed that over 81% of the tests could be represented with alternative event paths, and that using the extended paths could help increase the test replay rate. Based on this finding, we propose a test-extension-based test repair algorithm named ExtRep. The method first uses test extension to find alternative paths with similar test objectives based on feature coverage, and then finds the repaired result with the help of sequence transduction probability, a concept from natural language processing (NLP). Experiments conducted on 40 popular applications demonstrate that ExtRep achieves a success rate of 73.68% in repairing 97 tests, significantly outperforming the current approaches Water, Meter, and Guider. Moreover, the test-extension approach shows great potential for improving test repair. A tool that implements ExtRep is available for practical use and future research.
Vol. 32, No. 2
Citations: 0
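ExtRep's first step, finding alternative event paths whose feature coverage resembles that of the broken test, can be illustrated with a small generic sketch. It assumes each path is summarized as a set of covered features (the screen and widget names below are hypothetical) and uses Jaccard similarity with an arbitrary threshold as the coverage-similarity measure; these are illustrative assumptions rather than ExtRep's published definitions, and the later sequence-transduction step is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventPath:
    """A candidate event path and the set of app features it covers (hypothetical model)."""
    events: tuple[str, ...]
    features: frozenset[str]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two feature sets; 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_alternatives(original: EventPath, candidates: list[EventPath],
                      threshold: float = 0.5) -> list[tuple[float, EventPath]]:
    """Keep candidates whose coverage is close enough to the original test,
    most similar first. The threshold value is an illustrative assumption."""
    scored = [(jaccard(original.features, c.features), c) for c in candidates]
    kept = [(score, c) for score, c in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

For example, a profile screen reachable both through a settings menu and through a toolbar avatar yields two paths with overlapping coverage, so the toolbar route would be kept as a repair reference while an unrelated search path would be discarded.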
Context-aware prompting for LLM-based program repair
IF 2 · Zone 2 · Computer Science
Automated Software Engineering Pub Date : 2025-04-18 DOI: 10.1007/s10515-025-00512-w
Yingling Li, Muxin Cai, Junjie Chen, Yang Xu, Lei Huang, Jianping Li
Abstract: Automated program repair (APR) plays a crucial role in ensuring the quality of software code, since manual bug fixing is extremely time-consuming and labor-intensive. Traditional APR tools (e.g., template-based approaches) struggle to generalize to different bug patterns, while deep learning (DL)-based methods rely heavily on training datasets and struggle to fix unseen bugs. Recently, large language models (LLMs) have shown great potential in APR owing to their ability to generate patches, and they have achieved promising results. However, their effectiveness is still constrained by context that is chosen in an ad hoc way (e.g., they cannot adaptively select specific context according to the situation of each defect). A more effective APR approach is therefore needed, one that provides more precise and comprehensive context for the given defect to improve the robustness of LLM-based APR. In this paper, we propose a context-aware APR approach named CodeCorrector, which uses a Chain-of-Thought (CoT) design to follow developers' program repair behavior. Given a failing test and its buggy file, CodeCorrector first analyzes why the test fails based on the failure message to infer a repair direction; then selects the context information relevant to this repair direction; and finally builds a context-aware repair prompt to guide the LLM in patch generation. Our motivation is to offer a new perspective on enhancing LLM-based program repair through context-aware prompting, which adaptively selects specific context for a given defect. The evaluation on the widely used Defects4J benchmark (v1.2 and v2.0) shows that, with only a small number of repair attempts (as few as ten rounds), CodeCorrector outperforms all state-of-the-art baselines on the more complex defects in Defects4J v2.0 and on the defects without fine-grained localization information in Defects4J v1.2. In total, 38 defects are fixed only by CodeCorrector. We further analyze the contributions of two core components (repair directions and global context selection) to CodeCorrector's performance; repair directions in particular improve CodeCorrector by 112% in correct patches and 78% in plausible patches on Defects4J v1.2. Moreover, CodeCorrector generates more valid and correct patches, achieving a 377% improvement over the base LLM GPT-3.5 and a 268% improvement over GPT-4.
Vol. 32, No. 2
Citations: 0
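The three-step chain CodeCorrector follows (infer a repair direction from the failure message, select context for that direction, assemble the prompt) can be sketched as plain functions. Everything here is a hypothetical stand-in: the repair direction is passed in by the caller instead of being inferred by an LLM, context is chosen by naive identifier overlap, and the section headings of the prompt are invented; the paper's actual selection strategy and prompt template are not reproduced.

```python
import re


def identifiers(text: str) -> set[str]:
    """Crude identifier extraction: names of two or more characters."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]+", text))


def select_context(direction: str, snippets: dict[str, str], k: int = 2) -> list[str]:
    """Pick up to k snippets sharing the most identifiers with the repair direction
    (a hypothetical heuristic standing in for CodeCorrector's context selection)."""
    wanted = identifiers(direction)
    ranked = sorted(snippets, key=lambda name: len(wanted & identifiers(snippets[name])), reverse=True)
    return [name for name in ranked[:k] if wanted & identifiers(snippets[name])]


def build_prompt(failure_message: str, direction: str, buggy_code: str,
                 snippets: dict[str, str]) -> str:
    """Assemble a context-aware repair prompt in labelled sections."""
    parts = [
        "## Failing test output", failure_message.strip(),
        "## Suspected cause (repair direction)", direction.strip(),
        "## Relevant context",
    ]
    parts += [snippets[name] for name in select_context(direction, snippets)]
    parts += ["## Buggy code", buggy_code.strip(), "Return a fixed version of the buggy code."]
    return "\n\n".join(parts)
```

The point of the structure is that only context tied to the inferred cause reaches the model, so an unrelated logging helper is left out of the prompt even though it sits in the same repository.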
UI2HTML: utilizing LLM agents with chain of thought to convert UI into HTML code
IF 2 · Zone 2 · Computer Science
Automated Software Engineering Pub Date : 2025-04-17 DOI: 10.1007/s10515-025-00509-5
Dawei Yuan, Guocang Yang, Tao Zhang
Abstract: The exponential growth of the internet has led to over 1.11 billion active websites, with approximately 252,000 new sites emerging daily. This growing landscape underscores a pressing need for rapid and diverse website development, particularly to support advanced functionality such as Web3 interfaces and AI-generated content platforms. Traditional methods that manually convert visual designs into functional code are time-consuming and error-prone, and they are especially challenging for non-experts. In this paper, we introduce UI2HTML, an innovative system that harnesses Web Real-Time Communication (WebRTC) and Large Language Models (LLMs) to convert website layout designs into functional user interface (UI) code. UI2HTML employs a divide-and-conquer approach, augmented by Chain-of-Thought reasoning, to improve the processing and accurate analysis of UI designs. It captures real-time video and audio input from product managers via mobile devices and uses image processing tools such as OpenCV to extract and categorize UI elements. This data, complemented by audio descriptions of UI components, is processed by backend cloud services employing Multimodal Large Language Models (MLLMs). These AI agents interpret the multimodal data to generate requirement documents and initial software architecture drafts, automating the translation of webpage designs into executable code. Our evaluation, based on extensive testing across real-world datasets and various MLLM configurations, shows that UI2HTML significantly outperforms existing methods in visual similarity and functional accuracy. By offering a robust solution for automatically generating UI code from screenshots, UI2HTML sets a new benchmark in the field, which is particularly valuable in today's fast-evolving digital environment.
Vol. 32, No. 2
Citations: 0
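The divide-and-conquer idea, generating code for each UI region separately and then composing the results, can be illustrated with a recursive sketch. It assumes the UI has already been segmented into a tree of typed elements; the dictionary format and the `TAGS` mapping are hypothetical. In UI2HTML the per-region generation is delegated to MLLM agents, which this deterministic mapping merely stands in for.

```python
from html import escape

# Hypothetical mapping from detected element types to HTML tags.
TAGS = {"page": "div", "section": "section", "button": "button", "text": "p", "image": "img"}


def to_html(node: dict, depth: int = 0) -> str:
    """Divide: render each child subtree independently. Conquer: wrap the results in this node's tag."""
    tag = TAGS.get(node.get("type", ""), "div")
    pad = "  " * depth
    if tag == "img":
        return f'{pad}<img src="{escape(node.get("src", ""))}" alt="{escape(node.get("text", ""))}">'
    children = [to_html(child, depth + 1) for child in node.get("children", [])]
    text = escape(node.get("text", ""))
    if not children:
        return f"{pad}<{tag}>{text}</{tag}>"
    inner = "\n".join(([f"{pad}  {text}"] if text else []) + children)
    return f"{pad}<{tag}>\n{inner}\n{pad}</{tag}>"
```

Because each subtree is rendered independently, a region can be regenerated or handed to a different agent without touching its siblings; only the parent's wrapping step has to be repeated.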
Navigating bug cold start with contextual multi-armed bandits: an enhanced approach to developer assignment in software bug repositories
IF 2 · Zone 2 · Computer Science
Automated Software Engineering Pub Date : 2025-04-16 DOI: 10.1007/s10515-025-00508-6
Neetu Singh, Sandeep Kumar Singh
Abstract: Recommending the most suitable developer for a new bug is a challenge for triagers in software bug repositories. Bugs vary in component, severity, priority, and other significant attributes, which makes it difficult to address them promptly. The difficulty is compounded by the lack of background knowledge about new bugs, which hampers traditional recommender systems. Without adequate information about either a developer or a bug, building, training, and testing a conventional machine learning model becomes arduous. In such scenarios, one potential solution is a reinforcement learning model. Triagers often resort to simplistic strategies, such as selecting a random developer (the explore strategy) or one who has frequently been assigned bugs (the exploit strategy). However, the research presented here shows that these approaches, which are based on multi-armed bandits (MAB), perform inadequately. To address this, we propose an improved bandit approach that uses contextual (side) information to automatically recommend suitable developers for new, or cold, bugs. Experiments on five publicly available open-source datasets show that contextual MAB approaches outperform simple MAB approaches. We evaluated two MAB algorithms and four contextual MAB algorithms using four performance metrics: reward, average reward, regret, and average regret. The experimental results provide a thorough framework for developer recommendation and indicate that all contextual MAB approaches consistently outperform the MAB approaches.
Vol. 32, No. 2
Citations: 0
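The contrast between plain and contextual bandits can be shown with epsilon-greedy, one common bandit policy; it is used here only as a generic example, since the abstract does not name the two MAB and four contextual MAB algorithms that were evaluated. The plain policy keeps one reward estimate per developer, while the contextual variant keys estimates by the bug's context (here its component). The simulated environment, in which each component has exactly one expert developer, is a hypothetical assumption.

```python
import random
from collections import defaultdict


class EpsilonGreedy:
    """Context-free bandit: one running-mean reward per developer (arm)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.values = defaultdict(float)

    def key(self, context, arm):
        return arm  # the bug's context is ignored

    def select(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore: random developer
        return max(self.arms, key=lambda a: self.values[self.key(context, a)])  # exploit

    def update(self, context, arm, reward):
        k = self.key(context, arm)
        self.counts[k] += 1
        self.values[k] += (reward - self.values[k]) / self.counts[k]  # incremental mean


class ContextualEpsilonGreedy(EpsilonGreedy):
    """Same policy, but estimates are kept per (bug context, developer) pair."""

    def key(self, context, arm):
        return (context, arm)


def simulate(policy, expert_for, rounds=2000, seed=1):
    """Return (cumulative reward, cumulative regret). Reward is 1 when the chosen
    developer is the hypothetical expert for the bug's component, so the optimal
    policy earns 1 per round."""
    rng = random.Random(seed)
    components = sorted(expert_for)
    total = 0
    for _ in range(rounds):
        component = rng.choice(components)  # a new "cold" bug arrives
        dev = policy.select(component)
        reward = 1 if expert_for[component] == dev else 0
        policy.update(component, dev, reward)
        total += reward
    return total, rounds - total
```

With three components and three specialists, the context-free policy can do little better than always picking one developer, whereas the contextual policy learns a separate choice per component; this mirrors, in miniature, why side information helps with cold bugs.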
The impact of unsupervised feature selection techniques on the performance and interpretation of defect prediction models
IF 2 · Zone 2 · Computer Science
Automated Software Engineering Pub Date : 2025-04-16 DOI: 10.1007/s10515-025-00510-y
Zhiqiang Li, Wenzhi Zhu, Hongyu Zhang, Yuantian Miao, Jie Ren
Abstract: The performance and interpretation of a defect prediction model depend on the software metrics used to build it. Feature selection techniques can improve model performance and interpretation by removing redundant, correlated, and irrelevant metrics from defect datasets. Previous empirical studies have examined the impact of feature selection techniques on the performance and interpretation of defect prediction models, but most of the techniques they examined are supervised. In particular, the impact of unsupervised feature selection (UFS) techniques on defect prediction remains unknown and needs extensive exploration. To address this gap, we systematically apply 21 UFS techniques and evaluate their impact on the performance and interpretation of unsupervised defect prediction models in binary classification and effort-aware ranking scenarios. Extensive experiments are conducted on 28 versions of 8 projects using 4 unsupervised models. We observe that: (1) 10–100% of the selected metrics are inconsistent between each pair of UFS techniques; (2) 29–100% of the selected metrics are inconsistent among different software modules; (3) for unsupervised defect prediction models, some UFS techniques (e.g., AutoSpearman, LS, and FMIUFS) can effectively reduce the number of metrics while maintaining or even improving model performance; and (4) UFS techniques alter the ranking of the top 3 groups of metrics in defect models, affecting the interpretation of these models. Based on these findings, we recommend that software practitioners use UFS techniques for unsupervised defect prediction. However, caution should be exercised when deriving insights and interpretations from defect prediction models.
Vol. 32, No. 2
Citations: 0
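What makes a feature selection technique "unsupervised" is that it never looks at the defect labels. A minimal sketch of that idea is a greedy Spearman-correlation filter that drops metrics strongly correlated with ones already kept. AutoSpearman, one of the techniques named above, is also built on Spearman correlation, but this sketch omits its further steps and is not a reimplementation of it or of any of the 21 studied techniques; the 0.7 threshold and the metric names are illustrative assumptions.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks; 0.0 for constant input."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(metrics, threshold=0.7):
    """Keep a metric only if |rho| < threshold against every metric kept so far.
    Defect labels are never consulted, which is what makes the selection unsupervised."""
    kept = []
    for name, values in metrics.items():
        if all(abs(spearman(values, metrics[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Note that the result depends on the order in which metrics are visited, which is one simple way different selectors can end up with inconsistent metric sets on the same data.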
SIFT: enhance the performance of vulnerability detection by incorporating structural knowledge and multi-task learning
IF 2 · Zone 2 · Computer Science
Automated Software Engineering Pub Date : 2025-04-11 DOI: 10.1007/s10515-025-00507-7
Liping Wang, Guilong Lu, Xiang Chen, Xiaofeng Dai, Jianlin Qiu
Abstract: Software vulnerabilities pose significant risks to software systems, leading to security breaches, data loss, operational disruptions, and substantial financial damage. Accurately detecting them is therefore of paramount importance. In recent years, pre-trained language models (PLMs) have demonstrated powerful capabilities in code representation and understanding, emerging as a promising method for vulnerability detection. However, integrating knowledge of code structure while fine-tuning PLMs remains a significant challenge. To alleviate this limitation, we propose a novel vulnerability detection approach called SIFT. SIFT extracts the code property graph (CPG) as the source of graph structural information, constructs a code structure matrix from it, and measures the difference between the code structure matrix and the attention matrix using the Sinkhorn divergence to obtain a structural knowledge loss. This loss is used alongside the cross-entropy loss for vulnerability detection in a multi-task learning framework to improve overall detection performance. To evaluate SIFT, we conducted experiments on three vulnerability detection datasets: FFmpeg+Qemu, Chrome+Debian, and Big-Vul. The results show that SIFT outperforms nine state-of-the-art vulnerability detection baselines, with F1-score improvements of 1.74%, 10.19%, and 2.87%, respectively. Our study shows the effectiveness of incorporating structural knowledge and multi-task learning to enhance the performance of PLMs for vulnerability detection.
Vol. 32, No. 2
Citations: 0
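SIFT's structural knowledge loss compares a code structure matrix with an attention matrix using the Sinkhorn divergence. The sketch below computes that divergence for small histograms in plain Python, using the transport cost of the entropy-regularized plan and the usual debiasing term. Comparing the two matrices row by row, using |i - j| as the ground cost between token positions, the regularization strength, and the loss weight are all assumptions made for illustration, not SIFT's actual formulation.

```python
import math


def sinkhorn_cost(a, b, cost, eps=0.5, iters=1000):
    """Transport cost <P, C> of the entropy-regularized optimal plan between histograms a and b."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):  # alternating Sinkhorn scaling updates
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * K[i][j] * v[j] * cost[i][j] for i in range(n) for j in range(m))


def sinkhorn_divergence(a, b, cost, eps=0.5):
    """Debiased Sinkhorn divergence: S(a, b) = OT(a, b) - OT(a, a)/2 - OT(b, b)/2."""
    return (sinkhorn_cost(a, b, cost, eps)
            - 0.5 * sinkhorn_cost(a, a, cost, eps)
            - 0.5 * sinkhorn_cost(b, b, cost, eps))


def structural_loss(attention, structure, eps=0.5):
    """Mean row-wise divergence between an attention matrix and a structure matrix.
    Rows are normalized to sum to 1; the row-wise view and |i - j| cost are assumptions."""
    n = len(attention)
    cost = [[abs(i - j) for j in range(n)] for i in range(n)]

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    return sum(sinkhorn_divergence(normalize(p), normalize(q), cost, eps)
               for p, q in zip(attention, structure)) / n


def multitask_loss(ce_loss, struct_loss, weight=0.5):
    """Hypothetical weighted sum of the classification and structural objectives."""
    return ce_loss + weight * struct_loss
```

The debiasing terms make the divergence vanish when the two rows are identical, so the structural term only pushes attention where it disagrees with the code structure. This pure-Python loop is only meant for tiny matrices; a real training setup would use a batched tensor implementation.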
Synthetic versus real: an analysis of critical scenarios for autonomous vehicle testing
IF 2 · Zone 2 · Computer Science
Automated Software Engineering Pub Date : 2025-04-09 DOI: 10.1007/s10515-025-00499-4
Qunying Song, Avner Bensoussan, Mohammad Reza Mousavi
Abstract: With the emergence of autonomous vehicles comes the need for adequate and rigorous testing, particularly in critical scenarios that are both challenging and potentially hazardous. Generating synthetic, simulation-based critical scenarios for testing autonomous vehicles has therefore received considerable interest, yet it is unclear how such scenarios relate to actual crash or near-crash scenarios in the real world; consequently, their realism is unknown. In this paper, we define realism as the degree of similarity of synthetic critical scenarios to real-world critical scenarios. We propose a methodology to measure realism using two metrics, namely attribute distribution and Euclidean distance. The methodology extracts various attributes from synthetic and real-world critical scenario datasets and performs a set of statistical tests to compare their distributions and distances. As a proof of concept, we compare synthetic collision scenarios from DeepScenario against real autonomous vehicle collisions collected by the California Department of Motor Vehicles, to analyse how well DeepScenario's synthetic collision scenarios align with real autonomous vehicle collisions recorded in California. We focus on five key attributes that can be extracted from both datasets, and analyse the attribute distributions and the distances between scenarios in the two datasets. Further, we derive recommendations for improving the realism of synthetic scenarios based on our analysis. Our study of realism provides a framework that can be replicated and extended to other datasets of both real-world and synthetically generated scenarios.
Vol. 32, No. 2 · Open access PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00499-4.pdf
Citations: 0
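The two realism metrics can be sketched with standard tools: the two-sample Kolmogorov-Smirnov statistic as one way to compare an attribute's distribution across datasets, and the Euclidean distance from each synthetic scenario to its nearest real scenario after min-max scaling. The choice of the KS statistic, the scaling, and the nearest-neighbour view are assumptions for illustration; the abstract does not specify which statistical tests were used, and the two numeric attributes in the usage example are made up.

```python
import math


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    def cdf(sample, t):
        return sum(1 for v in sample if v <= t) / len(sample)
    return max(abs(cdf(xs, t) - cdf(ys, t)) for t in list(xs) + list(ys))


def min_max_scale(rows):
    """Scale every attribute to [0, 1] over the pooled rows so no attribute dominates the distance."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)] for row in rows]


def nearest_real_distance(synthetic, real):
    """For each synthetic scenario, the Euclidean distance to its closest real scenario."""
    scaled = min_max_scale(synthetic + real)
    syn, rea = scaled[:len(synthetic)], scaled[len(synthetic):]
    return [min(math.dist(s, r) for r in rea) for s in syn]
```

A KS statistic near 0 means an attribute is distributed similarly in both datasets, and a synthetic scenario far from every real collision is a candidate for being unrealistic.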
Pipe-DBT: enhancing dynamic binary translation simulators to support pipeline-level simulation
IF 2 · Zone 2 · Computer Science
Automated Software Engineering Pub Date : 2025-04-05 DOI: 10.1007/s10515-025-00506-8
Tiancheng Tang, Yi Man, Xinbing Zhou, Duqing Wang
Abstract: To address the lack of pipeline behavior modeling in Instruction-Set Simulators (ISS) and the performance limitations of Cycle-Accurate Simulators (CAS), this paper proposes Pipe-DBT, a pipeline simulation framework based on Dynamic Binary Translation (DBT). The method balances accuracy and efficiency through two key techniques: (1) a pipeline state descriptor called Pipsdep, which abstracts data hazards and resource contention as formal rules about resource occupancy and read/write behavior, thereby avoiding low-level hardware details; and (2) a coroutine-based mechanism for partitioning the instruction execution flow, which uses dynamic suspension and resumption to achieve cycle-accurate scheduling in multi-stage pipelines. Implemented on QEMU, Pipe-DBT supports variable-length pipelines, a four-issue Very Long Instruction Word (VLIW) architecture, and pipeline forwarding. Under typical DSP workloads, it achieves a simulation speed of 400–1100 KIPS, a 2.3× improvement over Gem5 in cycle-accurate mode. Experimental results show that only modular extensions to the host DBT framework are required to accommodate heterogeneous pipeline microarchitectures, providing a high-throughput simulation infrastructure for processor design. To the best of our knowledge, this is the first pipeline-level simulation model implemented on a DBT simulator.
Vol. 32, No. 2 · Open access PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00506-8.pdf
Citations: 0
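The coroutine-based scheduling idea can be illustrated with Python generators: each instruction is a coroutine that requests one pipeline stage per cycle, and the scheduler either grants the request or suspends the instruction until the next cycle. This is a toy model under stated assumptions, not Pipe-DBT or Pipsdep: it assumes a three-stage in-order pipeline (the stage names are hypothetical) and models only structural hazards (one instruction per stage per cycle, multi-cycle stages), with no data hazards, forwarding, or multi-issue.

```python
STAGES = ("IF", "EX", "WB")  # hypothetical three-stage pipeline


def instruction(latency=None):
    """Coroutine for one instruction. It yields the stage it wants for the next cycle;
    the scheduler sends back True (granted) or False (stall: ask again next cycle)."""
    latency = latency or {}
    for stage in STAGES:
        for _ in range(latency.get(stage, 1)):  # multi-cycle stages are requested repeatedly
            while not (yield stage):
                pass  # suspended by a stall, resumed next cycle with the same request


def simulate(program):
    """Run (name, coroutine) pairs in program order through an in-order pipeline where
    each stage holds at most one instruction per cycle. Returns the total cycle count."""
    coroutines = dict(program)
    pending = {name: next(co) for name, co in program}  # prime each coroutine
    held = {}  # stage each instruction occupied in its latest granted cycle
    active = [name for name, _ in program]  # oldest first
    cycles = 0
    while active:
        occupied, still_active = set(), []
        for name in active:
            want = pending[name]
            if want not in occupied:
                occupied.add(want)
                held[name] = want
                try:
                    pending[name] = coroutines[name].send(True)  # resume: next request
                except StopIteration:
                    continue  # retired at the end of this cycle
            else:
                if name in held:
                    occupied.add(held[name])  # a stalled instruction keeps its current stage
                pending[name] = coroutines[name].send(False)
            still_active.append(name)
        active = still_active
        cycles += 1
    return cycles
```

Three single-cycle instructions take 3 + 3 - 1 = 5 cycles; making the first one spend two cycles in EX suspends the instructions behind it for one cycle, giving 6. Keeping each instruction's progress inside its own coroutine means the scheduler needs no per-instruction state machine, which is the appeal of the suspension/resumption approach.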
An empirical study on the code naturalness modeling capability for LLMs in automated patch correctness assessment
IF 2 · Zone 2 · Computer Science
Automated Software Engineering Pub Date : 2025-04-02 DOI: 10.1007/s10515-025-00502-y
Yuning Li, Wenkang Zhong, Zongwen Shen, Chuanyi Li, Xiang Chen, Jidong Ge, Bin Luo
Abstract: Like natural language, code can exhibit naturalness, which manifests as highly repetitive patterns within specific contexts. Code naturalness can be captured by language models and applied to various software engineering tasks (such as fault localization and program repair). Recently, Transformer-based Large Language Models (LLMs) have become powerful tools for modeling code naturalness. However, existing work lacks systematic studies of how well LLMs model code naturalness. To bridge this gap, this paper explores this capability, starting with the task of automated patch correctness assessment. Specifically, we investigate whether LLMs with different architectures and scales, under varying context window sizes, (1) can distinguish buggy code from ordinary code based on naturalness and consider fixed code more natural than buggy code, and (2) can distinguish different degrees of repair (i.e., complete and incomplete repairs) produced by automated tools. We then propose metrics to assess these two capabilities. Experimental results indicate that models of different architectures and scales all have code naturalness modeling capability, even models not specifically pre-trained on code. Additionally, smaller models do not necessarily exhibit weaker modeling capability than larger ones. We also find that more contextual information provides only limited benefit. Based on these findings, we select the best-performing model, which has 220M parameters, to develop an Entropy-based Automated Patch Correctness Assessment (E-APCA) approach that calculates code naturalness. On the large-scale PraPatch dataset, E-APCA surpasses traditional methods by over 20% across various evaluation metrics. Compared with Entropy-delta, the latest APCA method, which is based on a 6.7B-parameter LLM, E-APCA achieves 17.32% higher correct-patch recall and a 6.83% higher F1 score, while requiring less than 7% of Entropy-delta's inference time.
Vol. 32, No. 2
Citations: 0
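Entropy-based naturalness scoring rests on one quantity: the average negative log-probability a language model assigns to a token sequence, where lower means more natural. The sketch below computes it with an add-one-smoothed bigram model, which only stands in for the 220M-parameter LLM used by E-APCA; the toy corpus, the whitespace tokenization, and the rule of ranking patches by entropy are illustrative assumptions, not the paper's procedure.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one-smoothed token bigram model, a tiny stand-in for an LLM scoring naturalness."""

    def __init__(self, corpus):
        self.history = Counter()
        self.pairs = Counter()
        vocab = set()
        for tokens in corpus:
            seq = ["<s>"] + tokens
            self.history.update(seq[:-1])  # tokens that precede another token
            self.pairs.update(zip(seq, seq[1:]))
            vocab.update(tokens)
        self.vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen tokens

    def prob(self, prev, token):
        return (self.pairs[(prev, token)] + 1) / (self.history[prev] + self.vocab_size)

    def entropy(self, tokens):
        """Average negative log2-probability per token (cross-entropy); lower is more natural."""
        seq = ["<s>"] + tokens
        return -sum(math.log2(self.prob(p, t)) for p, t in zip(seq, seq[1:])) / len(tokens)


def rank_patches(lm, patches):
    """Order candidate patches from most to least natural (hypothetical ranking rule)."""
    return sorted(patches, key=lambda name: lm.entropy(patches[name]))
```

A patch whose tokens follow patterns the model has seen many times, such as a conventional null check, receives lower entropy than a scrambled variant, which is the signal an entropy-based assessor relies on.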
Ladle: a method for unsupervised anomaly detection across log types
IF 2 · Zone 2 · Computer Science
Automated Software Engineering Pub Date : 2025-03-24 DOI: 10.1007/s10515-025-00504-w
Juha Mylläri, Tatu Aalto, Jukka K. Nurminen
Abstract: Log files can help detect and diagnose erroneous software behaviour, but their utility is limited by the ability of users and developers to sift through large amounts of text. Unsupervised machine learning tools have been developed to find anomalies in logs automatically, but they are usually not designed for situations where a large number of log streams or log files, each with its own characteristics, must be analyzed and their anomaly scores compared. We propose Ladle, an accurate unsupervised anomaly detection and localization method that can simultaneously learn the characteristics of hundreds of log types and determine which log entries are the most anomalous across these log types. Ladle uses a sentence transformer (a large language model) to embed short, overlapping segments of log files and compares new, potentially anomalous log segments against a collection of reference data. The result of the comparison is re-centered by subtracting a baseline score that indicates how much variation tends to occur in each log type, making anomaly scores comparable across log types. Ladle is designed to adapt to data drift: it is updated by adding new reference data, without retraining the sentence transformer. We demonstrate Ladle's accuracy on a real-world dataset of logs produced by an endpoint protection platform test suite. We also compare Ladle's performance on this dataset with that of a state-of-the-art method for single-log anomaly detection, showing that the latter is inadequate for the multi-log task.
Vol. 32, No. 2 · Open access PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00504-w.pdf
Citations: 0
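The scoring scheme described above (embed overlapping segments, measure distance to reference data, subtract a per-log-type baseline) can be sketched as follows. Several parts are stand-in assumptions: a bag-of-words vector replaces the sentence-transformer embedding, the distance is 1 minus the best cosine similarity to any reference segment, and the baseline is taken to be the mean leave-one-out nearest-neighbour distance within the reference set. The abstract does not specify these details.

```python
import math
from collections import Counter


def segments(lines, width=2):
    """Short overlapping windows of consecutive log lines (stride 1)."""
    if len(lines) <= width:
        return [" ".join(lines)]
    return [" ".join(lines[i:i + width]) for i in range(len(lines) - width + 1)]


def embed(text):
    """Bag-of-words vector: a stand-in for a sentence-transformer embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LogTypeModel:
    """Reference segments and a variation baseline for one log type."""

    def __init__(self, reference_lines, width=2):
        self.width = width
        self.refs = []
        self.baseline = 0.0
        self.add_reference(reference_lines)

    def add_reference(self, lines):
        """Adapt to drift by appending reference segments; no model is retrained."""
        self.refs += [embed(s) for s in segments(lines, self.width)]
        # Assumed baseline: mean leave-one-out nearest-neighbour distance within the references.
        dists = [1 - max(cosine(r, o) for j, o in enumerate(self.refs) if j != i)
                 for i, r in enumerate(self.refs) if len(self.refs) > 1]
        self.baseline = sum(dists) / len(dists) if dists else 0.0

    def scores(self, lines):
        """Re-centered anomaly score per segment: nearest-reference distance minus the baseline."""
        return [1 - max(cosine(embed(s), r) for r in self.refs) - self.baseline
                for s in segments(lines, self.width)]
```

Keeping one `LogTypeModel` per log type and subtracting each type's own baseline is what lets scores from a noisy log type and a very regular one be ranked on a common scale.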