{"title":"Variational Prefix Tuning for diverse and accurate code summarization using pre-trained language models","authors":"Junda Zhao, Yuliang Song, Eldan Cohen","doi":"10.1016/j.jss.2025.112493","DOIUrl":"10.1016/j.jss.2025.112493","url":null,"abstract":"<div><div>Recent advancements in source code summarization have leveraged transformer-based pre-trained models, including Large Language Models of Code (LLMCs), to automate and improve the generation of code summaries. However, existing methods often focus on generating a single high-quality summary for a given source code, neglecting scenarios where the generated summary might be inadequate and alternative options are needed. In this paper, we introduce Variational Prefix Tuning (VPT), a novel approach that enhances pre-trained models’ ability to generate diverse yet accurate sets of summaries, allowing the user to choose the most suitable one for the given source code. Our method integrates a Conditional Variational Autoencoder (CVAE) framework as a modular component into pre-trained models, enabling us to model the distribution of observed target summaries and sample continuous embeddings to be used as prefixes to steer the generation of diverse outputs during decoding. Importantly, we construct our method in a parameter-efficient manner, eliminating the need for expensive model retraining, especially when using LLMCs. Furthermore, we employ a bi-criteria reranking method to select a subset of generated summaries, optimizing both the diversity and the accuracy of the options presented to users. 
We present extensive experimental evaluations using widely used datasets and current state-of-the-art pre-trained code summarization models to demonstrate the effectiveness of our approach and its adaptability across models.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"229 ","pages":"Article 112493"},"PeriodicalIF":3.7,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144130877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"reAnalyst: Scalable annotation of reverse engineering activities","authors":"Tab (Tianyi) Zhang , Claire Taylor , Bart Coppens , Waleed Mebane , Christian Collberg , Bjorn De Sutter","doi":"10.1016/j.jss.2025.112492","DOIUrl":"10.1016/j.jss.2025.112492","url":null,"abstract":"<div><div>This paper introduces reAnalyst, a framework designed to facilitate the study of reverse engineering (RE) practices through the semi-automated annotation of RE activities across various RE tools. By integrating tool-agnostic data collection of screenshots, keystrokes, active processes, and other types of data during RE experiments with semi-automated data analysis and generation of annotations, reAnalyst aims to overcome the limitations of traditional RE studies that rely heavily on manual data collection and subjective analysis. The framework enables more efficient data analysis, which will in turn allow researchers to explore the effectiveness of protection techniques and strategies used by reverse engineers more comprehensively and efficiently. Experimental evaluations validate the framework’s capability to identify RE activities from a diverse range of screenshots with varied complexities. Observations on past experiments with our framework as well as a survey among reverse engineers provide further evidence of the acceptability and practicality of our approach.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"229 ","pages":"Article 112492"},"PeriodicalIF":3.7,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144070114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dear researchers: Turning industry into a laboratory — The UnICo experience","authors":"Andrea Capiluppi","doi":"10.1016/j.jss.2025.112495","DOIUrl":"10.1016/j.jss.2025.112495","url":null,"abstract":"<div><div><span><span><sup>1</sup></span></span> Academic researchers are also educators, and rightly so. Who better to teach than those at the forefront of their fields? Yet, dear researchers, there is a significant issue with this model: a disconnect from industrial realities. Industry often looks on with disbelief as you implement cutting-edge research using tools like the Eclipse IDE, an environment they abandoned years ago. To address this gap, we propose fostering academia-industry collaboration for capstone projects. By adopting an “Industry-as-a-Lab” approach, where real-world challenges guide research and education, you can equip students with industry-relevant skills and perhaps even gain new insights yourself. Using ASML as a case study, I illustrate how this model fosters meaningful learning, impactful research, and innovative solutions, effectively bridging the divide between academia and industry.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"228 ","pages":"Article 112495"},"PeriodicalIF":3.7,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144116826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault localization of AI-enabled cyber–physical systems by exploiting temporal neuron activation","authors":"Deyun Lyu , Yi Li , Zhenya Zhang , Paolo Arcaini , Xiao-Yi Zhang , Fuyuki Ishikawa , Jianjun Zhao","doi":"10.1016/j.jss.2025.112475","DOIUrl":"10.1016/j.jss.2025.112475","url":null,"abstract":"<div><div>Modern <em>cyber–physical systems (CPS)</em> are evolving to integrate <em>deep neural networks (DNNs)</em> as controllers, leading to the emergence of <em>AI-enabled CPSs</em>. An inadequately trained DNN controller may produce incorrect control actions, exposing the system to safety risks. Therefore, it is crucial to localize the faulty neurons of the DNN controller responsible for the wrong decisions. However, since an unsafe system behavior typically arises from a sequence of control actions, establishing a connection between unsafe behaviors and faulty neurons is challenging. To address this problem, we propose <span>Tactical</span> that localizes faults in an AI-enabled CPS by exploiting <em>temporal neuron activation criteria</em> that capture temporal aspects of the DNN controller inferences. Specifically, based on testing results, for each neuron, <span>Tactical</span> constructs a <em>spectrum</em>, which considers the specification satisfaction and the evolution of the activation status of the neuron during the system execution. Then, starting from the spectra of all the neurons, <span>Tactical</span> applies suspiciousness metrics to compute a suspiciousness score for each neuron, from which the most suspicious ones are selected. We assess <span>Tactical</span> configured with eight <em>temporal neuron activation criteria</em>, on 3504 faulty AI-enabled CPS benchmarks spanning over different domains. The results show the effectiveness of <span>Tactical</span> w.r.t. 
a baseline approach.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"229 ","pages":"Article 112475"},"PeriodicalIF":3.7,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143942061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unboxing Default Argument Breaking Changes in 1 + 2 data science libraries","authors":"João Eduardo Montandon , Luciana Lourdes Silva , Cristiano Politowski , Daniel Prates , Arthur de Brito Bonifácio , Ghizlane El Boussaidi","doi":"10.1016/j.jss.2025.112460","DOIUrl":"10.1016/j.jss.2025.112460","url":null,"abstract":"<div><div>Data Science (DS) has become a cornerstone for modern software, enabling data-driven decisions to improve companies services. Following modern software development practices, data scientists use third-party libraries to support their tasks. As the APIs provided by these tools often require an extensive list of arguments to be set up, data scientists rely on default values to simplify their usage. It turns out that these default values can change over time, leading to a specific type of breaking change, defined as Default Argument Breaking Change (DABC). This work reveals 93 DABCs in three Python libraries frequently used in Data Science tasks—Scikit Learn, NumPy, and Pandas—studying their potential impact on more than 500K client applications. We find out that the occurrence of DABCs varies significantly depending on the library; 35% of Scikit Learn clients are affected, while only 0.13% of NumPy clients are impacted. The main reason for introducing DABCs is to enhance API maintainability, but they often change the function’s behavior. 
We discuss the importance of managing DABCs in third-party DS libraries and provide insights for developers to mitigate the potential impact of these changes in their applications.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"229 ","pages":"Article 112460"},"PeriodicalIF":3.7,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143942060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BinCoFer: Three-stage purification for effective C/C++ binary third-party library detection","authors":"Yayi Zou , Yixiang Zhang , Guanghao Zhao , Yueming Wu , Shuhao Shen , Cai Fu","doi":"10.1016/j.jss.2025.112480","DOIUrl":"10.1016/j.jss.2025.112480","url":null,"abstract":"<div><div>Third-party libraries (TPL) are becoming increasingly popular to achieve efficient and concise software development. However, unregulated use of TPL will introduce legal and security issues in software development. Consequently, some studies have attempted to detect the reuse of TPLs in target programs by constructing a feature repository. Most of the works require access to the source code of TPLs, while the others suffer from redundancy in the repository, low detection efficiency, and difficulties in detecting partially referenced third-party libraries.</div><div>Therefore, we introduce BinCoFer, a tool designed for detecting TPLs reused in binary programs. We leverage the work of binary code similarity detection(BCSD) to extract binary-format TPL features, making it suitable for scenarios where the source code of TPLs is inaccessible. BinCoFer employs a novel three-stage purification strategy to mitigate feature repository redundancy by highlighting core functions and extracting function-level features, making it applicable to scenarios of partial reuse of TPLs. We have observed that directly using similarity threshold to determine the reuse between two binary functions is inaccurate, a problem that previous work has not addressed. Thus we design a method that uses weight to aggregate the similarity between functions in the target binary and core functions to ultimately judge the reuse situation with high frequency. To examine the ability of <em>BinCoFer</em>, we compiled a dataset on ArchLinux and conduct comparative experiments on it with other four most related works (<em>i.e., ModX</em>, <em>B2SFinder</em>, <em>LibAM</em> and <em>BinaryAI</em>). 
Through the experimental results, we find that <em>BinCoFer</em> outperforms them by over 20.0% in precision and 7.0% in F1. As the data volume increases, we observe the precision of BinCoFer tends to be stable and high. Moreover, <em>BinCoFer</em> greatly accelerates TPL detection efficiency which reduces the time cost of <em>ModX</em> by up to 99.7%.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"229 ","pages":"Article 112480"},"PeriodicalIF":3.7,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143946673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Text–image fusion template for large language model assisted crowdsourcing test aggregation","authors":"Yunfeng Zhu, Shengcheng Yu, Zhaowei Zong, Yue Wang, Yuan Zhao, Zhenyu Chen","doi":"10.1016/j.jss.2025.112478","DOIUrl":"10.1016/j.jss.2025.112478","url":null,"abstract":"<div><div>Mobile crowdsourced testing leverages a varied group to enhance software quality through screenshots and text feedback. Examining the multitude of reports is tedious but crucial, often necessitating a combined analysis of both visual and textual information. However, professionals employ detailed judgment beyond mere similarity, which poses a challenge given the limited textual data and abundance of images in the reports.</div><div>We introduce a framework that guides large language models to handle missing data and inconsistencies in crowdsourced reports by using a triplet template <span><math><mrow><mo>〈</mo></mrow></math></span> Scene, Operation, Defect <span><math><mrow><mo>〉</mo></mrow></math></span> for bug identification. The framework leverages the element independence of the triplet for clustering ensemble and designs an algorithm to generate potential operation paths, aggregating reports within the cluster through constructed graphs. Our method, validated on 5115 reports, employs a clustering ensemble and graph aggregation, improving the clustering V-measure to 0.722. It also reduces the annotation time per report by 39. 3%, thereby improving the quality of the tagging. 
Source code available at <span><span>https://github.com/Boomnana/Text-Image-Fusion</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"228 ","pages":"Article 112478"},"PeriodicalIF":3.7,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143931497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient adaptive test case selection for DNNs robustness enhancement","authors":"Zhiyi Zhang , Huanze Meng , Yuchen Ding , Shuxian Chen , Yongming Yao","doi":"10.1016/j.jss.2025.112451","DOIUrl":"10.1016/j.jss.2025.112451","url":null,"abstract":"<div><div>Deep neural networks (DNNs) have been widely used in various fields, and testing for DNN-based software has become increasingly important. To discover potential faults in DNNs, a large number of test cases and their corresponding labels are required. However, labeling so many test cases consumes enormous costs. Although there have been many test case selection techniques for DNN models, these techniques still have problems such as high overhead, low efficiency, and poor diversity. To address this problem, this paper proposes an efficient adaptive test case selection method based on the principle of uniform distribution of test cases called EATS. Based on the idea of adaptive testing, EATS combines the uncertainty of the model and the diversity of faults to calculate the distance of test cases and sort them, then gives priority to test cases with a higher probability of causing faults. We conduct experiments on four popular datasets and four representative DNN models. 
Experiment results show that, compared with the existing eight methods, EATS performs better in uniformity of test case distribution, diversity of errors found, model optimization, and optimization efficiency.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"229 ","pages":"Article 112451"},"PeriodicalIF":3.7,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144069735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Code beauty is in the eye of the beholder: Exploring the relation between code beauty and quality","authors":"Theodoros Maikantis , Ilianna Natsiou , Christina Volioti , Elvira-Maria Arvanitou , Apostolos Ampatzoglou , Nikolaos Mittas , Alexander Chatzigeorgiou , Stelios Xinogalos","doi":"10.1016/j.jss.2025.112494","DOIUrl":"10.1016/j.jss.2025.112494","url":null,"abstract":"<div><div>Software artifacts and source code are often viewed as pure technical constructs aiming primarily at delivering specific functionality to the end users. However, almost each line of a computer program is the result of software engineer’s craftsmanship and thus reflects their skills and capabilities, but also their aesthetic view of how code should be written. Additionally, by nature, the code is not an artifact that is managed by a single person: the code is peer-reviewed, in some cases programmed in pairs, or maintained by different people. In this respect, the first impression for the quality of a code is usually a matter of “<em>reading</em>” the “<em>beauty</em>” of the code and then diving into the details of the actual implementation. This “<em>first-look</em>” impression can psychologically bias the software engineers, either positively or negatively and affect their evaluation. In this article we propose a novel code beauty model (accompanied with metrics) and empirically explore: (a) if different software engineers perceive code beauty in the same way; (b) if the proposed code beauty metrics are correlated to the perceived code beauty by individual software engineers; and (c) if code beauty metrics are correlated to software maintainability. 
The results of the study suggest: (a) that code beauty is highly subjective and different software engineers perceive a code chunk as beautiful or not in an inconsistent way; (b) that some code beauty metrics can be considered as correlated to maintainability; and therefore, the “<em>first-look</em>” impression might to some extent be representative of the quality of the reviewed code chunk.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"229 ","pages":"Article 112494"},"PeriodicalIF":3.7,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143946674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coding style matters: Scalable and efficient identification of memory management functions in monolithic firmware","authors":"Ruijie Cai , Zhaowei Zhang , Xiaoya Zhu , Yongguang Zhang , Xiaokang Yin , Shengli Liu","doi":"10.1016/j.jss.2025.112472","DOIUrl":"10.1016/j.jss.2025.112472","url":null,"abstract":"<div><div>The occurrence of memory corruption vulnerabilities is often closely associated with improper use or implementation of memory management functions. Monolithic firmware typically uses custom memory management functions and lacks information such as function names, which poses significant challenges for vulnerability detection. Therefore, it is crucial for the identification of memory management functions. Existing methods are rendered ineffective due to the absence of metadata, and the diversity in implementation across different firmware images further complicates the identification process. To address the above problem, we introduce MemIdent, a new method leveraging the coding style inherent in identifying memory management functions. MemIdent is engineered to be scalable and efficient, capable of discerning consistent call features across various compiler optimizations and instruction architectures. It leverages three key observations derived from an in-depth analysis of monolithic firmware: the regularity in memory allocation calls, the co-occurrence of allocation and deallocation functions, and the statistical prominence of these features. MemIdent extracts features of call site such as function parameter types and return values using data flow analysis, which are then analyzed through statistical patterns to identify memory allocation and deallocation functions. We evaluate MemIdent’s performance using 44 firmware images covering 6 vendors (i.e., Tenda, Cisco, SonicWall, D-Link, TP-Link, and Comtech) across 3 architectures (MIPS, ARM, and PPC). 
The experimental results demonstrate that MemIdent has higher accuracy, greater efficiency, and better generality than state-of-the-art (SOTA) approaches, including Heapster, IDA Lumina, and MLM, which offers a significant advancement in memory management function identification methods for monolithic firmware.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"228 ","pages":"Article 112472"},"PeriodicalIF":3.7,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143927457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}