{"title":"On-the-fly unfolding with optimal exploration for linear temporal logic model checking of concurrent software and systems","authors":"Shuo Li, Li’ao Zheng, Ru Yang, Zhijun Ding","doi":"10.1007/s10515-025-00511-x","DOIUrl":"10.1007/s10515-025-00511-x","url":null,"abstract":"<div><p>Linear temporal logic (LTL) model checking faces a significant challenge known as the state-explosion problem. The on-the-fly method is a solution that constructs and checks the state space simultaneously, avoiding generating all states in advance. But it is not effective for concurrent interleaving. Unfolding based on Petri nets is a succinct structure covering all states that can mitigate this problem caused by concurrency. Many state-of-the-art methods optimally explore a complete unfolding structure using a tree-like structure. However, it is difficult to apply such a tree-like structure directly to the traditional on-the-fly method of LTL. At the same time, constructing a complete unfolding structure in advance and then checking LTL is also wasteful. Thus, the existing optimal exploration methods are not applicable to the on-the-fly unfolding. To solve these challenges, we propose an LTL model-checking method called on-the-fly unfolding with optimal exploration. This method is based on program dependence net (PDNet) proposed in the previous work. Firstly, we define conflict transitions of PDNet and an exploration tree with a novel notion of delayed transitions, which differs from the existing tree-like structure. The tree improves the on-the-fly unfolding by exploring each partial-order run only once and avoiding enumerating all possible combinations. Then, we propose an on-the-fly unfolding algorithm that simultaneously constructs the exploration tree and generates the unfolding structure while checking LTL. We implement a tool for verifying LTL properties of concurrent programs. It also improves traditional unfolding generations and performs better than <i>SPIN</i> and <i>DiVine</i> on the used benchmarks. The core contribution of this paper is that we propose an on-the-fly unfolding with an optimal exploration method for LTL. It avoids the complete enumeration of concurrent combinations from traditional unfolding generation.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"32 2","pages":""},"PeriodicalIF":3.1,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145165164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NexuSym: Marrying symbolic path finders with large language models","authors":"Jiayi Wang, Ping Yu, Yi Qin, Yanyan Jiang, Yuan Yao, Xiaoxing Ma","doi":"10.1007/s10515-025-00529-1","DOIUrl":"10.1007/s10515-025-00529-1","url":null,"abstract":"<div><p>Symbolic execution is a powerful technique for automated test case generation, ensuring comprehensive coverage of potential scenarios. However, it often struggles with complex, deep paths due to path explosion. Conversely, large language models (LLMs) utilize vast training data to generate test cases that can uncover intricate program behaviors that symbolic execution might miss. Despite their complementary strengths, integrating the systematic nature of symbolic execution with the creative capabilities of LLMs presents a significant challenge. We introduce <span>NexuSym</span>, an innovative tool that integrates symbolic execution with LLMs to facilitate the automatic generation of test cases. To effectively bridge the gap between these two approaches, we have developed a test case reducer, which normalizes the LLM-generated test cases to make them compatible with symbolic execution. Additionally, we propose a search space summarizer, which abstracts and condenses the search space explored by symbolic execution, enabling the LLM to focus on the most promising areas for further exploration. We instantiated <span>NexuSym</span> on KLEE and ChatGPT. Our evaluation of <span>NexuSym</span> involved 99 coreutils programs and 9 large GNU programs. The experimental results demonstrate that <span>NexuSym</span> significantly enhances program test coverage, with improvements of up to 20% in certain cases. Furthermore, we conducted an analysis of the monetary costs associated with using the LLM API, revealing that <span>NexuSym</span> is a highly cost-effective solution.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"32 2","pages":""},"PeriodicalIF":3.1,"publicationDate":"2025-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145163313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What information contributes to log-based anomaly detection? Insights from a configurable transformer-based approach","authors":"Xingfang Wu, Heng Li, Foutse Khomh","doi":"10.1007/s10515-025-00527-3","DOIUrl":"10.1007/s10515-025-00527-3","url":null,"abstract":"<div><p>Log data are generated from logging statements in the source code, providing insights into the execution processes of software applications and systems. State-of-the-art log-based anomaly detection approaches typically leverage deep learning models to capture the semantic or sequential information in the log data and detect anomalous runtime behaviors. However, the impacts of these different types of information are not clear. In addition, most existing approaches ignore the timestamps in log data, which can potentially provide fine-grained sequential and temporal information. In this work, we propose a configurable Transformer-based anomaly detection model that can capture the semantic, sequential, and temporal information in the log data and allows us to configure the different types of information as the model’s features. Additionally, we train and evaluate the proposed model using log sequences of different lengths, thus overcoming the constraint of existing methods that rely on fixed-length or time-windowed log sequences as inputs. With the proposed model, we conduct a series of experiments with different combinations of input features to evaluate the roles of different types of information (i.e., sequential, temporal, semantic information) in anomaly detection. The model can attain competitive and consistently stable performance compared to the baselines when presented with log sequences of varying lengths. The results indicate that the event occurrence information plays a key role in identifying anomalies, while the impact of the sequential and temporal information is not significant for anomaly detection on the studied public datasets. On the other hand, the findings also reveal the simplicity of the studied public datasets and highlight the importance of constructing new datasets that contain different types of anomalies to better evaluate the performance of anomaly detection models.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"32 2","pages":""},"PeriodicalIF":3.1,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145161241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised software vulnerability assessment via code lexical and structural information fusion","authors":"Wenlong Pei, Yilin Huang, Xiang Chen, Guilong Lu, Yong Liu, Chao Ni","doi":"10.1007/s10515-025-00526-4","DOIUrl":"10.1007/s10515-025-00526-4","url":null,"abstract":"<div><p>In </p><p>recent years, data-driven approaches have become popular for software vulnerability assessment (SVA). However, these approaches need a large amount of labeled SVA data to construct effective SVA models. This process demands security expertise for accurate labeling, incurring significant costs and introducing potential errors. Therefore, collecting the training datasets for SVA can be a challenging task. To effectively alleviate the SVA data labeling cost, we propose an approach SURF, which makes full use of a limited amount of labeled SVA data combined with a large amount of unlabeled SVA data to train the SVA model via semi-supervised learning. Furthermore, SURF incorporates lexical information (i.e., treat the code as plain text) and structural information (i.e., treat the code as the code property graph) as bimodal inputs for the SVA model training, which can further improve the performance of SURF. Through extensive experiments, we evaluated the effectiveness of SURF on a dataset that contains C/C++ vulnerable functions from real-world software projects. The results show that only by labeling 30% of the SVA data, SURF can reach or even exceed the performance of state-of-the-art SVA baselines (such as DeepCVA and Func), even if these supervised baselines use 100% of the labeled SVA data. Furthermore, SURF can also exceed the performance of the state-of-the-art Positive-unlabeled learning baseline PILOT when both are trained on 30% of the labeled SVA data.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"32 2","pages":""},"PeriodicalIF":3.1,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145161568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software testing for extended reality applications: a systematic mapping study","authors":"Ruizhen Gu, José Miguel Rojas, Donghwan Shin","doi":"10.1007/s10515-025-00523-7","DOIUrl":"10.1007/s10515-025-00523-7","url":null,"abstract":"<div><p>Extended Reality (XR) is an emerging technology spanning diverse application domains and offering immersive user experiences. However, its unique characteristics, such as six degrees of freedom interactions, present significant testing challenges distinct from traditional 2D GUI applications, demanding novel testing techniques to build high-quality XR applications. This paper presents the first systematic mapping study on software testing for XR applications. We selected 34 studies focusing on techniques and empirical approaches in XR software testing for detailed examination. The studies are classified and reviewed to address the current research landscape, test facets, and evaluation methodologies in the XR testing domain. Additionally, we provide a repository summarising the mapping study, including datasets and tools referenced in the selected studies, to support future research and practical applications. Our study highlights open challenges in XR testing and proposes actionable future research directions to address the gaps and advance the field of XR software testing.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"32 2","pages":""},"PeriodicalIF":3.1,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10515-025-00523-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145161567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HGNNLink: recovering requirements-code traceability links with text and dependency-aware heterogeneous graph neural networks","authors":"Bangchao Wang, Zhiyuan Zou, Xuanxuan Liang, Huan Jin, Peng Liang","doi":"10.1007/s10515-025-00528-2","DOIUrl":"10.1007/s10515-025-00528-2","url":null,"abstract":"<div><p>Manually recovering traceability links between requirements and code artifacts often consumes substantial human resources. To address this, researchers have proposed automated methods based on textual similarity between requirements and code artifacts, such as information retrieval (IR) and pre-trained models, to determine whether traceability links exist between requirements and code artifacts. However, in the same system, developers often follow similar naming conventions and repeatedly use the same frameworks and template code, resulting in high textual similarity between code artifacts that are functionally unrelated. This makes it difficult to accurately identify the corresponding code artifacts for requirements artifacts solely based on textual similarity. Therefore, it is necessary to leverage the dependency relationships between code artifacts to assist in the requirements-code traceability link recovery process. Existing methods often treat dependency relationships as a post-processing step to refine textual similarity, overlooking the importance of textual similarity and dependency relationships in generating requirements-code traceability links. To address these limitations, we proposed Heterogeneous Graph Neural Network Link (HGNNLink), a requirements traceability approach that uses vectors generated by pre-trained models as node features and considers IR similarity and dependency relationships as edge features. By employing a heterogeneous graph neural network, HGNNLink aggregates and dynamically evaluates the impact of textual similarity and code dependencies on link generation. The experimental results show that HGNNLink improves the average F1 score by 13.36% compared to the current state-of-the-art (SOTA) method GA-XWCoDe in a dataset collected from ten open source software (OSS) projects. HGNNLink can extend IR methods by using high similarity candidate links as edges, and the extended HGNNLink achieves a 2.48% improvement in F1 compared to the original IR method after threshold parameter configuration using a genetic algorithm.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"32 2","pages":""},"PeriodicalIF":3.1,"publicationDate":"2025-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145171197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Continuous integration of architectural performance models with parametric dependencies – the CIPM approach","authors":"Manar Mazkatli, David Monschein, Martin Armbruster, Robert Heinrich, Anne Koziolek","doi":"10.1007/s10515-025-00521-9","DOIUrl":"10.1007/s10515-025-00521-9","url":null,"abstract":"<p>The explicit consideration of the software architecture supports system evolution and efficient quality assurance. In particular, Architecture-based Performance Prediction (AbPP) assesses the performance for future scenarios (e.g., alternative workload, design, deployment) without expensive measurements for all such alternatives. However, accurate AbPP requires an up-to-date architectural Performance Model (aPM) that is parameterized over factors impacting the performance (e.g., input data characteristics). Especially in agile development, keeping such a parametric aPM consistent with software artifacts is challenging due to frequent evolutionary, adaptive, and usage-related changes. Existing approaches do not address the impact of all aforementioned changes. Moreover, the extraction of a complete aPM after each impacting change causes unnecessary monitoring overhead and may overwrite previous manual adjustments. In this article, we present the Continuous Integration of architectural Performance Model (CIPM) approach, which automatically updates a parametric aPM after each evolutionary, adaptive, or usage change. To reduce the monitoring overhead, CIPM only calibrates the affected performance parameters (e.g., resource demand) using adaptive monitoring. Moreover, a self-validation process in CIPM validates the accuracy, manages the monitoring to reduce overhead, and recalibrates inaccurate parts. Consequently, CIPM will automatically keep the aPM up-to-date throughout the development and operation, which enables AbPP for a proactive identification of upcoming performance problems and for evaluating alternatives at low costs. We evaluate the applicability of CIPM in terms of accuracy, monitoring overhead, and scalability using six cases (four Java-based open source applications and two industrial Lua-based sensor applications). Regarding accuracy, we observed that CIPM correctly keeps an aPM up-to-date and estimates performance parameters well so that it supports accurate performance predictions. Regarding the monitoring overhead in our experiments, CIPM’s adaptive instrumentation demonstrated a significant reduction in the number of required instrumentation probes, ranging from 12.6 % to 83.3 %, depending on the specific cases evaluated. Finally, we found out that CIPM’s execution time is reasonable and scales well with an increasing number of model elements and monitoring data.</p>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"32 2","pages":""},"PeriodicalIF":3.1,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10515-025-00521-9.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145171381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the privacy-realisticness dilemma of the metaverse","authors":"Xiaolu Zhang, Tahmid Rafi, Yuejun Guan, Shuqing Li, Michael R. Lyu","doi":"10.1007/s10515-025-00516-6","DOIUrl":"10.1007/s10515-025-00516-6","url":null,"abstract":"<div><p>Metaverse is a form of next-generation human–computer interaction and social networks based on virtual and augmented reality. Both the research and industry community have invested much in this area to develop useful applications and enhance user experience. Meanwhile, the expanded human–computer interface which enables the immersive experience in the Metaverse will also inevitably expand the interface of potential privacy leaks. This dilemma between immersive user experience and higher privacy risks has not been well studied and it is not clear how different users would make decisions when facing such a dilemma. In this research work, we systematically studied this dilemma in different usage scenarios of the Metaverse and performed a study on 177 users to understand the factors that may affect users’ decision making. From the study, we found that user preference on immersive experience and privacy protection can be very different in different usage scenarios and we expect our study results can provide some insights and guidance for the design of privacy protection mechanisms in Metaverse platforms and applications.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"32 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144135273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring the impact of predictive models on the software project: A cost, service time, and risk evaluation of a metric-based defect severity prediction model","authors":"Umamaheswara Sharma B, Ravichandra Sadam","doi":"10.1007/s10515-025-00519-3","DOIUrl":"10.1007/s10515-025-00519-3","url":null,"abstract":"<div><p>In a critical software system, the testers have to spend an enormous amount of time and effort maintaining the software due to the continuous occurrence of defects. To reduce the time and effort of a tester, prior works in the literature are limited to using documented defect reports to automatically predict the severity of the defective software modules. In contrast, in this work, we propose a metric-based software defect severity prediction (SDSP) model that is built using a decision-tree incorporated self-training semi-supervised learning approach to classify the severity of the defective software modules. Empirical analysis of the proposed model on the AEEEM datasets suggests using the proposed approach as it successfully assigns suitable severity class labels to the unlabelled modules. On the other hand, numerous research studies have addressed the methodological aspects of SDSP models, but the gap in estimating the performance of a developed prediction using suitable measures remains unattempt. For this, we propose the risk factor, per cent of the saved budget, loss in the saved budget, per cent of remaining edits, per cent of remaining edits, remaining service time, and gratuitous service time, to interpret the predictions in terms of project objectives. Empirical analysis of the proposed approach shows the benefit of using the proposed measures in addition to the traditional measures.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"32 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144084984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The impact of feature selection and feature reduction techniques for code smell detection: A comprehensive empirical study","authors":"Zexian Zhang, Lin Zhu, Shuang Yin, Wenhua Hu, Shan Gao, Haoxuan Chen, Fuyang Li","doi":"10.1007/s10515-025-00524-6","DOIUrl":"10.1007/s10515-025-00524-6","url":null,"abstract":"<div><p>Code smell detection using machine/deep learning methods aims to classify code instances as smelly or non-smelly based on extracted features. Accurate detection relies on optimizing feature sets by focusing on relevant features while discarding those that are redundant or irrelevant. However, prior studies on feature selection and reduction techniques for code smell detection have yielded inconsistent results, possibly due to limited exploration of available techniques. To address this gap, we comprehensively analyze 33 feature selection and 6 feature reduction techniques across seven classification models and six code smell datasets. And we apply the Scott-Knott effect size difference test for comparing performance and McNemar’s test for assessing prediction diversity. The results show that (1) Not all feature selection and reduction techniques significantly improve detection performance. (2) Feature extraction techniques generally perform worse than feature selection techniques. (3) Probabilistic significance is recommended as a “generic” feature selection technique due to its higher consistency in identifying smelly instances. (4) High-frequency features selected by the top feature selection techniques vary by dataset, highlighting their specific relevance for identifying the corresponding code smells. Based on these findings, we provide implications for further code smell detection research.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"32 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144073611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}