{"title":"A Comprehensive Framework for Testing Goal-Oriented NFPs in Software Product Lines","authors":"Ibtesam Bashir Gwasem","doi":"10.1002/smr.2760","DOIUrl":"https://doi.org/10.1002/smr.2760","url":null,"abstract":"<p>In the realm of software product line engineering (SPLE), ensuring the quality of end products is paramount for market success. SPLE promotes systematic software development through reuse by focusing on commonalities and variabilities within a domain to efficiently produce a family of related systems. The quality of a software system depends on its functional properties (FPs)—the functionalities it provides—and its non-functional properties (NFPs)—the quality attributes it possesses, such as security and performance. NFPs are particularly critical because they directly impact user satisfaction, determine project success, and significantly influence market acceptance. However, in SPLE, despite their recognized importance, NFPs often receive less attention compared to FPs, leading to potential quality risks and increased costs. This paper presents a framework for testing goal-oriented NFPs in software product lines, addressing this gap. By integrating goal models, the framework supports the systematic capture and validation of NFPs from early development stages. The framework's applicability is illustrated through research-based case studies in an online bookstore product line, demonstrating its use for systematic NFPs testing at both the domain and application levels. A comparative analysis with an existing technique highlights the framework's unique contributions in addressing NFPs testing within software product lines. Additionally, a preliminary experiment using two widely recognized product line domain examples evaluated the core testing process supported by the framework during the domain engineering phase, focusing on effectiveness, performance efficiency, and time consistency in structured research settings.</p>","PeriodicalId":48898,"journal":{"name":"Journal of Software-Evolution and Process","volume":"37 10","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/smr.2760","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145316899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PredictPP: A Rank-Based Weighted Ensemble Model for Prediction of Software Project Productivity","authors":"Suyash Shukla, Sandeep Kumar","doi":"10.1002/smr.70059","DOIUrl":"https://doi.org/10.1002/smr.70059","url":null,"abstract":"<div>\u0000 \u0000 <p>Software effort estimation (SEE) determines the effort necessary to develop software. The researchers have been tending to SEE issues since the 1960s, and several methods have been created until the formulation of the function point (FP) and constructive cost estimation (COCOMO) methods. However, these methods are only useful for procedurally developed software, not modern object-oriented (OO) software. Because the use case is the widely used unit of an OO system, particularly in scenarios requiring structured and early-stage effort estimation, using the use case point (UCP) approach will help get accurate results. The UCP approach consists of size estimation (in UCP) and effort estimation with calculated size. This study focuses on effort estimation when the size (in UCP) is already known. The productivity of a project is one of the main components for estimating effort from the given size. The classical SEE models based on UCP utilized a fixed number of productivity values. So, the validity of classical approaches is a subject of disapproval because of static productivity values. Purposefully, we proposed a rank-based weighted ensemble model for productivity prediction that allows us to use flexible productivity values. We used learning techniques such as simple linear regression (SLR), Least Absolute Shrinkage and Selection Operator Regression (LR), ridge regression (RR), elastic net regression (ER), K-nearest neighbor (KNN), decision tree (DT), support vector regression (SVR), multilayer perceptron (MLP), bagging, and adaptive boosting for productivity prediction and compared them with the proposed model. Further, we used existing UCP prediction models and compared the proposed approach with them.</p>\u0000 </div>","PeriodicalId":48898,"journal":{"name":"Journal of Software-Evolution and Process","volume":"37 10","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145272089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation and Improvement of Test Selection for Large Language Models","authors":"Lili Quan, Jin Wen, Qiang Hu, Maxime Cordy, Yuheng Huang, Lei Ma, Xiaohong Li","doi":"10.1002/smr.70057","DOIUrl":"https://doi.org/10.1002/smr.70057","url":null,"abstract":"<div>\u0000 \u0000 <p>Large language models (LLMs) have recently achieved significant success across various application domains, garnering substantial attention from different communities. Unfortunately, many <i>faults</i> still exist that LLMs cannot properly predict. Such faults will harm the usability of LLMs in general and could introduce safety issues in reliability-critical systems such as autonomous driving systems. How to quickly reveal these faults in real-world datasets that LLMs could face is important but challenging. The major reason is that the ground truth is necessary but the data labeling process is heavy considering the time and human effort. To handle this problem, in the conventional deep learning testing field, test selection methods have been proposed for efficiently evaluating deep learning models by prioritizing faults. However, despite their importance, the usefulness of these methods on LLMs is unclear and underexplored. In this paper, we conduct the first empirical study to investigate the effectiveness of existing test selection methods for LLMs. We focus on classification tasks because most existing test selection methods target this setting and reliably estimating confidence scores for variable-length outputs in generative tasks remains challenging. Experimental results on four different tasks (including both code tasks and natural language processing tasks) and four LLMs (e.g., LLaMA3 and GPT-4) demonstrated that simple methods such as Margin perform well on LLMs, but there is still a big room for improvement. Based on the study, we further propose MuCS, a prompt Mutation-based prediction Confidence Smoothing framework to boost the test selection capability for LLMs specifically on classification tasks. Concretely, multiple prompt mutation techniques have been proposed to help collect diverse outputs for confidence smoothing. The results show that our proposed framework significantly enhances existing methods with test relative coverage improvement by up to 70.53%.</p>\u0000 </div>","PeriodicalId":48898,"journal":{"name":"Journal of Software-Evolution and Process","volume":"37 10","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145272090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Validation Methodology for XAI Decision Support Systems Against Relational Domain Properties","authors":"Emanuele De Angelis, Guglielmo De Angelis, Maurizio Mongelli, Maurizio Proietti","doi":"10.1002/smr.70054","DOIUrl":"https://doi.org/10.1002/smr.70054","url":null,"abstract":"<p>The global adoption of artificial intelligence (AI) has increased dramatically in recent years, becoming commonplace in many fields. Such a pervasiveness has led to changes in how AI is perceived, strengthening discussions on its societal consequences. Thus, a new class of requirements for AI-based solutions emerged. Broadly speaking, those on “explainability” aim to provide a transparent representation of the (often opaque) reasoning method that an AI-based solution uses when prompted. This work presents a methodology for validating a class of explainable AI (XAI) models, called deterministic rule-based models, which are used for expressing an explainable approximation of classifiers based on machine learning. The validation methodology combines logical deduction with constraint-based reasoning in numerical domains, and it either succeeds or returns quantitative estimations of the invalid deviations found. This information allows us to assess the correctness of an XAI model, or in the case of deviations, to evaluate if it still can be deemed acceptable. The validation methodology has been applied to a simulation-based study where the decision-making process copes with the spread of SARS-COV-2 inside a railway station. The considered case study is a controlled but nontrivial example that shows the overall applicability of the methodology.</p>","PeriodicalId":48898,"journal":{"name":"Journal of Software-Evolution and Process","volume":"37 10","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/smr.70054","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145227973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Key Success Factors of Cybersecurity Awareness in Distributed Teams","authors":"Shabbab Ali Algamdi, Abdul Wahid Khan, Jamshid Ahmad, Moulay Ibrahim El-Khalil Ghembaza","doi":"10.1002/smr.70056","DOIUrl":"https://doi.org/10.1002/smr.70056","url":null,"abstract":"<div>\u0000 \u0000 <p>Strong cybersecurity procedures are now more important than ever due to the increased reliance on remote workers. Given the dynamic nature of cyber threats and the necessity of preventative actions, this paper highlights the vital significance of thorough cybersecurity awareness training for remote workers. A customized cybersecurity awareness training model can improve distributed team preparedness and decrease cyberattacks. Organizations should institute regular security awareness programs to educate distributed teams on emerging cyber threats. Vendor businesses should prioritize security education to prevent cyberattacks and protect sensitive data. Our proposed model aims to improve distributed team members' preparedness against cyber threats, enabling organizations to safeguard remote work settings effectively. Our systematic literature review identified key cybersecurity factors, synthesized into 12 groups, including “Unified Governance Framework” and “Secure Mind Initiative.”</p>\u0000 </div>","PeriodicalId":48898,"journal":{"name":"Journal of Software-Evolution and Process","volume":"37 9","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145146327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Merge Conflict Prediction Using Feature Selection and Stacking Heterogeneous Ensembles: An Empirical Investigation","authors":"Reem Alfayez, Amal Alazba","doi":"10.1002/smr.70047","DOIUrl":"https://doi.org/10.1002/smr.70047","url":null,"abstract":"<div>\u0000 \u0000 <p>Merge conflicts arise when multiple developers simultaneously modify the same part of a codebase and attempt to merge their changes. These conflicts occur because the version control system (VCS) cannot automatically determine which changes should take precedence. Resolving such conflicts involves manually reviewing the conflicting changes and deciding how to integrate them to maintain a functional and coherent codebase. This process is often time-consuming, complex, and prone to errors. Consequently, the software engineering community has focused on predicting merge conflicts to warn developers early and allow them to address conflicts before they escalate. Despite several efforts to predict merge conflicts, no perfect solution has been identified. Fortunately, many machine learning techniques have demonstrated potential in improving prediction performance across various contexts. This study aims to empirically investigate the effectiveness of stacking heterogeneous ensembles in enhancing merge conflict prediction performance. We empirically compared the prediction performance of the following individual models: decision trees (DT); support vector machine (SVM) with a linear kernel; naive Bayes (NB) with Bernoulli, Gaussian, and Multinomial variants; logistic regression (LR); multilayer perceptron (MLP); stochastic gradient descent (SGD); and k-nearest neighbors (KNN). Additionally, we evaluated three heterogeneous stacking ensembles: Stack-DT, Stack-SVM, and Stack-LR, which were constructed using the aforementioned individual models as base models. We utilized gain ratio (GR) to identify the most important technical and social features for predicting merge conflicts and assessed the impact of using only these important features on the performance of both individual and stacking models. The study revealed variability in the performance of individual models, with DT demonstrating the best predictive performance among them. Heterogeneous stacking ensembles demonstrated potential to enhance merge conflict prediction, with Stack-SVM emerging as the top-performing model. GR analysis highlighted the importance of both social and technical features in predicting merge conflicts. However, using only the most important features identified by GR led to a decline in the performance of most models compared to using all features. Heterogeneous stacking ensembles significantly improve prediction performance over individual models. Both social and technical features are important in predicting merge conflicts, and utilizing the full set of features instead of only the most important ones generally yields better results.</p>\u0000 </div>","PeriodicalId":48898,"journal":{"name":"Journal of Software-Evolution and Process","volume":"37 9","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145172015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster Analysis of Security Threats in Web Applications: A Multiphase SDLC Analysis","authors":"Shah Nawaz, Muhammad Yaseen, Gohar Rahman, Jasim Saeed","doi":"10.1002/smr.70055","DOIUrl":"https://doi.org/10.1002/smr.70055","url":null,"abstract":"<div>\u0000 \u0000 <p>Security threats in web applications have increasingly become a major concern, particularly as modern web systems grow more complex and interconnected. Addressing these security challenges requires a comprehensive understanding of how threats are distributed across different phases of the software development life cycle (SDLC) and how various threat categories map to specific SDLC stages. Despite significant research into software security, a systematic and structured review focusing on the hierarchical relationships between SDLC phases, security threat categories, and specific threats remains scarce. This paper aims to fill this gap by conducting a clustering-based systematic review of security threats in web applications. Using data from existing literature on software security threats, we applied hierarchical clustering, K-means analysis, and co-occurrence mapping to identify relationships between SDLC phases (Level 1), security threat categories (Level 2), and specific security threats (Level 3). The findings show that the development phase presents the highest risk, more so to threats like weaknesses in architectural security design and input validation issues. Using clustering techniques, we showed how some of the threats appeared in more than one SDLC stage and classified them within the categories of threats most closely associated with the SDLC stage. Taking into account these factors, we propose recommendations for software development process stakeholders allowing for the implementation of more consistent strategies of threat mitigation through the entire SDLC. Considering these observations, it can be concluded that there is an acute deficiency in development for globalization of software security measures towards web applications to control future security threats.</p>\u0000 </div>","PeriodicalId":48898,"journal":{"name":"Journal of Software-Evolution and Process","volume":"37 9","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145111066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Data to Knowledge: Mining Linux Vulnerability Characteristics and Evolution With Knowledge Graphs","authors":"Shiyu Weng, Xiaoxue Wu, Tianci Li, Chen Yao, Wenjing Shan, Xiaobing Sun","doi":"10.1002/smr.70053","DOIUrl":"https://doi.org/10.1002/smr.70053","url":null,"abstract":"<div>\u0000 \u0000 <p>An operating system is the essence of software, serving as the foundation for the operation of various application software. The security of the operating system is crucial for national informatization construction. Data indicate that many cybersecurity incidents result from exploiting security vulnerabilities in the operating system. Linux is currently the most widely used open-source operating system, with thousands of Common Vulnerabilities and Exposures (CVEs) related to Linux systems reported each year. Therefore, research and prevention of vulnerabilities in the Linux system are particularly important. To gain a better understanding of the characteristics of Linux system vulnerabilities, this paper leverages knowledge in the field of software security to analyze nearly 10,000 historical vulnerability data in two core systems of Linux: Linux Kernel and Debian Linux. The study explores the evolutionary patterns of vulnerability characteristics. Specific research contents include the following: (1) data collection and cleaning of vulnerability data in Linux Kernel and Debian Linux systems; (2) cross-statistical analysis of structured data features in vulnerability reports; (3) unstructured data characteristics mining in vulnerability reports based on domain knowledge; (4) analysis of the evolution of vulnerability characteristics. This paper provides empirical lessons and guidance for Linux system vulnerabilities to assist practitioners and researchers in better preventing and detecting vulnerabilities in Linux and Linux-based systems.</p>\u0000 </div>","PeriodicalId":48898,"journal":{"name":"Journal of Software-Evolution and Process","volume":"37 9","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145101229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UCLP: Unsupervised Classification of Key Aspects in Vulnerability Descriptions Through Label Profile","authors":"Linyi Han, Hang Li, Xiaowang Zhang, Youmeng Li, Zhiyong Feng","doi":"10.1002/smr.70052","DOIUrl":"https://doi.org/10.1002/smr.70052","url":null,"abstract":"<div>\u0000 \u0000 <p>Textual vulnerability descriptions (TVDs) in repositories like NVD and IBM X-Force Exchange are essential for security engineers managing vulnerabilities. Engineers typically search for key aspects in TVDs using specific phrases, but with multiple expressions for each aspect, retrieving all relevant records is challenging. We propose a label-based retrieval framework that classifies key aspects and retrieves TVDs by their broader categories. Given the large data volume, manual labeling is infeasible, making unsupervised classification critical. However, short labels and repeated words diminish semantic clarity, affecting classification accuracy. We introduce Unsupervised Classification through Label Profile (UCLP), which expands label semantics through label profiles inspired by recommendation systems. We construct profiles using neural network weights and apply TF-IDF to calculate similarities, smoothing distributions with an arctangent function. Results show that UCLP significantly outperforms four benchmarks, raising accuracy from 68.3% to 78.9% and improving three real-world applications.</p>\u0000 </div>","PeriodicalId":48898,"journal":{"name":"Journal of Software-Evolution and Process","volume":"37 9","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145038309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UFR-OSFA: Unified Feature Representation and Oppositional Structure Feature Alignment for Mixed-Project Heterogeneous Defect Prediction","authors":"Yifan Zou, Huiqiang Wang, Hongwu Lv, Shuai Zhao","doi":"10.1002/smr.70049","DOIUrl":"https://doi.org/10.1002/smr.70049","url":null,"abstract":"<div>\u0000 \u0000 <p>Heterogeneous defect prediction (HDP) plays a crucial role in software engineering by enabling the early detection of software defects across projects with heterogeneous feature spaces. Recently, some mixed-project HDP (MP-HDP) methods have been proposed, which have demonstrated modest improvements in HDP performance. Nevertheless, existing MP-HDP approaches fail to address feature redundancy and distribution inconsistency simultaneously. To overcome these limitations, this paper proposes a novel MP-HDP approach, UFR-OSFA, based on unified feature representation and oppositional structural feature alignment. Concretely, UFR-OSFA first unifies these features by reducing the distribution differences between source and target projects through matching common features and the Hungarian algorithm based on the Kolmogorov–Smirnov (KS) test. Subsequently, utilizing a generator and two classifiers with oppositional structures, UFR-OSFA separates the features of the source project and clusters those of the target project, addressing the issue of conditional distribution mismatch and enhancing the model's generalization ability in the target project. Extensive experiments on 23 projects from five datasets demonstrate that the proposed approach performs better or comparably to baseline methods.</p>\u0000 </div>","PeriodicalId":48898,"journal":{"name":"Journal of Software-Evolution and Process","volume":"37 9","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145037623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}