{"title":"MITU: Locating relevant tutorial fragments of APIs with multi-source API knowledge","authors":"Di Wu , Hongyu Zhang , Yang Feng , Zhenjiang Dong","doi":"10.1016/j.jss.2024.112296","DOIUrl":"10.1016/j.jss.2024.112296","url":null,"abstract":"<div><div>API tutorials are vital resources as they can help developers learn how to use the APIs. An API tutorial is usually split into a number of consecutive units that describe the same topic, denoted as <em>tutorial fragments</em>. We treat a tutorial fragment explaining how to use an API as a <em>relevant</em> fragment of the API. Locating relevant tutorial fragments of an API can help developers understand and learn APIs. Existing approaches often train location models using API knowledge from a single resource (e.g., API tutorials). In practice, API knowledge from multiple resources such as API tutorials, Stack Overflow (SO) posts, and API specifications (denoted as <em>multi-source API knowledge</em>) is available to help locate relevant fragments of APIs. While leveraging multi-source API knowledge is intuitively more beneficial, it is a challenging task to use multi-source API knowledge due to <em>diverse distribution</em> and <em>imbalanced distribution</em> issues. Here, the diverse distribution denotes that the data in the same resource are close to each other in the feature space, while data in different resources are far away from each other. The imbalanced distribution denotes that the amount of relevant data is less than the amount of irrelevant data. In this paper, we propose a novel approach called MITU (using <u><strong>M</strong></u>ulti-source AP<u><strong>I</strong></u> knowledge to locate relevant <u><strong>TU</strong></u>torial fragments) to alleviate these two challenges. For the diverse distribution problem, MITU can project multi-source API knowledge to a correlated space where their distributions become similar. 
For the imbalanced distribution problem, MITU can minimize the misclassification cost when learning multi-source API knowledge. More specifically, we first collect multi-source API knowledge from API specifications, SO posts, and API tutorials, respectively. Then, we train a cost-sensitive subspace analysis based location model, which can make full use of multi-source API knowledge by addressing issues of diverse and imbalanced distributions. At last, relevant tutorial fragments of APIs can be located by consulting the trained model. We evaluate MITU on Java and Android multi-source API knowledge datasets containing a total of 44,064 samples. Experimental results show that MITU is effective and outperforms the existing approaches. Moreover, our user study confirms the effectiveness of MITU in practice.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112296"},"PeriodicalIF":3.7,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143104098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A structural taxonomy for lifted software product line analyses","authors":"Logan Murphy, Mahmood Saifi, Alessio Di Sandro, Marsha Chechik","doi":"10.1016/j.jss.2024.112280","DOIUrl":"10.1016/j.jss.2024.112280","url":null,"abstract":"<div><div>A software product line (SPL) is a structured collection of distinct software products developed from a common set of artifacts. SPLs can encompass millions of products, so analysing each product in a brute-force manner is infeasible. To analyse SPLs directly, analyses must be <em>lifted</em>, i.e., redefined to accommodate the semantics of SPLs. Over the past two decades, many kinds of analyses have been lifted from products to SPLs. Looking at the landscape of lifted analyses, we observe various <em>techniques</em> for lifting which vary across numerous dimensions. To help engineers and researchers navigate this landscape, we propose a classification scheme for lifted analyses based on a set of features lifted analyses can exhibit. We then conduct a systematic literature review (SLR) analysing the landscape of lifted analyses produced over the last 20 years. We analyse 140 research papers which discuss the design and implementation of lifted analyses. We provide quantitative analysis of the types of analyses which have been lifted, and apply our taxonomy to clarify <em>how</em> lifting was accomplished. 
We discuss examples of how each of the lifting methods has been applied, and identify gaps in the research literature which may provide directions for future work.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112280"},"PeriodicalIF":3.7,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STILE: A tool for optimizing E2E web test scripts parallelization","authors":"Dario Olianas , Maurizio Leotta , Filippo Ricca , Matteo Biagiola , Paolo Tonella","doi":"10.1016/j.jss.2024.112304","DOIUrl":"10.1016/j.jss.2024.112304","url":null,"abstract":"<div><div>Web application quality is commonly assessed by executing End-to-End (E2E) test scripts interacting with those systems as a human tester would. To avoid setting up the web application state for each test script, testers usually create test scripts that may depend on others previously executed. However, the presence of dependencies prevents parallelization, a fundamental technique for speeding up the execution of large test suites.</div><div>In this paper, we present <span>Stile</span>, a tool for parallelizing the execution of E2E web test scripts that generates and executes a set of test schedules satisfying two important constraints: (1) every schedule respects existing test dependencies, and (2) all test scripts in the test suite are executed at least once. Moreover, <span>Stile</span> optimizes the execution by running only once the test scripts that are shared among the schedules.</div><div>We empirically evaluated <span>Stile</span> on eight E2E test suites by comparing the execution time of <span>Stile</span> both with the sequential execution and with the parallel execution based on Selenium Grid. Our results show that <span>Stile</span> can reduce the execution time by up to 80% w.r.t. the sequential execution and by up to 50% w.r.t. Grid. 
Moreover, <span>Stile</span> provides a reduction in CPU usage (i.e., overall CPU time) of up to 75%.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112304"},"PeriodicalIF":3.7,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143104096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CSAT: Configuration structure-aware tuning for highly configurable software systems","authors":"Yufei Li , Liang Bao , Kaipeng Huang , Chase Wu","doi":"10.1016/j.jss.2024.112316","DOIUrl":"10.1016/j.jss.2024.112316","url":null,"abstract":"<div><div>Many modern software systems provide numerous configuration options with a large parameter space that users can adjust for specific running environments. However, configuring such systems always incurs an undue burden on users due to the lack of domain knowledge to understand complex interactions between the performance and the parameters. To address this issue, various tuning techniques have been developed to automatically determine the optimal configuration by either directly searching the configuration space or learning a surrogate model to guide the exploration process. Most previous studies only apply simple search strategies to explore the complex configuration space, which often leads to fruitless attempts in suboptimal areas. Inspired by previous studies, we define configuration structures to describe the positions of various configurations in the performance space of software systems. This idea leads to the design of a novel Configuration Structure-Aware Tuning (CSAT) algorithm. CSAT constructs a structure model for system configurations using the framework of Adaptive Network-based Fuzzy Inference System (ANFIS), learns a comparison-based distribution model through Gaussian Process Regression (GPR), and uses Bayesian Inference to generate potentially promising configurations based on the structure. The experimental results demonstrate that in terms of tuning performance, on average, CSAT outperforms default configurations by 65.51% and outperforms six state-of-the-art tuning algorithms by 22.10%–33.20%. 
In terms of handling internal constraints, CSAT achieves an average probability of 0.767 in generating valid configurations.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112316"},"PeriodicalIF":3.7,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143104100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Atlas, a modular and efficient open-source BFT framework","authors":"Nuno Neto , Rolando Martins , Luís Veiga","doi":"10.1016/j.jss.2024.112317","DOIUrl":"10.1016/j.jss.2024.112317","url":null,"abstract":"<div><div>Over the last few decades, a large body of research was carried out covering Byzantine Fault Tolerance (BFT) systems. This research has brought forward new techniques, including but not limited to techniques for ordering operations (Abraham et al., 2018; Buchman, 2016; Guo et al., 2020; Bessani et al., 2014; Duan et al., 2018) and state transfer (Bessani et al., 2013; <span><span>Distler, 2021</span></span>, <span><span>Eischer et al., 2019</span></span>), on networks that suffer from Byzantine faults. More recently, the ongoing research on distributed ledgers re-ignited interest in BFT, due to its high throughput when compared to other alternatives of Byzantine consensus (<span><span>Vukolić, 2016</span></span>).</div><div>In this paper we present three contributions covering several aspects, including modular and extensible framework design and implementation, system optimization through development of better networking alternatives, a greater use of parallelism, several ordering protocol improvements and extensive comparative assessment of previous state-of-the-art approaches.</div><div>First, we introduce Atlas, an open-source modular BFT framework that aims to support the research and development of highly efficient BFT protocols, by decoupling traditionally entangled sub-protocols, e.g., consensus primitive from the execution (Bessani et al., 2014), and deferment of log management to replicated services from state transfer. 
Atlas further allows modules to be provided that can be re-used across different BFT approaches, such as deterministic and probabilistic/randomized models.</div><div>Second, we present FeBFT, a new BFT implementation developed upon Atlas that combines pre-existing proven ideas from PBFT, namely its 3-phase consensus and view-change protocol. This base approach is then extended with novel optimizations of the protocol, namely, multi-leader proposals (Stathakopoulou et al., 2019), multi-instance consensus execution (Stathakopoulou et al., 2022; Behl et al., 2015), and a configurable batching solution, which allow us to reduce the latency while improving throughput at the same time.</div><div>Third, we offer a comprehensive evaluation amongst our work and other state-of-the-art BFT-SMR implementations, namely, Atlas (<span><span>Neto et al., 2024a</span></span>) with FeBFT (Official febft repository 2024), BFT-SMaRt (Bessani et al., 2014) and Themis (Rüsch et al., 2019).</div><div>With these contributions, we aim to lay the groundwork to: (i) improve reusability and hence productivity in BFT(-SMR) development; (ii) increase system safety, performance, scalability and reduce recovery time with the optimizations proposed; (iii) draw insights on the bottlenecks preventing order-of-magnitude improvements in BFT processing from a system’s perspective; and lastly, (iv) improve reproducibility between different BFT (sub-)protocols by allowing for true apples-to-apples comparisons.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112317"},"PeriodicalIF":3.7,"publicationDate":"2024-12-09","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143103984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software engineering education: Results from a training intervention based on SonarCloud when developing web apps","authors":"Sabato Nocera, Simone Romano, Rita Francese, Giuseppe Scanniello","doi":"10.1016/j.jss.2024.112308","DOIUrl":"10.1016/j.jss.2024.112308","url":null,"abstract":"<div><div>Past research suggests that Computer Science (CS) undergraduate students are not equipped to manage quality characteristics such as security, reliability, and maintainability. Filling such a gap should allow CS undergraduates an easier integration into the labor market after graduation. To make students more ready for such a market, we introduced a training intervention in our <em>Software Technologies for the Web</em> (<em>STW</em>) course in the academic year (a.y.) 2022–23. Our intervention focused on security, <em>i.e.,</em> students were trained on secure development and were asked to use <em>SonarCloud</em>. To assess this intervention, we compared the web apps developed in a.y. 2021–22 and a.y. 2022–23 and observed that the security significantly improved in the a.y. 2022–23 web apps. To understand whether and to what extent our training intervention triggered autonomous motivation in the students (a.y. 2022–23) on reliability and maintainability, we also compared the web apps of a.y. 2021–22 and a.y. 2022–23 on these issues. To that end, we did not ask students to deal with reliability and maintainability. This part of our research is presented in this paper for the first time and revealed that the web apps of a.y. 2022–23 are more reliable and maintainable than those of a.y. 
2021–22.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112308"},"PeriodicalIF":3.7,"publicationDate":"2024-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143104099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dear researchers step 1: Find a team with a problem","authors":"Eoin Woods","doi":"10.1016/j.jss.2024.112318","DOIUrl":"10.1016/j.jss.2024.112318","url":null,"abstract":"","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112318"},"PeriodicalIF":3.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143104117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ranking approaches for similarity-based web element location","authors":"Riccardo Coppola , Robert Feldt , Michel Nass , Emil Alégroth","doi":"10.1016/j.jss.2024.112286","DOIUrl":"10.1016/j.jss.2024.112286","url":null,"abstract":"<div><h3>Context:</h3><div>GUI-based tests for web applications are frequently broken by fragility, i.e. regression tests fail due to changing properties of the web elements. The most influential factor for fragility is the locators used in the scripts, i.e. the means of identifying the elements of the GUI.</div></div><div><h3>Objective:</h3><div>We extend a state-of-the-art Multi-Locator solution that considers 14 locators from the DOM model of a web application, and identifies overlapping nodes in the DOM tree (VON-Similo). We augment the approach with standard Machine Learning and Learning to Rank (LTR) approaches to aid the location of web elements.</div></div><div><h3>Method:</h3><div>We document an experiment with a ground truth of 1163 web element pairs, taken from different releases of 40 web applications, to compare the robustness of the algorithms to locator weight change, and the performance of LTR approaches in terms of MeanRank and PctAtN.</div></div><div><h3>Results:</h3><div>Using LTR algorithms, we obtain a maximum probability of finding the correct target at the first position of 88.4% (lowest 82.57%), and among the first three positions of 94.79% (lowest 91.86%). 
The best mean rank of the correct candidate is 1.57.</div></div><div><h3>Conclusion:</h3><div>The similarity-based approach proved to be highly dependable in the context of web application testing, where a low percentage of matching errors can still be accepted.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112286"},"PeriodicalIF":3.7,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143103985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LogOW: A semi-supervised log anomaly detection model in open-world setting","authors":"Jingwei Ye , Chunbo Liu , Zhaojun Gu , Zhikai Zhang , Xuying Meng , Weiyao Zhang , Yujun Zhang","doi":"10.1016/j.jss.2024.112305","DOIUrl":"10.1016/j.jss.2024.112305","url":null,"abstract":"<div><div>Log anomaly detection is a method for finding abnormal behavior and faults in systems. However, existing methods face two main challenges: the open-world problem and the cold-start problem. The open-world problem means that the test set may contain new classes that are not in the training set, while the cold-start problem means that the initial training data are scarce, both for normal and abnormal log sequences. Most existing methods assume a closed-world setting and rely on sufficient normal data, which limits their adaptability to new log environments.</div><div>We propose LogOW, a novel log anomaly detection model that can learn from a few normal log sequences. The model finds emerging normal log sequences in the open-world setting through the <strong>open-world sample retrieval</strong> module. Through the <strong>incremental pre-training</strong> module, the model parameters are fine-tuned online on these log sequences.</div><div>First, we train a basic model from normal log sequences using Masked-Language Modeling (MLM). During the testing phase, we then combine the anomaly score and the uncertainty score obtained through a novel dynamic multi-mask to distinguish closed-world normal log sequences from the test set. Next, we cluster the open-world log sequences based on fused sequence and count features, and identify the abnormal ones and the new normal ones. Finally, we update our model with the new normal sequences in the next time period. 
Experiments on three log datasets and real-world airport logs show that our model outperforms traditional models in open-world settings with scarce training data.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112305"},"PeriodicalIF":3.7,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143104091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data race detection via few-shot parameter-efficient fine-tuning","authors":"Yuanyuan Shen , Manman Peng , Fan Zhang , Qiang Wu","doi":"10.1016/j.jss.2024.112289","DOIUrl":"10.1016/j.jss.2024.112289","url":null,"abstract":"<div><div>The OpenMP programming model is playing an increasing role in parallelization on shared-memory systems owing to its simplicity of operation and portability. OpenMP provides the semantic equivalent of a parallel program for the original sequential program. Though it is easier to write parallel programs using OpenMP, writing them correctly is a challenge. Data race errors can easily occur during the writing process, particularly for inexperienced programmers. Some data race checkers have been developed to help programmers check for data races in parallel programs. However, several of them have constraints on the input and thread configuration, time overhead, and scope of program analysis. In this study, we target data race detection in OpenMP parallel programs to address the constraints of these checkers. We propose a few-shot parameter-efficient fine-tuning method using an adapter module to address the data race detection problem. The proposed method does not require a large labeled dataset, which makes it data efficient. A generic dataset is constructed with a limited number of labeled samples, containing diverse OpenMP patterns for data race detection. A neural architecture search approach is employed to improve detection performance. 
The experimental results on the generated and open-source datasets demonstrate that our method is effective and improves race detection compared with traditional methods.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112289"},"PeriodicalIF":3.7,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143103989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}