{"title":"Just-in-time software defect prediction via bi-modal change representation learning","authors":"Yuze Jiang , Beijun Shen , Xiaodong Gu","doi":"10.1016/j.jss.2024.112253","DOIUrl":"10.1016/j.jss.2024.112253","url":null,"abstract":"<div><div>For predicting software defects at an early stage, researchers have proposed just-in-time defect prediction (JIT-DP) to identify potential defects in code commits. The prevailing approaches train models to represent code changes in history commits and utilize the learned representations to predict the presence of defects in the latest commit. However, existing models merely learn editions in source code, without considering the natural language intentions behind the changes. This limitation hinders their ability to capture deeper semantics. To address this, we introduce a novel bi-modal change pre-training model called BiCC-BERT. BiCC-BERT is pre-trained on a code change corpus to learn bi-modal semantic representations. To incorporate commit messages from the corpus, we design a novel pre-training objective called Replaced Message Identification (RMI), which learns the semantic association between commit messages and code changes. Subsequently, we integrate BiCC-BERT into JIT-DP and propose a new defect prediction approach — JIT-BiCC. By leveraging the bi-modal representations from BiCC-BERT, JIT-BiCC captures more profound change semantics. We train JIT-BiCC using 27,391 code changes and compare its performance with 8 state-of-the-art JIT-DP approaches. The results demonstrate that JIT-BiCC outperforms all baselines, achieving a 10.8% improvement in F1-score. 
This highlights its effectiveness in learning the bi-modal semantics for JIT-DP.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"219 ","pages":"Article 112253"},"PeriodicalIF":3.7,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FSECAM: A contextual thematic approach for linking feature to multi-level software architectural components","authors":"Amit Kumar Mondal , Mainul Hossain , Chanchal K. Roy , Banani Roy , Kevin A. Schneider","doi":"10.1016/j.jss.2024.112245","DOIUrl":"10.1016/j.jss.2024.112245","url":null,"abstract":"<div><div>Linking software features to code components is commonly performed during software development and maintenance, including to implement a feature, document code, design test cases, trace requirements, track changes, and support inspection of safety–critical software by government and other third parties. However, manually mapping features to code is error-prone and time consuming, even for developers familiar with a system. To overcome these challenges several studies proposed automated techniques to reduce human intervention when linking features to code components. Nonetheless, three challenges remain: (i) accuracy, (ii) cost, and (iii) explainability. Linking of irrelevant code snippets causes an extra burden of analyses. If the approach lacks explainability, then a tool is less useful for many crucial systems such as safety–critical software. Moreover, heavyweight techniques such as those that require generating execution traces of every scenario or require training deep-learning models are costly and limit small companies from integrating them into their development process.</div><div>We propose a contextual thematic approach that extracts the most relevant theme properties of the feature/requirement to address the aforementioned challenges. Our experiments with two proprietary projects reveal significant enhancement of performance (precision and F1 scores are more than 50% in ideal cases) in linking features to three abstractions of code components, i.e., modules, classes, and methods. Our approach is also capable of linking commits to issues in a promising way. 
Contextual theme extraction enhances the subjective explainability which has not yet been solved with existing approaches. Moreover, we extract several critical characteristics of the feature documents and code structures that are important to consider in both manual and automated techniques. Finally, we present the FSECAM tool for linking features to code components, which can be immediately deployed within the development process and used without much effort and cost in linking code components and commits.</div><div><em>Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board</em>.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"219 ","pages":"Article 112245"},"PeriodicalIF":3.7,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142441784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepKernel: 2D-kernels clustering based mutant reduction for cost-effective deep learning model testing","authors":"Shiyu Zhang , Xingya Wang , Lichao Feng , Song Huang , Zhenyu Chen , Zhihong Zhao","doi":"10.1016/j.jss.2024.112247","DOIUrl":"10.1016/j.jss.2024.112247","url":null,"abstract":"<div><div>Mutation testing is a practical approach for evaluating the quality of deep learning (DL) testing datasets. However, the enormous mutants during testing lead to significant testing overhead. Feature clustering is a conventional method that reduces the number of mutants while preserving the mutants’ distribution diversity. This distribution diversity is considered crucial for maintaining the effectiveness of testing assessment ability. DL model relies on convolutional kernels to extract data features and construct logic. Thus, using kernels to measure the differences among DL mutants is a feasible approach. This paper proposes DeepKernel, a convolutional kernel features clustering based reduction method. Specifically, it considers 2D-Kernel sparsity and 2D-Kernel entropy as kernel features. The features are clustered to construct a subset with equivalent testing assessment capability to the original set. Empirical studies on four classical DL models demonstrate that: (1) there is a significant correlation between the distribution diversity of the mutants and their testing assessment ability, as indicated by a Spearman Correlation Coefficient of 0.9689. (2) the reduced set maintains a similar distribution diversity and testing effectiveness as the original set. 
(3) when preserving the effectiveness of the mutation testing, our method reduces 63.47% of mutants and outperforms random selection.</div><div><em>Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board</em>.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"219 ","pages":"Article 112247"},"PeriodicalIF":3.7,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142535185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Formalization of Quantum Intermediate Representations for code safety","authors":"Junjie Luo, Jianjun Zhao","doi":"10.1016/j.jss.2024.112236","DOIUrl":"10.1016/j.jss.2024.112236","url":null,"abstract":"<div><div>Quantum Intermediate Representation (QIR) is an LLVM-based intermediary representation developed by Microsoft for quantum program compilers. QIR’s objective is to serve as a versatile solution for quantum program compilers, irrespective of the programming languages used at the front end and the hardware utilized at the back end. This approach minimizes redundant development efforts involving intermediary representations and compilers. Currently, QIR remains in the development phase and is described informally in natural language, lacking a formal definition. This informal description leads to interpretational ambiguity and a shortage of precision when implementing quantum functions. Our work aims to address this gap by providing formal definitions for QIR’s data types and instruction sets. We strive to establish correctness and safety assurances for operations and intermediate code conversions within the QIR framework. To substantiate our design, we present potentially unsafe QIR code instances that our formal approach can detect and rectify. 
This contribution enhances the reliability and robustness of quantum program development within the QIR context.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"219 ","pages":"Article 112236"},"PeriodicalIF":3.7,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fairness for machine learning software in education: A systematic mapping study","authors":"Nga Pham , Hung Pham Ngoc , Anh Nguyen-Duc","doi":"10.1016/j.jss.2024.112244","DOIUrl":"10.1016/j.jss.2024.112244","url":null,"abstract":"<div><div>The integration of machine learning (ML) systems into various sectors, notably education, has great potential to transform business workflows and decision-making processes. However, this technological advancement brings forth critical ethical concerns, particularly concerning the fairness of decisions affecting diverse groups of people. Our objective was to systematically map out the landscape of ML fairness research in higher education by exploring seven key research questions. These questions span a range of topics from the types of ML algorithms used in education to the methods of fairness assessment and the results achieved in terms of equity. We included 63 primary studies published between 2002 and 2023. The most common settings for AI Fairness research are: traditional machine learning algorithms (Logistic Regression, Random Forest, Decision Tree), sensitive variables (gender, race, ethnicity), and various definitions of fairness (Group fairness, Demographic parity, Equalized odds). 
We also identify several future research directions, including fairness assurance for multiple sensitive variables, combining different fairness concepts and metrics, open-source benchmarking tools, and fairness testing for modern ML/AI models.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"219 ","pages":"Article 112244"},"PeriodicalIF":3.7,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring code efficiency optimization capabilities with ACEOB","authors":"Yue Pan, Xiuting Shao, Chen Lyu","doi":"10.1016/j.jss.2024.112250","DOIUrl":"10.1016/j.jss.2024.112250","url":null,"abstract":"<div><div>As Moore’s Law gains diminish, software performance and efficiency become increasingly vital. Optimizing code efficiency is challenging, even for professional programmers. However, related research remains relatively scarce, and rigorously assessing models’ abilities to optimize code efficiency is fraught with difficulties. In response to this challenge, we first conduct an in-depth analysis of “code patterns” in the model training dataset, meticulously exploring human-written code. Secondly, we define a task for optimizing code efficiency and introduce the <strong>A</strong>utomatic <strong>C</strong>ode <strong>E</strong>fficiency <strong>O</strong>ptimization <strong>B</strong>enchmark (ACEOB), which consists of 95,359 pairs of efficient–inefficient code aimed at assessing code efficiency optimization capabilities. To our knowledge, ACEOB is the first dataset specifically targeting Python code efficiency optimization. To evaluate models’ ability in optimizing code efficiency, we propose two new metrics: the <strong>I</strong>somorphic <strong>O</strong>ptimal <strong>C</strong>omparison <strong>C</strong>ode<strong>B</strong>LEU (IOCCB) metric and the <strong>N</strong>ormalized <strong>P</strong>erformance <strong>I</strong>ndex (NPI) metric, to assess the efficiency of model-generated code. We also evaluate several advanced code models, such as PolyCoder and CodeT5, after fine-tuning them on ACEOB and demonstrate that the efficiency of each model improves after introducing the NPI filter. However, it was observed that even ChatGPT does not perform optimally in code efficiency optimization tasks. 
Our dataset and models are available at: <span><span>https://github.com/CodeGeneration2/ACEOB</span></span>.</div><div><em>Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board</em>.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"219 ","pages":"Article 112250"},"PeriodicalIF":3.7,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring emergent microservice evolution in elastic deployment environments","authors":"Roberto Rodrigues-Filho , Iwens Sene Jr. , Barry Porter , Luiz F. Bittencourt , Fabio Kon , Fábio M. Costa","doi":"10.1016/j.jss.2024.112252","DOIUrl":"10.1016/j.jss.2024.112252","url":null,"abstract":"<div><div>Microservices have become an important technology to enable the dynamic composition of large-scale self-adaptive systems. Although modern microservice ecosystems provide a variety of autonomous adaptation mechanisms, when focusing on the microservice itself, they can only account for changes in the sheer increase in workload volume. On the other hand, when workload patterns change, efficient treatment requires the intervention of DevOps experts to manually evolve the internal architecture of services. Given the need to quickly adapt systems to respond to changes, solely relying on DevOps to react to workload pattern changes becomes a bottleneck for future systems. To address this issue, we advance the concept of emergent microservices, that autonomously adapt and evolve their internal architectural composition to better handle changes in the pattern of incoming requests without human intervention. 
We demonstrate the effectiveness of our approach by exploring this novel concept in the context of a microservice-based Smart City platform.</div><div><em>Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board</em>.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"219 ","pages":"Article 112252"},"PeriodicalIF":3.7,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142444757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Change impact analysis in microservice systems: A systematic literature review","authors":"Luka Lelovic , Austin Huzinga , Gabriel Goulis , Anshpreet Kaur , Ricardo Boone , Umidjon Muzrapov , Amr S. Abdelfattah , Tomas Cerny","doi":"10.1016/j.jss.2024.112241","DOIUrl":"10.1016/j.jss.2024.112241","url":null,"abstract":"<div><h3>Background:</h3><div>Change impact analysis is crucial in software development, especially when working with large and complex systems. It aims to identify the potential consequences of a change or estimate what needs to be modified to accomplish a change. The importance of such an analysis multiplies in decentralized environments. Microservice systems are decentralized and represent the current industry mainstream for scalable systems.</div></div><div><h3>Objective:</h3><div>While individual microservices intend to be self-contained and independent, certain overlap with other microservices is inevitable since they interact. In the context of microservice systems, changes in one microservice can affect other microservices without a direct connection, leading to ripple effects and extended maintenance efforts. To understand the current state of the art for microservices with respect to the change impact analysis, the objective of this work is to study and analyze existing literature to summarize the evidence and to provide readers with a roadmap to established approaches.</div></div><div><h3>Methods:</h3><div>We conduct a systematic literature review targeting studies related to change impact analysis in microservices. The study considered 1,669 papers and filtered them down to 29 works included in this study.</div></div><div><h3>Results:</h3><div>This manuscript introduces different types of change impacts introduced in the literature. It compares and categorizes tools and methods that have been used in literature to measure the impact of change in microservices. It illustrates what units of measure have been used. 
Finally, it shares system benchmarks used to assess change impact analysis methods.</div></div><div><h3>Open Challenges.</h3><div>A number of open challenges and gaps are found in the tools and methods. These challenges are related to the improvement of the detection techniques, additional impact analysis, and validation of approaches. Many of the solutions measure a specific aspect of an impacted system, without taking into account multiple effects. Impact analysis is seen as measured indirectly by these solutions as well, and more direct observation is needed.</div></div><div><h3>Conclusion:</h3><div>The results provide a reference to microservice developers and quality engineers to maintain better-quality systems. With a roadmap to the topic, our researcher peers might easily understand various directions that have been approached on this topic. Finally, this work serves as a reference for development-aiding tools helping to manage microservice system evolution.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"219 ","pages":"Article 112241"},"PeriodicalIF":3.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142535189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Authentic interdisciplinary online courses for alternative pathways into computer science","authors":"Lucia Happe, Kai Marquardt","doi":"10.1016/j.jss.2024.112240","DOIUrl":"10.1016/j.jss.2024.112240","url":null,"abstract":"<div><div>The field of computer science (CS) is facing a crucial challenge in broadening participation and embracing diversity, especially among underrepresented gender groups. The presented interdisciplinary educational program is an efficient response to this challenge, designed to catalyze diversity in CS through engagement with complex, interest-driven problems. This paper outlines the program’s structure, elucidates the pedagogical underpinnings, and reflects on the emergent challenges and opportunities. We delve into how the fusion of CS with other academic disciplines can allure a more varied demographic, emphasizing the engagement of female high school students—a demographic pivotally positioned yet significantly untapped in CS. Through a systematic survey analysis, we measure the program’s efficacy in increasing interest in CS and in cultivating an appreciation for its application in addressing real-world, cross-disciplinary challenges. Our findings affirm the program’s success in bridging the engagement gap by leveraging students’ intrinsic interests, thus charting alternative pathways into the CS field. 
These insights underscore the critical role of interdisciplinary approaches, establishing a new standard for transformative CS educational methods.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"219 ","pages":"Article 112240"},"PeriodicalIF":3.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142535186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information needs in bug reports for web applications","authors":"Di Wang, Matthias Galster, Miguel Morales-Trujillo","doi":"10.1016/j.jss.2024.112230","DOIUrl":"10.1016/j.jss.2024.112230","url":null,"abstract":"<div><div>Given the widespread popularity and increasing reliance on long-lived web applications (such as Netflix and Facebook), effective and efficient bug reproduction is essential to maintain functionality and user satisfaction throughout the application’s lifetime. Developers use bug reports to localize, reproduce, and eventually fix software bugs. However, the content of bug reports is not always helpful (e.g., due to incomplete or missing information). In this study, we explore what type of information is often missing in bug reports and how that information is presented in them. We manually analyzed the initial and final versions of 1000 bug reports from 10 popular open-source web-based applications. The analysis revealed that, regardless of the type of software (e.g., e-commerce software or personal tools), diagnostic suggestions from developers and end-user usage information are often missing in initial bug reports but only added later throughout the lifetime of a bug report. Also, textual descriptions and screenshots are used most to describe bugs, regardless of the type of bug (e.g., a functional or performance error). 
The study highlighted the need for improved bug reporting templates and tools to improve bug report quality and efficiency in web application development and maintenance.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"219 ","pages":"Article 112230"},"PeriodicalIF":3.7,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142427420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}