Guang Yang , Yu Zhou , Xiangyu Zhang , Xiang Chen , Tingting Han , Taolue Chen
{"title":"Assessing and improving syntactic adversarial robustness of pre-trained models for code translation","authors":"Guang Yang , Yu Zhou , Xiangyu Zhang , Xiang Chen , Tingting Han , Taolue Chen","doi":"10.1016/j.infsof.2025.107699","DOIUrl":"10.1016/j.infsof.2025.107699","url":null,"abstract":"<div><h3>Context:</h3><div>Pre-trained models (PTMs) have demonstrated significant potential in automatic code translation. However, the vulnerability of these models in translation tasks, particularly in terms of syntax, has not been extensively investigated.</div></div><div><h3>Objective:</h3><div>To fill this gap, our study aims to propose a novel approach <span>CoTR</span> to assess and improve the syntactic adversarial robustness of PTMs in code translation.</div></div><div><h3>Methods:</h3><div><span>CoTR</span> consists of two components: <span>CoTR-A</span> and <span>CoTR-D</span>. <span>CoTR-A</span> generates adversarial examples by transforming programs, while <span>CoTR-D</span> proposes a semantic distance-based sampling data augmentation method and adversarial training method to improve the model’s robustness and generalization capabilities. The Pass@1 metric is used by <span>CoTR</span> to assess the performance of PTMs, which is more suitable for code translation tasks and offers a more precise evaluation in real-world scenarios.</div></div><div><h3>Results:</h3><div>The effectiveness of <span>CoTR</span> is evaluated through experiments on real-world Java<span><math><mo>↔</mo></math></span>Python datasets. The results demonstrate that <span>CoTR-A</span> can significantly reduce the performance of existing PTMs, while <span>CoTR-D</span> effectively improves the robustness of PTMs.</div></div><div><h3>Conclusion:</h3><div>Our study identifies the limitations of current PTMs, including large language models, in code translation tasks. 
It highlights the potential of <span>CoTR</span> as an effective solution to enhance the robustness of PTMs for code translation tasks.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107699"},"PeriodicalIF":3.8,"publicationDate":"2025-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143510782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Different and similar perceptions of communication among software developers","authors":"Marc Herrmann, Martin Obaidi, Jil Klünder","doi":"10.1016/j.infsof.2025.107698","DOIUrl":"10.1016/j.infsof.2025.107698","url":null,"abstract":"<div><h3>Context:</h3><div>Software development is a collaborative task involving different persons. Development team members are often diverse in regard to several aspects, including experience, (soft) skills, and communication habits. Different preferences in what adequate communication looks like influence how communication is perceived and interpreted by team members.</div></div><div><h3>Objective:</h3><div>In this paper, we investigate differences and similarities in how software developers with varying levels of experience and skills perceive statements from exemplary software project communication.</div></div><div><h3>Methods:</h3><div>By applying hierarchical cluster analysis on the perception data of 94 software developers, we aim to find groups of developers sharing similar perceptions towards statements from software project communication, and to identify factors that influence this perception.</div></div><div><h3>Results:</h3><div>We contribute the following key findings: (1) We statistically identify two groups of software developers whose perceptions differ significantly for about 65% of statements from software project communication; (2) For a logistic regression model, five polarizing statements suffice to assign each participant to their group; (3) Although there is a significant difference in the communication perception, there are no demographic characteristics that differ notably across the two groups.</div></div><div><h3>Conclusion:</h3><div>From our results, we conclude that different perceptions of software project communication during collaboration within development teams are a potential risk for the teams’ mood and the project success. 
We outline how our results can serve use cases like the application of sentiment analysis in software engineering and mindful communication in software teams in general.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107698"},"PeriodicalIF":3.8,"publicationDate":"2025-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143463373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Formal requirements engineering and large language models: A two-way roadmap","authors":"Alessio Ferrari , Paola Spoletini","doi":"10.1016/j.infsof.2025.107697","DOIUrl":"10.1016/j.infsof.2025.107697","url":null,"abstract":"<div><h3>Context:</h3><div>Large Language Models (LLMs) have made remarkable advancements in emulating human linguistic capabilities, showing potential also in executing various requirements engineering (RE) tasks. However, despite their generally good performance, the adoption of LLM-generated solutions and artefacts prompts concerns about their correctness, fairness, and trustworthiness.</div></div><div><h3>Objective:</h3><div>This paper aims to address the concerns associated with the use of LLMs in RE activities. Specifically, it seeks to develop a roadmap that leverages formal methods (FMs) to provide guarantees of correctness, fairness, and trustworthiness when LLMs are utilised in RE. Symmetrically, it aims to explore how LLMs can be employed to make FMs more accessible.</div></div><div><h3>Methods:</h3><div>We use two sets of examples to show the current limits of FMs when used in software development and of LLMs when used for RE tasks. The highlighted limitations are addressed by proposing two roadmaps grounded in the current literature and technologies.</div></div><div><h3>Results:</h3><div>The proposed examples show the potential and limits of FMs in supporting software development and of LLMs when used for RE tasks. 
The initial investigation into how these limitations can be overcome has been concretised in two detailed roadmaps for the RE and, more largely, the software engineering community.</div></div><div><h3>Conclusion:</h3><div>The proposed roadmaps offer a promising approach to address the concerns of correctness, fairness, and trustworthiness associated with the use of LLMs in RE tasks through the use of FMs and to enhance the accessibility of FMs by utilising LLMs.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107697"},"PeriodicalIF":3.8,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143463364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohamed Soliman , Michel Albonico , Ivano Malavolta , Andreas Wortmann
{"title":"Mining software repositories for software architecture — A systematic mapping study","authors":"Mohamed Soliman , Michel Albonico , Ivano Malavolta , Andreas Wortmann","doi":"10.1016/j.infsof.2025.107677","DOIUrl":"10.1016/j.infsof.2025.107677","url":null,"abstract":"<div><h3>Context:</h3><div>A growing number of researchers are investigating how Mining Software Repositories (MSR) approaches can support software architecture activities, such as architecture recovery, tactics identification, architectural smell detection, and others. However, as of today, it is difficult to have a clear view of existing research on MSR for software architecture.</div></div><div><h3>Objectives:</h3><div>The objective of this study is to identify, classify, and summarize the state-of-the-art MSR approaches applied to software architecture (MSR4SA).</div></div><div><h3>Methods:</h3><div>This study is designed according to the <em>systematic mapping study</em> research method. Specifically, out of 2442 potentially relevant studies, we systematically identify 151 primary studies where MSR approaches are applied to perform software architecture activities. Then, we rigorously extract relevant data from each primary study and synthesize the obtained results to produce a clear map of reasons for adopting MSR approaches to support architecting activities, used data sources, applied MSR techniques, and captured architectural information.</div></div><div><h3>Results:</h3><div>The major reasons to adopt MSR4SA techniques are about addressing industrial concerns like <em>achieving quality attributes</em> and <em>minimizing practitioners’ efforts</em>. Most MSR4SA studies support architectural analysis, while architectural synthesis and evaluation are not commonly supported in MSR4SA studies. The most frequently mined data sources are <em>source code repositories</em> and <em>issue trackers</em>, which are also commonly mined together. 
Most of the MSR4SA studies apply more than one mining technique, where the most common MSR techniques are: (<em>source code analysis</em>, <em>model analysis</em>, <em>statistical analysis</em>), (<em>machine learning</em>, <em>NLP</em>). <em>Architectural quality issues</em> and <em>components</em> are the most frequently mined types of information.</div></div><div><h3>Conclusion:</h3><div>Our results give a solid foundation for researchers and practitioners towards future research and applications of MSR approaches for software architecture.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107677"},"PeriodicalIF":3.8,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143453146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaonan Li , Qingbao Li , Guimin Zhang , Jinjin Liu , Shudan Yue , Weihua Jiao
{"title":"BinOpLeR: Optimization level recovery from binaries based on rich semantic instruction image and weighted voting","authors":"Xiaonan Li , Qingbao Li , Guimin Zhang , Jinjin Liu , Shudan Yue , Weihua Jiao","doi":"10.1016/j.infsof.2025.107683","DOIUrl":"10.1016/j.infsof.2025.107683","url":null,"abstract":"<div><h3>Context:</h3><div>Compiler toolchain differences result in binary code diversity, wherein the impacts of different optimization levels on binary code severely constrain the performance improvement of software security detection tasks such as malware detection, software copyright protection, and vulnerability homology detection. However, binaries compiled with different optimization levels often contain numerous identical or similar code fragments, posing severe challenges to recovering the optimization levels from binaries.</div></div><div><h3>Objective:</h3><div>The existing optimization level detection methods based on statistical features have poor generalization capabilities, and those based on automated learning have low detection accuracy due to using coarse-grained instruction normalization. To improve accuracy and generalization capabilities, this paper proposes BinOpLeR, a binary optimization level recovery method based on rich semantic instruction images and weighted voting.</div></div><div><h3>Method:</h3><div>In this paper, we perform fine-grained normalization on disassembly instructions to retain the elements that reflect instruction semantics and code execution characteristics, and utilize the mappings from the ASCII code values of assembly codes to pixel grayscale values to convert functions into grayscale images. Then, a balanced dataset is constructed using the grayscale images of functions to train a convolutional neural network model with adaptive pooling to capture optimization level-related features. 
Finally, a weighted voting scheme that incorporates prediction probabilities and function lengths is introduced to infer the optimization levels of binaries.</div></div><div><h3>Results:</h3><div>We evaluate the performance of BinOpLeR on the public dataset of ARM and MIPS binaries using precision, accuracy, recall and F1 score. The results show that BinOpLeR outperforms the comparison methods in prediction performance.</div></div><div><h3>Conclusion:</h3><div>The findings indicate that BinOpLeR effectively improves the accuracy of optimization level recovery from binaries. It exhibits stable performance across different compiler versions. The granularity and normalization significantly influence feature extraction, and function lengths along with prediction probabilities are crucial factors in inferring the optimization level of binaries.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107683"},"PeriodicalIF":3.8,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143445818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification and challenges of non-functional requirements in ML-enabled systems: A systematic literature review","authors":"Vincenzo De Martino, Fabio Palomba","doi":"10.1016/j.infsof.2025.107678","DOIUrl":"10.1016/j.infsof.2025.107678","url":null,"abstract":"<div><h3>Context:</h3><div>Machine learning (ML) is nowadays so pervasive and diffused that virtually no application can avoid its use. Nonetheless, its enormous potential is often tempered by the need to manage non-functional requirements (NFRs) and navigate pressing, contrasting trade-offs.</div></div><div><h3>Objective:</h3><div>In this respect, we notice a lack of systematic synthesis of challenges explicitly tied to achieving and managing NFRs in ML-enabled systems. Such a synthesis may not only provide a comprehensive summary of the state of the art but also drive further research on the analysis, management, and optimization of NFRs of ML-enabled systems.</div></div><div><h3>Method:</h3><div>In this paper, we propose a systematic literature review targeting two key aspects: (1) the classification of the NFRs investigated so far, and (2) the challenges associated with achieving and managing NFRs in ML-enabled systems during model development. Through the combination of well-established guidelines for conducting systematic literature reviews and additional search criteria, we survey a total of 130 research articles.</div></div><div><h3>Results:</h3><div>Our findings report that current research identified 31 different NFRs, which can be grouped into six main classes. 
We also compiled a catalog of 26 software engineering challenges, emphasizing the need for further research to systematically address, prioritize, and balance NFRs in ML-enabled systems.</div></div><div><h3>Conclusion:</h3><div>We conclude our work by distilling implications and a future outlook on the topic.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107678"},"PeriodicalIF":3.8,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143429791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Production and test bug report classification based on transfer learning","authors":"Misoo Kim , Youngkyoung Kim , Eunseok Lee","doi":"10.1016/j.infsof.2025.107685","DOIUrl":"10.1016/j.infsof.2025.107685","url":null,"abstract":"<div><h3>Context:</h3><div>Recent studies indicate that the classification of production and test bug reports can substantially enhance the accuracy of performance evaluation and the effectiveness of information retrieval–based bug localization (IRBL) for software reliability.</div></div><div><h3>Objective:</h3><div>However, manually classifying these bug reports is time-consuming for developers. This study introduces a production and test bug report classification (ProTeC) framework for automatically classifying these reports.</div></div><div><h3>Methods:</h3><div>The framework’s novelty lies in leveraging a set of production- and test-source files and employing transfer learning to address the issue of insufficient and sparse bug reports in machine-learning applications. The ProTeC framework trains and fine-tunes a source file classifier to develop a bug report classifier by transferring production-test distinguishing knowledge.</div></div><div><h3>Results:</h3><div>To validate the effectiveness and general practicality of ProTeC, we conducted large-scale experiments using 2,522 bug reports across 12 machine/deep learning model variations to train an automatic classifier. 
Our results, on average, demonstrate that ProTeC’s macro F1-score is 28.6% higher than that of a bug report-based classifier, and it can improve the mean average precision of IRBL by 17.6%.</div></div><div><h3>Conclusion:</h3><div>These positive trends were observed in most model variations, indicating that ProTeC consistently performs well in classifying bug reports regardless of the model used, thereby improving IRBL performance.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107685"},"PeriodicalIF":3.8,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143387324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vulnerability detection with feature fusion and learnable edge-type embedding graph neural network","authors":"Ge Cheng , Qifan Luo , Yun Zhang","doi":"10.1016/j.infsof.2025.107686","DOIUrl":"10.1016/j.infsof.2025.107686","url":null,"abstract":"<div><div>Deep learning methods are widely employed in vulnerability detection, and graph neural networks have shown effectiveness in learning source code representation. However, current methods overlook non-relevant noise information in the code property graph and lack specific graph neural networks designed for the code property graph. To address these issues, this paper introduces Leev, an automated vulnerability detection method. We developed a graph neural network tailored to the code property graph, assigning iterative vectors to diverse edge types and integrating them into the message passing between nodes to enable the model to extract hidden vulnerability information. In addition, virtual nodes are incorporated into the graph for feature fusion, mitigating the impact of irrelevant features on vulnerability information within the code. Specifically, for the FFMPeg+Qemu, Reveal, and Fan et al. datasets, the F1 metrics exhibited improvements of 7.02%, 21.69%, and 27.74% over the best baseline, respectively.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107686"},"PeriodicalIF":3.8,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143402661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practical assessment of the e-commerce multivariant user interface","authors":"Adam Wasilewski , Elżbieta Pawełek-Lubera","doi":"10.1016/j.infsof.2025.107684","DOIUrl":"10.1016/j.infsof.2025.107684","url":null,"abstract":"<div><h3>Context:</h3><div>Personalization is recognized as one of the key trends in e-commerce development, often including the personalization of offers and prices. However, a rarely used and underestimated personalization opportunity is the customization of the user interface provided to customers. Customers of e-shops differ in their behaviors and usage patterns, yet there is no clear evidence verifying the potential of the user interface to influence the performance indicators of e-shops.</div></div><div><h3>Objective:</h3><div>The research discussed in this paper aims to verify the impact of a dedicated interface on the most common indicators describing e-commerce performance and to identify limitations to the use of user interface personalization in e-commerce.</div></div><div><h3>Method:</h3><div>To achieve this, a solution was developed to collect information about e-commerce customer behavior, segment customers using clustering methods, and provide a dedicated user interface. 
During the pilot implementation, data was collected to verify the impact of the dedicated interface on the purchasing behavior of customer groups.</div></div><div><h3>Results:</h3><div>The results showed that a dedicated interface can significantly improve the conversion rate (by 46% in the analyzed group) and average order value (by 11%).</div></div><div><h3>Conclusion:</h3><div>These findings confirm that tailored UI variants can positively influence customer behavior in e-shops by increasing key performance indicators.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107684"},"PeriodicalIF":3.8,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hannah Deters, Jakob Droste, Martin Obaidi, Kurt Schneider
{"title":"Exploring the means to measure explainability: Metrics, heuristics and questionnaires","authors":"Hannah Deters, Jakob Droste, Martin Obaidi, Kurt Schneider","doi":"10.1016/j.infsof.2025.107682","DOIUrl":"10.1016/j.infsof.2025.107682","url":null,"abstract":"<div><h3>Context:</h3><div>As the complexity of modern software is steadily growing, these systems become increasingly difficult to understand for their stakeholders. At the same time, opaque and artificially intelligent systems permeate a growing number of safety-critical areas, such as medicine and finance. As a result, explainability is becoming more important as a software quality aspect and non-functional requirement.</div></div><div><h3>Objective:</h3><div>Contemporary research has mainly focused on making artificial intelligence and its decision-making processes more understandable. However, explainability has also gained traction in recent requirements engineering research. This work aims to contribute to that body of research by providing a quality model for explainability as a software quality aspect. Quality models provide means and measures to specify and evaluate quality requirements.</div></div><div><h3>Method:</h3><div>In order to design a user-centered quality model for explainability, we conducted a literature review.</div></div><div><h3>Results:</h3><div>We identified ten fundamental aspects of explainability. Furthermore, we aggregated criteria and metrics to measure them as well as alternative means of evaluation in the form of heuristics and questionnaires.</div></div><div><h3>Conclusion:</h3><div>Our quality model and the related means of evaluation enable software engineers to develop and validate explainable systems in accordance with their explainability goals and intentions. This is achieved by offering a view from different angles at fundamental aspects of explainability and the related development goals. 
Thus, we provide a foundation that improves the management and verification of explainability requirements.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107682"},"PeriodicalIF":3.8,"publicationDate":"2025-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143402662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}