Information and Software Technology: Latest Articles

Promoting social sustainability within software development through the lens of organizational readiness for change theory
IF 3.8 · Zone 2 · Computer Science
Information and Software Technology Pub Date : 2025-04-26 DOI: 10.1016/j.infsof.2025.107755
Ana Carolina Moises de Souza, Daniela Soares Cruzes, Letizia Jaccheri, Tangni Cunningham Dahl-Jørgensen
Abstract
Context: Software’s negative impact on society underscores the need to integrate social sustainability into software development. However, effective implementation and practitioners’ readiness for this change remain unclear, requiring further investigation.
Objective: This research aims to understand the conditions that promote organizational readiness for change in the integration of social sustainability into software development from the perspective of software practitioners.
Methods: We conducted a multiple-case study comprising three cases: (A) an exploratory study with 11 practitioners from four organizations; (B) the proposal and validation of a Walkthrough intervention with 9 students (pilot) and 19 practitioners (questionnaire); and (C) a focus group with 6 practitioners in one organization providing feedback on the Walkthrough.
Results: Four facilitators and barriers were identified as key preconditions for social sustainability integration. Statistical analysis showed that the perceived usefulness of the Walkthrough was significantly higher than intentional behavior, indicating strong perceived value despite moderate intention to adopt the practices.
Conclusion: This study identified the key determinants that promote organizational readiness to integrate social sustainability into software development. By proposing a conceptual model, it helps organizations leverage facilitators and overcome barriers, and it offers actionable recommendations for both practice and research.
Volume 184, Article 107755.
Citations: 0
Benchmarking large language models for automated labeling: The case of issue report classification
IF 3.8 · Zone 2 · Computer Science
Information and Software Technology Pub Date : 2025-04-25 DOI: 10.1016/j.infsof.2025.107758
Giuseppe Colavito, Filippo Lanubile, Nicole Novielli
Abstract
Context: Issue labeling is a fundamental task for software development as it is critical for the effective management of software projects. This practice involves assigning a label to issues, such as bug or feature request, denoting a task relevant to the project management. To date, large language models (LLMs) have been proposed to automate this task, including both fine-tuned BERT-like models and zero-shot GPT-like models.
Objectives: In this paper, we investigate which LLMs offer the best trade-off between performance, response time, hardware requirements, and quality of the responses for issue report classification.
Methods: We design and execute a comprehensive benchmark study to assess 22 generative decoder-only LLMs and 2 baseline BERT-like encoder-only models, which we evaluate on two different datasets of GitHub issues.
Results: Generative LLMs demonstrate potential for zero-shot classification. However, their performance varies significantly across datasets and they require substantial computational resources for deployment. In contrast, BERT-like models show more consistent performance and lower resource requirements.
Conclusions: Based on the empirical evidence provided in this study, we discuss implications for researchers and practitioners. In particular, our results suggest that fine-tuning BERT-like encoder-only models achieves consistent, state-of-the-art performance across datasets even when only a small amount of labeled data is available for training.
Volume 184, Article 107758.
Citations: 0
Data preprocessing for machine learning based code smell detection: A systematic literature review
IF 3.8 · Zone 2 · Computer Science
Information and Software Technology Pub Date : 2025-04-25 DOI: 10.1016/j.infsof.2025.107752
Fábio do Rosario Santos, Ricardo Choren
Abstract
Context: Detecting code smells using Machine Learning presents inherent challenges due to the unbalanced nature of the problem and susceptibility to interpretation biases. It is a data-driven process for code quality assurance that aims to detect whether a given piece of code violates a fundamental design principle in a way that negatively impacts design quality. Researchers in the field have been advised to carefully analyze the internal mechanisms of forecasting models before interpreting the results generated by them.
Objective: The review aims to summarize and synthesize studies that utilized Data Preprocessing techniques for Machine Learning-based code smell detection. It also investigates the relationship between Data Preprocessing and more advanced Machine Learning techniques, i.e., Ensemble Methods, Deep Learning, and Transfer Learning.
Method: To obtain insights into Data Preprocessing for Machine Learning-based code smell detection solutions, we employed a systematic approach, identifying and analyzing 69 studies published up to November 2023.
Results: In Data Preprocessing, Data Balancing techniques, Feature Selection techniques, and Filtering emerged as prominent strategies. SMOTE was the most frequently used Data Balancing technique, while Autoencoder, Chi-square, Gain Ratio, Information Gain, PCA, and CFS were notable choices for Feature Selection. Tokenization and Syntax Trees were commonly paired with Deep Learning or Transfer Learning methods. Normalization and Standardization were implemented for Data Scaling. Regarding Machine Learning techniques used with Data Preprocessing, 46% of the combinations occurred with at least one Ensemble Method. Deep Learning was employed in 37% of cases. Data Balancing techniques combined with Deep Learning (32%) or Ensemble Methods (19%) were used most.
Conclusion: The findings of this SLR are an integrated and comprehensive source of information regarding data preparation practices, challenges, and solutions for Machine Learning-based code smell detection, emphasizing the continuous endeavor towards more resilient, contextually sensitive, and developer-informed strategies within this dynamic field.
Volume 184, Article 107752.
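The abstract above names SMOTE as the most frequently used Data Balancing technique. As a rough illustration of the core idea only (a new minority-class point is placed on the segment between an existing minority sample and one of its nearest minority neighbours), here is a minimal pure-Python sketch. The function name `smote`, its parameters, and the toy data are hypothetical and are not taken from the review or from any study it covers.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority-class points (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority samples, sorted by squared distance to `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority class, e.g. two code metrics per "smelly" class (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is an interpolation between two real minority samples, it always stays inside the region already occupied by the minority class. In practice a maintained library implementation would normally be preferred over a hand-written one.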
Citations: 0
A systematic literature review on task recommendation systems for crowdsourced software engineering
IF 3.8 · Zone 2 · Computer Science
Information and Software Technology Pub Date : 2025-04-22 DOI: 10.1016/j.infsof.2025.107753
Shashiwadana Nirmani, Mojtaba Shahin, Hourieh Khalajzadeh, Xiao Liu
Abstract
Context: Crowdsourced Software Engineering (CSE) offers outsourcing work to software practitioners by leveraging a global online workforce. However, these software practitioners struggle to identify suitable tasks due to the variety of options available. Hence, there have been a growing number of studies on introducing recommendation systems to recommend CSE tasks to software practitioners.
Objective: The goal of this study is to analyze the existing CSE task recommendation systems, investigating their extracted data, recommendation methods, key advantages and limitations, recommended task types, the use of human factors in recommendations, popular platforms, and features used to make recommendations.
Methods: This SLR was conducted according to Kitchenham and Charters’ guidelines. We used manual and automatic search strategies, with no time limit on the search for relevant papers.
Results: We selected 65 primary studies for data extraction, analysis, and synthesis based on our predefined inclusion and exclusion criteria. Based on our data analysis results, we classified the extracted information into four categories according to the data acquisition sources: Software Practitioner’s Profile, Task or Project, Previous Contributions, and Direct Data Collection. We also organized the proposed recommendation systems into a taxonomy and identified key advantages, such as increased performance, accuracy, and optimized solutions. In addition, we identified the limitations of these systems, such as inadequate or biased recommendations and lack of generalizability. Our results revealed that human factors play a major role in CSE task recommendation. Further, we identified five popular task types recommended, popular platforms, and their features used in task recommendation. We also provided recommendations for future research directions.
Conclusion: This SLR provides insights into current trends, gaps, and future research directions in CSE task recommendation systems, such as the need for comprehensive evaluation, standardized evaluation metrics, and benchmarking in future studies, and for transferring knowledge from other platforms to address the cold-start problem.
Volume 184, Article 107753.
Citations: 0
Fundamental requirements of Digital Twins for production system in Oil and Gas Industry: A systematic literature review
IF 3.8 · Zone 2 · Computer Science
Information and Software Technology Pub Date : 2025-04-15 DOI: 10.1016/j.infsof.2025.107742
Ricardo C. Belo, Marcelo S. Pimenta, Tarciso T. Salvador, Rafael H. Petry, Mara Abel
Abstract
Context: The oil and gas industry is adopting Digital Twins as a significant step in a continuous digital transformation. A Digital Twin can provide intelligent support to main activities related directly or indirectly to oil and gas production, like operations monitoring, process optimization, failure prediction, simulation of what-if scenarios, and safety improvement.
Situation: Specifications of requirements of a Digital Twin (DT) in the oil and gas domain found in the literature are usually presented informally, utilizing natural and often ambiguous language. Most of the requirements need to be extracted from descriptions of DT characteristics and functionality presented in articles.
Objective: This systematic literature review aims to summarize the existing evidence concerning the requirements of Digital Twins tailored explicitly for oil and gas production systems. By thoroughly analyzing published literature, the study seeks to uncover the requirements, properties, and constraints essential for the successful implementation of Digital Twins in this domain.
Method: Through a systematic literature review, the study focused on rigorously identifying common functionality, ubiquitous characteristics, and some emerging trends related to Digital Twin requirements in oil and gas production systems.
Results: From the initial 939 articles, the review selected 94 relevant studies, focusing on described requirements and on application-specific features of Digital Twins. Among the selected papers, 28 were analyzed and reviewed, focusing on specific requirements for Digital Twin for production systems within the industry, shedding light on 17 functional and 7 non-functional requirements common to many DT specifications and implementations.
Conclusion: Our findings underscore the importance of comprehensively understanding and outlining the essential requirements for Digital Twins within the intricate landscape of production systems in the industry. By elucidating key features and properties of DT, this study contributes significantly to enhancing the efficacy and implementation of new Digital Twins, or the evaluation of existing Digital Twins.
As a result, we have identified some important requirements, specifically in the O&G domain. We analyzed some issues related to the software needs of DTs in the O&G domain, highlighting which DT requirements are usually specified or informally described. This study allows us to identify primary studies in both DT for O&G and Requirements Engineering (RE) fields. Even though the requirements described here have been collected from DT works in the O&G domain, many of these requirements are also applicable to other domains, like many areas of engineering and manufacturing. Finally, it aims to offer a clear understanding of …
Volume 184, Article 107742.
Citations: 0
SemiRALD: A semi-supervised hybrid language model for robust Anomalous Log Detection
IF 3.8 · Zone 2 · Computer Science
Information and Software Technology Pub Date : 2025-04-11 DOI: 10.1016/j.infsof.2025.107743
Yicheng Sun, Jacky Wai Keung, Zhen Yang, Shuo Liu, Hi Kuen Yu
Abstract
Context: Deep learning-based Anomalous Log Detection (DALD) tools are critical for software reliability, but current approaches face challenges, including information loss during log parsing, reliance on large labeled datasets, and fragility in low-resource scenarios.
Objective: To overcome the above limitations, we propose SemiRALD, a semi-supervised learning-based robust ALD approach that leverages a Large Language Model (LLM) for log parsing, enhancing both flexibility and accuracy. It utilizes a hybrid language model to repeatedly fit the samples with generated pseudo-labels, thereby training DALD models with limited resources and facilitating efficient anomaly detection tasks.
Method: In detail, SemiRALD utilizes ChatGPT and in-context learning for automated log parsing, thereby improving log integrity during log parsing. Subsequently, it harnesses a semi-supervised learning framework and our proposed hybrid language model to remedy the performance degeneration caused by low-resource restriction in practice. Semi-supervised learning requires only a small amount of labeled data throughout the entire process, while the hybrid language model is built on the architecture of RoBERTa and an attention-based BiLSTM.
Results: Experiments on the HDFS and BGL datasets demonstrate that SemiRALD achieves an average F1-score improvement of 7.3% and 8.2%, respectively, over seven benchmark models. On small-scale datasets (0.1% of the original size), SemiRALD outperforms competitors by 31.4% and 46.0% in F1-score, respectively. Its consistent performance across diverse datasets highlights its generalizability and robustness.
Conclusion: SemiRALD is capable of handling anomaly detection tasks in both large-scale and low-resource datasets, delivering significant advancements in anomaly log detection and offering robust, adaptable solutions to address prevalent challenges in the field of software reliability engineering.
Volume 183, Article 107743.
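The abstract above describes repeatedly fitting a model on samples with generated pseudo-labels, starting from a small labeled set. The general shape of such a self-training loop can be sketched as follows. This is not SemiRALD: a nearest-centroid rule stands in for the RoBERTa and attention-based BiLSTM hybrid model, the two-dimensional points stand in for log embeddings, and the names `self_train`, `margin`, and the toy data are all hypothetical.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def self_train(labeled, unlabeled, rounds=3, margin=1.0):
    """Generic pseudo-labeling loop (illustrative stand-in model).

    labeled:   list of (features, label), label 0 = normal, 1 = anomalous;
               both classes must be present.
    unlabeled: list of feature tuples.
    Returns the grown labeled set and the samples left unlabeled.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        # "Fit" the stand-in model: one centroid per class.
        c0 = centroid([x for x, y in labeled if y == 0])
        c1 = centroid([x for x, y in labeled if y == 1])
        remaining = []
        for x in pool:
            d0, d1 = dist2(x, c0), dist2(x, c1)
            if abs(d0 - d1) >= margin:
                # Confident prediction: keep it as a pseudo-label.
                labeled.append((x, 0 if d0 < d1 else 1))
            else:
                remaining.append(x)
        if len(remaining) == len(pool):
            break  # nothing new was labeled, so stop early
        pool = remaining
    return labeled, pool


seed_labels = [((0.0, 0.0), 0), ((5.0, 5.0), 1)]
unlabeled_logs = [(0.5, 0.2), (4.8, 5.1), (1.0, 0.5), (2.5, 2.5), (4.0, 4.5)]
grown, leftover = self_train(seed_labels, unlabeled_logs, margin=2.0)
```

The confidence threshold (`margin` here) is the key design choice in any loop of this kind: a low threshold admits noisy pseudo-labels that can reinforce early mistakes, while a high one leaves most of the unlabeled pool unused.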
Citations: 0
Validation of information architecture: Cross-methodological comparison of tree testing variants and prototype user testing
IF 3.8 · Zone 2 · Computer Science
Information and Software Technology Pub Date : 2025-04-10 DOI: 10.1016/j.infsof.2025.107740
Eduard Kuric, Peter Demcak, Matus Krajcovic
Abstract
Context: Tree testing is an established user testing method applied by software professionals to validate that an information architecture is logically navigable by users. We identify a methodological gap caused by previously unexamined non-uniformity between tree testing methods and software.
Objective: To reveal the role of the user interface representations in tree testing, this research compares the results of 3 commonly used tree testing variants. To assess how indicative they are of the user’s interaction with an information architecture implemented in an actual user interface, and to issue methodological recommendations, a comparison with varied high-fidelity prototypes was performed.
Methods: Two between-subject studies were conducted to obtain a new dataset of users navigating an information architecture in tree testing and in interactive user interface prototypes. Data from 180 participants and 1800 task completions across 6 experimental conditions (3 tree testing and 3 prototype user interface variants) were evaluated quantitatively and qualitatively.
Results: Significant differences were found between results yielded by different tree testing method variants, and in how well they approximate user navigation in the same information architecture in high-fidelity prototypes. Implications for selection of the tree testing variant are proposed in the context of the evaluated information architecture, with plausible broader applicability for tree testing methodology. Evidence supports the tree testing variant with the highest visibility of previous navigation choices and direct controls over their reversal as the most accurate.
Conclusion: The presented findings can contribute to the design of software information architecture based on more accurate early validation, owing to tree testing that simulates less artificial user behavior, more reflective of the user’s navigation in the eventual user interface. We hope this will further the discussion and research leading to more holistic tree testing methodologies in the future.
Volume 183, Article 107740.
Citations: 0
RaxCS: Towards cross-language code summarization with contrastive pre-training and retrieval augmentation
IF 3.8 · Zone 2 · Computer Science
Information and Software Technology Pub Date : 2025-04-10 DOI: 10.1016/j.infsof.2025.107741
Kaiyuan Yang, Junfeng Wang, Zihua Song
Abstract
Context: Code summarization is the task of generating a concise natural language description of a code snippet. Recent efforts have been made to boost the performance of code summarization from various perspectives, e.g., retrieving external information or introducing large transformer-based models, and have thus achieved promising performance for one specific programming language. When dealing with rapidly expanding cross-language source code datasets, existing approaches suffer from two issues: (1) the difficulty of building a universal code representation for multiple languages; (2) weaker performance for low-resource languages.
Objective: To cope with these issues, we propose a novel code summarization approach named RaxCS, which aims to perform code summarization across multiple languages and improve accuracy for low-resource languages by leveraging cross-language knowledge.
Methods: We exploit pre-trained models with a contrastive learning objective to build a unified code representation across multiple languages. To fully mine the external knowledge across programming languages, we design a hybrid retrieval module to search for functionally equivalent code and its corresponding comment to serve as preliminary information. Finally, we employ a decoder-only transformer model to fuse contextual information, which guides the process of generating summaries.
Results: Extensive experiments demonstrate that (1) RaxCS outperforms the state-of-the-art on cross-language code summarization (i.e., RaxCS scores 4.39% higher in terms of BLEU metric and 8.65% in terms of BERTScore), and (2) for low-resource languages, RaxCS can boost the code summarization performance by a significant margin (e.g., 6.93% in terms of BLEU for Ruby) with cross-language retrieval.
Conclusion: This paper introduces a cross-language code summarization model, which utilizes contrastive pre-training and cross-language retrieval. Both are beneficial for incorporating cross-language knowledge to advance code summarization performance. The experimental results demonstrate that RaxCS is effective in generating accurate code summaries, particularly for low-resource languages.
Volume 183, Article 107741.
Citations: 0
Special issue on causal modeling and inference in SE
IF 3.8 · Zone 2 · Computer Science
Information and Software Technology Pub Date : 2025-04-10 DOI: 10.1016/j.infsof.2025.107754
Julien Siebert, Adam Trendowicz, Gregor Gössler, Hironori Washizaki, Michael Kläs, Martin Shepperd
Volume 183, Article 107754.
Citations: 0
VDMAF: Cross-language source code vulnerability detection using multi-head attention fusion
IF 3.8 · Zone 2 · Computer Science
Information and Software Technology Pub Date : 2025-04-09 DOI: 10.1016/j.infsof.2025.107739
Yang Li, Qin Luo, Peng Wu, Hongdi Zheng
Abstract
Context: Detecting potential vulnerabilities is critical for ensuring the stability and reliability of software systems. Traditional static detection methods fall short in accuracy and efficiency. Furthermore, existing deep learning-based vulnerability detection models typically rely on single sequence or graph embedding methods, neglecting the semantic and structured information present in the code. With the diversification of software development environments, systems often involve multiple programming languages. This limits the effectiveness of existing vulnerability detection methods when handling cross-language code.
Objective: To solve these problems, we propose a more effective and general vulnerability detection framework, VDMAF (Cross-Language Source Code Vulnerability Detection Using Multi-Head Attention Fusion).
Methods: The method extracts unified and standardized feature representations. It uses a multi-head attention module to fuse sequence features and graph structural features. First, an improved global consistent labeling mechanism is introduced, which improves data representation through threshold-based label augmentation. Second, the method uses sequence embedding to extract local semantic features of the code. The code is converted into a unified, standardized graph structure. Then, a graph neural network is used to extract features. Finally, the sequence and graph features are fused using the multi-head attention module, followed by classification with a bidirectional LSTM-based recurrent neural network.
Results: VDMAF has been evaluated on three vulnerability datasets across different programming languages and granularities, demonstrating better performance across all metrics compared to baseline models, with F1 scores of 98.9%, 65.3%, and 56.8%.
Conclusion: The proposed VDMAF outperforms state-of-the-art models, exhibiting better generality and scalability, thus showing greater potential in vulnerability detection tasks.
Volume 183, Article 107739.
Citations: 0