{"title":"Community engagement and the lifespan of open-source software projects","authors":"Mohit, Kuljit Kaur Chahal","doi":"10.1016/j.infsof.2025.107914","DOIUrl":"10.1016/j.infsof.2025.107914","url":null,"abstract":"<div><h3>Context:</h3><div>Open-source software (OSS) projects depend on community engagement (CE) for longevity. However, CE’s quantifiable impact on project dynamics and lifespan is underexplored.</div></div><div><h3>Objectives:</h3><div>This study defines CE in OSS, identifies key metrics, and evaluates their influence on project dynamics (releases, commits, branches) and lifespan.</div></div><div><h3>Methods:</h3><div>We analyzed 33,946 GitHub repositories, defining and operationalizing CE with validated per-month metrics (issues, comments, watchers, stargazers). Non-parametric tests and correlations assessed relationships with project dynamics and lifespan across quartiles.</div></div><div><h3>Results:</h3><div>CE metrics significantly associate with project dynamics, with stronger correlations in highly engaged projects. For lifespan, a complex pattern emerged: per-month CE rates are highest in younger projects, declining with age. Yet, a subset of long-lived projects maintains exceptionally high activity. Initial CE bursts appear crucial for establishment, while sustained high engagement drives extreme longevity. The influence of active issue engagement intensifies with age, while that of passive attention declines.</div></div><div><h3>Conclusion:</h3><div>CE dynamically drives OSS project longevity and development. 
Our findings establish validated CE metrics and offer deeper insights into how diverse community activity patterns contribute to project longevity.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107914"},"PeriodicalIF":4.3,"publicationDate":"2025-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145322332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ADTDroid: Leveraging API description and TCP based active learning for Android malware detection","authors":"Zhen Liu , Ruoyu Wang , Wenbin Zhang","doi":"10.1016/j.infsof.2025.107930","DOIUrl":"10.1016/j.infsof.2025.107930","url":null,"abstract":"<div><h3>Context:</h3><div>Extensive research has been conducted on neural network-based Android malware detection models to safeguard the Android software ecosystem. However, the efficacy of detection models may decline over time due to the continuous evolution of malicious behaviors, a phenomenon referred to as the model aging problem.</div></div><div><h3>Objective:</h3><div>To tackle this problem, existing research primarily focuses on API semantic feature learning and active learning. However, a major challenge in feature learning is the continuous updating of APIs. Additionally, the over-confidence problem in neural networks exacerbates the challenge of selecting uncertain samples during active learning. To handle these challenges, this paper proposes a novel Android malware detection method called ADTDroid. It aims to enhance the performance of the malware detection model against ongoing API updates and malware evolution.</div></div><div><h3>Method:</h3><div>In this paper, we present a sensitive event graph based feature extraction approach that prioritizes suspicious APIs. To derive API embeddings for feature vector extraction, we propose learning these embeddings directly from API descriptions provided in official Android development documentation. This method facilitates the immediate acquisition of embeddings for updated APIs from the documentation. Furthermore, we propose a True Class Probability (TCP)-based confidence score to identify uncertain samples for model retraining. 
These samples exhibit genuine uncertainty, thereby enhancing the model’s adaptability to evolving data.</div></div><div><h3>Results:</h3><div>Through extensive experimentation on large-scale real-world datasets covering the period from 2013 to 2022, our method achieves significant improvements in the F-score of malware detection. Compared to existing active learning-based approaches, our method achieves relative improvements of approximately 10% over APIGraph and 8.1% over contrastive autoencoder techniques.</div></div><div><h3>Conclusion:</h3><div>ADTDroid can enhance the performance of feature extraction in cases of model aging. It can also improve the selection of uncertain samples to adapt the malware detection model to new data.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107930"},"PeriodicalIF":4.3,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145363917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Psycholinguistic analyses in software engineering text: A systematic mapping study","authors":"Amirali Sajadi , Kostadin Damevski , Preetha Chatterjee","doi":"10.1016/j.infsof.2025.107913","DOIUrl":"10.1016/j.infsof.2025.107913","url":null,"abstract":"<div><h3>Context:</h3><div>A deeper understanding of human factors in software engineering (SE) is essential for improving team collaboration, decision-making, and productivity. Communication channels like code reviews and chats provide insights into developers’ psychological and emotional states. While large language models excel at text analysis, they often lack transparency and precision. Psycholinguistic tools like Linguistic Inquiry and Word Count (LIWC) offer clearer, interpretable insights into cognitive and emotional processes exhibited in text. Despite its wide use in SE research, no comprehensive mapping study of LIWC’s use has been conducted.</div></div><div><h3>Objective:</h3><div>We examine the importance of psycholinguistic tools, particularly LIWC, and provide a thorough analysis of its current and potential future applications in SE research.</div></div><div><h3>Methods:</h3><div>We conducted a systematic mapping study of six prominent databases, identifying 43 SE-related papers using LIWC. Our analysis focuses on five research questions: <em>RQ1: How was LIWC employed in SE studies, and for what purposes? RQ2: What datasets were analyzed using LIWC? RQ3: What Behavioral Software Engineering (BSE) concepts were studied using LIWC? RQ4: How often has LIWC been evaluated in SE research? RQ5: What concerns were raised about adopting LIWC in SE?</em></div></div><div><h3>Results:</h3><div>Our findings reveal a wide range of applications, including analyzing team communication to detect developer emotions and personality, developing ML models to predict deleted Stack Overflow posts, and more recently comparing AI-generated and human-written text. 
LIWC has been primarily used with data from project management platforms (e.g., GitHub) and Q&A forums (e.g., Stack Overflow). Key BSE concepts include <em>Communication</em>, <em>Organizational Climate</em>, and <em>Positive Psychology</em>. Of the 43 papers, 26 did not formally evaluate LIWC. Concerns were raised about some limitations, including difficulty handling SE-specific vocabulary.</div></div><div><h3>Conclusion:</h3><div>We highlight the potential of psycholinguistic tools and their limitations, and present new use cases for advancing research on human factors in SE (e.g., bias in human-LLM conversations).</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107913"},"PeriodicalIF":4.3,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145322333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artificial intelligence for source code understanding tasks: A systematic mapping study","authors":"Dzikri Rahadian Fudholi, Andrea Capiluppi","doi":"10.1016/j.infsof.2025.107915","DOIUrl":"10.1016/j.infsof.2025.107915","url":null,"abstract":"<div><h3>Context:</h3><div>Artificial intelligence (AI) techniques, particularly natural language processing (NLP) and machine learning (ML), are increasingly used to support source code understanding, an essential activity in software engineering.</div></div><div><h3>Objective:</h3><div>This systematic mapping study investigates how these techniques are applied, guided by four Research Questions (RQs) focusing on the types of tasks, embedding methods & preprocessing techniques used, machine learning models employed, and existing research gaps.</div></div><div><h3>Methods:</h3><div>A review of 227 peer-reviewed studies identifies trends and provides a structured mapping addressing each RQ.</div></div><div><h3>Results:</h3><div>The findings reveal a dominant shift toward deep learning, especially transformer-based and graph-based models, highlighting underexplored areas such as explainability.</div></div><div><h3>Conclusion:</h3><div>This study provides a task-based classification and offers insights and directions for future research in AI-enabled source code understanding, supporting both researchers and practitioners.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107915"},"PeriodicalIF":4.3,"publicationDate":"2025-10-10","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145363921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reasoned or Rapid code? Unveiling the strengths and limits of DeepSeek for Solidity development","authors":"Gavina Baralla, Giacomo Ibba, Roberto Tonelli","doi":"10.1016/j.infsof.2025.107917","DOIUrl":"10.1016/j.infsof.2025.107917","url":null,"abstract":"<div><h3>Context:</h3><div>As blockchain systems grow in complexity, secure and efficient smart contract development remains a crucial challenge. Large Language Models (LLMs) like DeepSeek promise significant enhancements in developer productivity through automated code generation, debugging, and testing. This study focuses on Solidity, the dominant language for Ethereum smart contracts, where correctness, gas efficiency, and security are critical to real-world adoption.</div></div><div><h3>Objective:</h3><div>This study evaluates the capabilities of DeepSeek’s V3 and R1 models, a non-reasoning Mixture-of-Experts architecture and a reasoning-based model trained via reinforcement learning, respectively, in automating Solidity contract generation and testing, as well as identifying and fixing common vulnerabilities.</div></div><div><h3>Methods:</h3><div>We designed a controlled experimental framework to evaluate both models by generating and analysing a diverse set of smart contracts, including standardised tokens (ERC20, ERC721, ERC1155) and real-world application scenarios (Supply Chain, Token Exchange, Auction). The evaluation is grounded on a multidimensional metric suite covering quality, technical robustness and process characteristics. Vulnerability detection and patching capabilities are tested using predefined vulnerable contracts and guided patch prompts. 
The analysis spans six levels of prompt complexity and compares the impact of reasoning-based and non-reasoning-based generation strategies.</div></div><div><h3>Results:</h3><div>Findings reveal that R1 delivers more accurate and optimised outputs under high complexity, while V3 performs more consistently in simpler tasks with simpler code structures. However, both models exhibit persistent hallucinations, limitations in vulnerability coverage, and inconsistencies due to prompt formulation. The correlation between re-evaluation patterns and output quality suggests that reasoning helps in complex scenarios, although excessive revisions may lead to over-engineered or unstable solutions.</div></div><div><h3>Conclusions:</h3><div>Neither model is robust enough to autonomously generate issue-free smart contracts in complex or security-critical scenarios, underscoring the need for human oversight. These findings highlight best practices for integrating LLMs into blockchain development workflows and emphasise the importance of aligning model selection with task complexity and security requirements.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107917"},"PeriodicalIF":4.3,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145271471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nez: A design-driven skeleton model for building continuum AI-based and analytic systems","authors":"Dante D. Sanchez-Gallegos , Diana Carrizales-Espinoza , J.L. Gonzalez-Compean , Marco Antonio Núñez-Gaona , Heriberto Aguirre-Meneses , Jesus Carretero","doi":"10.1016/j.infsof.2025.107911","DOIUrl":"10.1016/j.infsof.2025.107911","url":null,"abstract":"<div><h3>Context:</h3><div>Organizations increasingly rely on artificial intelligence (AI) and machine learning (ML) to process data, automate tasks, and enhance decision-making. At the same time, the computing continuum enables AI and ML to be deployed closer to data sources, thereby reducing system latency and response time.</div></div><div><h3>Objective:</h3><div>Managing applications across this distributed environment is challenging due to the need for manual deployment, integration, and compliance with non-functional requirements (NFRs) such as security and fault tolerance. Therefore, there is a need for frameworks that automate the deployment and execution of computing continuum systems while integrating both functional and non-functional requirements.</div></div><div><h3>Method:</h3><div>This paper presents <em>Nez</em>, a design-driven skeleton model for building continuum AI and analytics systems. The Nez construction model automatically and transparently integrates AI/ML applications with non-functional components to create continuum systems that are deployed dynamically across multiple distributed infrastructures.</div></div><div><h3>Results:</h3><div>We conducted case studies on the processing of medical imagery and satellite imagery to provide automatic and continuous support for decision-makers. Nez has already been deployed at the Mexican hospital, <em>Instituto Nacional de Rehabilitación Luis Gerardo Ibarra Ibarra</em>, to create an AI-based data flow supporting bone cancer diagnosis. 
The evaluation shows that Nez outperforms state-of-the-art tools such as Nextflow, Makeflow, and Parsl, achieving improvements in response time of 28.46%, 17.46%, and 23.54%, respectively.</div></div><div><h3>Conclusion:</h3><div>Nez efficiently transforms organizational data flow designs into continuum computing services. This enables organizations to construct continuum AI-based and analytical systems that account for both functional and non-functional requirements.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107911"},"PeriodicalIF":4.3,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145271470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prune bias from the root: Bias removal and fairness estimation by pruning sensitive attributes in pre-trained DNN models","authors":"Qiaolin Qin, Ettore Merlo","doi":"10.1016/j.infsof.2025.107906","DOIUrl":"10.1016/j.infsof.2025.107906","url":null,"abstract":"<div><h3>Context:</h3><div>Deep learning models (DNNs) are widely used in high-stakes decision-making domains, but they often inherit and amplify biases present in training data, leading to unfair predictions. Given this context, fairness estimation metrics and bias removal methods are required to select and enhance fair models. However, we found that existing metrics lack robustness in estimating multi-attribute group fairness. Further, existing post-processing bias removal methods often focus on group fairness and fail to address individual fairness or optimize along multiple sensitive attributes.</div></div><div><h3>Objective:</h3><div>In this study, we explore the effectiveness of attribute pruning (i.e., zeroing out sensitive attribute weights in a pre-trained DNN’s input layer) in both bias removal and multi-attribute group fairness estimation.</div></div><div><h3>Methods:</h3><div>To study attribute pruning’s impact on bias removal, we conducted experiments on 32 models and 4 widely used datasets, and compared its effect in single-attribute group bias removal and accuracy preservation with 3 baseline post-processing methods. We then leveraged 3 datasets with multiple sensitive attributes to demonstrate how to use attribute pruning for multi-attribute group fairness estimation.</div></div><div><h3>Results:</h3><div>Single-attribute pruning can better preserve model accuracy than conventional post-processing methods in 23 out of 32 cases, and enforces individual fairness by design. However, since individual fairness and group fairness are fundamentally different objectives, attribute pruning’s effect on group fairness metrics is often inconsistent. 
We also extend our approach to a multi-attribute setting, demonstrating its potential for improving individual fairness jointly across sensitive attributes and for enabling multi-attribute fairness-aware model selection.</div></div><div><h3>Conclusion:</h3><div>Attribute pruning is a practical post-processing approach for enforcing individual fairness, with limited and data-dependent impact on group fairness. These limitations reflect the inherent trade-off between individual and group fairness objectives. In addition, attribute pruning provides a useful mechanism for bias estimation, particularly in multi-attribute contexts. We advocate for its adoption as a comparison baseline in fairness-aware AI development and encourage further exploration.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107906"},"PeriodicalIF":4.3,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145271473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Pareto Exploration (APEX) for Fairness-Aware Hyperparameter Optimization in FairPilot","authors":"Mohammadsina Almasi, Nazanin Nezami, Francesco Di Carlo, Abolfazl Asudeh, Hadis Anahideh","doi":"10.1016/j.infsof.2025.107912","DOIUrl":"10.1016/j.infsof.2025.107912","url":null,"abstract":"<div><h3>Context:</h3><div>The integration of machine learning (ML) into high-stakes software systems, such as those used in education, criminal justice, and finance, has elevated concerns over fairness, transparency, and accountability. Traditional hyperparameter optimization approaches often overlook fairness considerations, limiting their suitability for responsible AI development.</div></div><div><h3>Objectives:</h3><div>This work introduces <em>APEX</em>, a fairness-aware multi-objective Bayesian optimization (MOBO) algorithm, and <em>FairPilot</em>, an interactive system for exploring accuracy–fairness trade-offs. We aim to enable practitioners to explore and operationalize fairness-aware ML configurations more effectively.</div></div><div><h3>Methods:</h3><div>APEX integrates a Pareto-based coordinate selection strategy and a perturbation mechanism that prioritizes hyperparameters based on their joint influence on fairness and accuracy. Building on APEX, FairPilot visualizes trade-offs between fairness and accuracy across ML models and metrics. We evaluate the system across diverse datasets (e.g., COMPAS, ELS, German Credit) and multiple model types.</div></div><div><h3>Results:</h3><div>APEX consistently outperforms baseline optimization strategies such as ParEGO and EHVI, achieving higher hypervolume coverage and converging more quickly to fairness–accuracy trade-offs of superior quality. 
An analysis of hyperparameter importance reveals that regularization parameters play a central role in improving fairness while preserving predictive performance.</div></div><div><h3>Conclusions:</h3><div>FairPilot and APEX jointly provide a novel, user-centered, and algorithmically grounded approach to fairness-aware ML. By supporting both visual decision-making and targeted optimization, our system facilitates responsible model development, offering a practical and extensible solution for fairness in software systems.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107912"},"PeriodicalIF":4.3,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145363915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MT4Image: An efficient metamorphic testing approach for image processing applications","authors":"Chang-Ai Sun, Xiaobei Li, Jiayu Xing","doi":"10.1016/j.infsof.2025.107909","DOIUrl":"10.1016/j.infsof.2025.107909","url":null,"abstract":"<div><h3>Context:</h3><div>Metamorphic testing (MT) is widely adopted for testing image processing applications. Although a variety of metamorphic relations (MRs) have been proposed, applying all of them incurs high computational costs. In addition, complex transformation operations are not well supported when generating follow-up test images based on MRs.</div></div><div><h3>Objective:</h3><div>To overcome these limitations, this study proposes an efficient MT approach for image processing applications called <em>MT4Image</em>.</div></div><div><h3>Methods:</h3><div><em>MT4Image</em> employs CycleGAN to generate realistic follow-up images and leverages MRs for various categories of image processing applications. Two optimization strategies, <em>SSampling</em> and <em>EquivalentMR</em>, are further proposed to reduce MRs and test images, respectively. Additionally, a feedback mechanism called <em>ObsAdjuster</em> is designed to adjust the selection of test images and MRs for execution to improve the fault detection efficiency of MT. 
A prototype tool called <em>MT4I</em> was developed to support the proposed approach.</div></div><div><h3>Results:</h3><div>A series of experiments were conducted on a suite of subject programs covering three categories of image processing applications, with varying image sets from different sources.</div></div><div><h3>Conclusion:</h3><div>The experimental results show that <em>MT4Image</em> is capable of effectively testing various categories of image processing applications; the optimization strategies reduce the number of MRs and test images without significantly jeopardizing fault detection effectiveness, and the feedback mechanism further enhances fault detection efficiency.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107909"},"PeriodicalIF":4.3,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145322334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}