{"title":"Data analytics in software startups: Understanding key concepts and critical challenges","authors":"Usman Rafiq, Xiaofeng Wang, Eduardo Guerra","doi":"10.1016/j.infsof.2024.107652","DOIUrl":"10.1016/j.infsof.2024.107652","url":null,"abstract":"<div><h3>Context:</h3><div>The continuous proliferation of data nowadays has inspired companies to make data-informed decisions. Despite the acknowledged benefits of analytics, there is a persistent question about how companies, especially software startup companies with distinguishing characteristics, can effectively create value from it. In the startup context, analytics refers to the use of startup data and insights to inform strategies and tactics across startup business, product, team, sales, and marketing dimensions.</div></div><div><h3>Objective:</h3><div>In this study, we aim to bridge the knowledge gap by eliciting an understanding of the analytics that software startup companies hold and identifying critical challenges they face in the realm of analytics.</div></div><div><h3>Method:</h3><div>We conducted a multiple-case study with eight software startups at different startup stages. In addition to the data collected through semi-structured interviews, we considered other data sources such as analytics dashboards and online data about the startups, including websites and social media platforms. We analyzed the data using thematic analysis.</div></div><div><h3>Results:</h3><div>Our results firstly revealed a divergent understanding of analytics by software startups, based on which we reported essential characteristics of analytics perceived by them. Then we identified 22 analytics challenges classified into six main themes. 
The themes encompass data capture and access challenges, data interpretation and bias, communication challenges, cultural challenges, external influences and constraints, and analytics implementation challenges.</div></div><div><h3>Conclusions:</h3><div>Our findings contribute to a conceptual understanding of analytics in software startups and the identification of critical challenges faced by these startups across different stages. The conceptual understanding lays the foundation for comprehending what constitutes analytics for software startups, while the identified challenges highlight critical barriers to the adoption and implementation of analytics. We also provide practical implications for both researchers and practitioners.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"180 ","pages":"Article 107652"},"PeriodicalIF":3.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143305312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction for Special Issue on Software Production","authors":"","doi":"10.1016/j.infsof.2024.107658","DOIUrl":"10.1016/j.infsof.2024.107658","url":null,"abstract":"","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"179 ","pages":"Article 107658"},"PeriodicalIF":3.8,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143128052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What are the emotions of developers towards deep learning documentation? — An exploratory study on Stack Overflow posts","authors":"Akhila Sri Manasa Venigalla, Sridhar Chimalakonda","doi":"10.1016/j.infsof.2024.107655","DOIUrl":"10.1016/j.infsof.2024.107655","url":null,"abstract":"<div><h3>Context:</h3><div>Non-native machine learning and deep learning (DL) developers face several challenges in using DL frameworks owing to persistent issues in DL documentation. However, no studies have explored the reasons for these documentation issues.</div></div><div><h3>Objective:</h3><div>Investigating the underlying emotions in developer discussions on documentation could help in identifying reasons for issues in documentation. Hence, in this study, we analyse the emotions expressed in Stack Overflow posts about the documentation of DL frameworks.</div></div><div><h3>Methodology:</h3><div>We identify relevant DL-related tags using an integrated snowballing approach and extract 159.2K posts related to DL. We then identify documentation-related posts among these using a keyword-matching approach, which yields 13,572 DL documentation-related posts. We use a Random Forest classifier to build six emotion classification models based on the Gold Label Dataset for emotions. We then classify the extracted posts into each of the six emotions (<em>Anger</em>, <em>Fear</em>, <em>Love</em>, <em>Joy</em>, <em>Sadness</em> and <em>Surprise</em>) using the classifier models, and curate the results.</div></div><div><h3>Results:</h3><div>We observe frequent expressions of anger and sadness, with more than half of the posts tagged ‘yolo’ and ‘activation-function’ exhibiting these emotions, while the <em>Love</em> emotion is predominantly present in posts with the ‘theano’ tag. 
During our analysis, we observed that 40% of ‘Body’ and ‘Answer’ posts exhibited anger and sadness.</div></div><div><h3>Conclusion:</h3><div>Our study reveals a strong presence of Anger, Fear and Sadness, emphasizing the need to improve DL framework documentation. Specifically, maintainers of the ‘yolo’ and ‘matcaffe’ libraries could improve their documentation, as the corresponding posts exhibit more <em>Anger</em> and <em>Sadness</em>.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"179 ","pages":"Article 107655"},"PeriodicalIF":3.8,"publicationDate":"2024-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143091907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated detection of inter-language design smells in multi-language deep learning frameworks","authors":"Zengyang Li , Xiaoyong Zhang , Wenshuo Wang , Peng Liang , Ran Mo , Jie Tan , Hui Liu","doi":"10.1016/j.infsof.2024.107656","DOIUrl":"10.1016/j.infsof.2024.107656","url":null,"abstract":"<div><h3>Context:</h3><div>Nowadays, most deep learning frameworks (DLFs) combine Python and C/C++, gaining both flexibility and performance. However, inappropriate inter-language interaction may introduce design smells involving multiple programming languages (PLs), i.e., Inter-Language Design Smells (ILDS). Despite the negative impact of ILDS on multi-language DLFs, there is neither an automated approach for detecting ILDS in multi-language DLFs nor a comprehensive understanding of ILDS in such DLFs.</div></div><div><h3>Objective:</h3><div>This work aims to automatically detect ILDS in multi-language DLFs written in a combination of Python and C/C++, and to obtain a comprehensive understanding of such ILDS in DLFs.</div></div><div><h3>Methods:</h3><div>We first developed an approach for automatically detecting ILDS in multi-language DLFs written in a combination of Python and C/C++, including a number of ILDS and their detection rules defined based on inter-language communication mechanisms and code analysis. Then, we developed the <span>CPsmell</span> tool, which implements the detection rules to automatically detect such ILDS, and manually validated the accuracy of the tool. Finally, we performed an empirical study to evaluate the ILDS in multi-language DLFs.</div></div><div><h3>Results:</h3><div>We proposed seven ILDS, and <span>CPsmell</span> achieved an accuracy of 98.17% in manual validation on 5 popular multi-language DLFs. 
The study results revealed that among the 5 DLFs, TensorFlow, PyTorch, and PaddlePaddle exhibit a relatively high prevalence of ILDS; each smelly file contains around 5 ILDS instances on average, with the ILDS <em>Long Lambda Function For Inter-language Binding</em> and <em>Unused Native Entity</em> being relatively prominent; throughout the evolution of the 5 DLFs, some ILDS were resolved to a certain extent, but the overall count of ILDS instances shows an upward trend.</div></div><div><h3>Conclusions:</h3><div>The automated detection of the proposed ILDS achieved high accuracy, and the empirical study provides a comprehensive understanding of ILDS in multi-language DLFs.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"179 ","pages":"Article 107656"},"PeriodicalIF":3.8,"publicationDate":"2024-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143092337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instance gravity oversampling method for software defect prediction","authors":"Yu Tang , Yang Zhou , Cheng Yang , Ye Du , Ming-song Yang","doi":"10.1016/j.infsof.2024.107657","DOIUrl":"10.1016/j.infsof.2024.107657","url":null,"abstract":"<div><h3>Context</h3><div>In software defect datasets, the number of defective instances is significantly lower than that of non-defective instances. This imbalance adversely impacts the predictive performance of defect prediction models. Oversampling methods can effectively balance datasets. However, traditional oversampling methods often struggle to capture the underlying relationships between features and are prone to introducing noise during instance synthesis.</div></div><div><h3>Objective</h3><div>Inspired by the law of gravity, we propose a novel oversampling method based on instance gravity (MOSIG).</div></div><div><h3>Method</h3><div>This method begins by introducing a new metric, instance gravity, to measure the similarity between instances. Subsequently, feature models are constructed, and instance groups are generated. Instances that meet specific conditions based on instance gravity are then identified within different instance groups. Finally, we propose a novel method for synthesizing defective instances by assigning weights to instances according to their gravity.</div></div><div><h3>Results</h3><div>Experimental results demonstrate that MOSIG significantly enhances the predictive performance of both the CART decision tree and Naive Bayes models across 21 publicly available software defect datasets. 
The experimental results are further validated using the Friedman ranking and the Nemenyi post-hoc test, confirming that the improvements achieved by MOSIG are statistically significant.</div></div><div><h3>Conclusion</h3><div>MOSIG is a promising oversampling method for software defect prediction.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"179 ","pages":"Article 107657"},"PeriodicalIF":3.8,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143091981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy attack method for adaptive multi-exit neural networks","authors":"Dongfang Du, Chaofeng Sha, Xin Peng","doi":"10.1016/j.infsof.2024.107653","DOIUrl":"10.1016/j.infsof.2024.107653","url":null,"abstract":"<div><h3>Context:</h3><div>Adaptive Multi-Exit Neural Networks (AMENNs) have emerged as a promising solution for energy-efficient and faster inference in resource-constrained environments. To ensure that these networks meet performance requirements, evaluating their energy robustness is essential. Recent works have focused on energy attacks against models in both white-box and black-box scenarios. However, existing approaches in black-box scenarios require a significant amount of additional training data to train auxiliary models, resulting in prohibitively high costs for the attacks.</div></div><div><h3>Objectives:</h3><div>In this work, we leverage a genetic algorithm (GA) to search for high-energy samples with which to attack AMENN models and evaluate their energy robustness directly in a black-box scenario; we name this approach <strong>E</strong>nergy <strong>A</strong>ttack using <strong>G</strong>enetic <strong>A</strong>lgorithm (EAGA).</div></div><div><h3>Methods:</h3><div>In the context of black-box scenarios, we propose an energy attack method based on a genetic algorithm for AMENNs used in image classification tasks. By enhancing the fitness function to target samples with high energy consumption and improving the population initialization, crossover, and mutation operations, we ensure a diverse and rich sample space for robust evaluation.</div></div><div><h3>Results:</h3><div>The results show that EAGA outperforms current baseline methods, demonstrating an average improvement of over 17% in the mean percentage increase in energy consumption of AMENNs. 
Furthermore, we preserve the quality of the generated attack inputs by ensuring sufficient similarity between each original image and its attack image.</div></div><div><h3>Conclusion:</h3><div>EAGA introduces a novel and efficient method for assessing the energy robustness of AMENNs in a black-box setting, without requiring local gradient information. By using genetic algorithms, this approach enables a direct evaluation of model performance in resource-constrained environments. The study emphasizes the importance of EAGA in enhancing the evaluation process of AMENN models and underscores its potential to advance energy-efficient neural network deployments.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"179 ","pages":"Article 107653"},"PeriodicalIF":3.8,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143092444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robustness evaluation of code generation systems via concretizing instructions","authors":"Ming Yan , Junjie Chen , Jie M. Zhang , Xuejie Cao , Chen Yang , Mark Harman","doi":"10.1016/j.infsof.2024.107645","DOIUrl":"10.1016/j.infsof.2024.107645","url":null,"abstract":"<div><h3>Context:</h3><div>Code generation systems have been extensively developed in recent years to generate source code based on natural language instructions. However, despite their advancements, these systems still face robustness issues where even slightly different instructions can result in significantly different code semantics. Robustness is critical for code generation systems, as it can have significant impacts on software development, software quality, and trust in the generated code. Although existing testing techniques for general text-to-text software can detect some robustness issues, they can produce many false positives and are limited in effectiveness because they ignore the characteristics of such systems.</div></div><div><h3>Objective:</h3><div>To better evaluate (and further enhance) the robustness of code generation systems, in this work, we conducted the first exploration that carefully considers the characteristics of code generation systems. Specifically, we propose a novel technique, called COCO, and perform an extensive study to evaluate the robustness of code generation systems with COCO.</div></div><div><h3>Method:</h3><div>COCO exploits the usage scenario of code generation systems to make the original programming instruction more concrete by incorporating features known to be present in the original code. A robust system should maintain code semantics for the concretized instruction, and COCO detects robustness inconsistencies when it does not. 
In the extensive study, we evaluated the robustness of eight advanced code generation systems (including the commercial tools Copilot and ChatGPT) with COCO, using two widely used datasets.</div></div><div><h3>Results:</h3><div>Our results demonstrate the effectiveness of COCO. It produces no false positives, ensuring the accuracy of robustness evaluation. Additionally, it outperforms the two baselines adopted from general text-to-text software testing, detecting 440.31% and 95.81% more inconsistencies, respectively. Concretized instructions generated by COCO can further help reduce robustness inconsistencies by 21.90% to 60.18% via fine-tuning.</div></div><div><h3>Conclusions:</h3><div>COCO is effective in detecting robustness inconsistencies in code generation systems and significantly outperforms baselines. Additionally, fine-tuning code generation systems with the concretized instructions provided by COCO can substantially enhance their robustness.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"179 ","pages":"Article 107645"},"PeriodicalIF":3.8,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143092446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reinforcement learning for test case prioritization based on LLEed K-means clustering and dynamic priority factor","authors":"Zhongsheng Qian, Qingyuan Yu, Hui Zhu, Jinping Liu, Tingfeng Fu","doi":"10.1016/j.infsof.2024.107654","DOIUrl":"10.1016/j.infsof.2024.107654","url":null,"abstract":"<div><div>Integrating reinforcement learning (RL) into test case prioritization (TCP) aims to cope with the dynamic nature and time constraints of continuous integration (CI) testing. However, achieving optimal ranking across CI cycles is challenging if the RL agent starts from an unfavorable initial environment and must deal with a dynamic environment characterized by continuous errors during learning. To mitigate the influence of adverse environments, this work proposes an approach to <strong>T</strong>est <strong>C</strong>ase <strong>P</strong>rioritization which incorporates Locally Linear Embedding-based <strong>K</strong>-means Clustering and <strong>D</strong>ynamic Priority Factor into <strong>R</strong>einforcement <strong>L</strong>earning (<strong>TCP-KDRL</strong>). Firstly, we exploit the K-means clustering method with Locally Linear Embedding (LLE) to mine the relationships between test cases and then assign initial priority factors to the test cases. These test cases are ranked based on their initial factors, providing an improved initial learning environment for the agent in RL. Secondly, as the agent learns the ranking strategy across cycles, we design a comprehensive reward indicator that considers the running discrepancy and the relative positions of test cases. Additionally, based on the reward values, the dynamic priority factors for the ranked test cases in each learning round of RL are adaptively updated and the sequence is locally fine-tuned. The fine-tuning strategy provides ample feedback to the agent and enables real-time correction of the erroneous ranking environment, enhancing the generalization of RL across various cycles. 
Experimental results demonstrate that TCP-KDRL, an enhanced RL-based TCP method, outperforms other competitive TCP approaches. Specifically, the variant incorporating both the reward indicator and the fine-tuning strategy components performs significantly better than any other two-component combination. For instance, across 12 projects, the average improvements are 0.1548 in APFD and 0.0793 in NRPA. Compared to other TCP methods, the proposed method achieves notable gains, with increases of 0.6902 in APFD and 0.3816 in NRPA.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"179 ","pages":"Article 107654"},"PeriodicalIF":3.8,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143091906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving seed quality with historical fuzzing results","authors":"Yang Li , Yingpei Zeng , Xiangpu Song , Shanqing Guo","doi":"10.1016/j.infsof.2024.107651","DOIUrl":"10.1016/j.infsof.2024.107651","url":null,"abstract":"<div><h3>Context:</h3><div>Coverage-guided fuzzing (CGF) has achieved great success in discovering software vulnerabilities. The efficiency of CGF relies heavily on the quality of the initial seed corpus. Although some recent works have investigated initial seed selection, they usually build the initial seed corpus only from corpora provided by developers or downloaded from the Internet.</div></div><div><h3>Objective:</h3><div>We assess several existing corpus minimization tools and find that none of them effectively leverages information contained in historical fuzzing results. Historical fuzzing results may come from previous fuzz testing or from continuous fuzzing, which is increasingly integrated into the software development cycle. We therefore aim to use historical fuzzing results to generate a high-quality initial corpus that improves fuzzing performance. In addition, because the size of the initial corpus affects fuzzing efficiency, a minimization tool is essential for extracting valuable seeds from historical results.</div></div><div><h3>Method:</h3><div>We propose to use historical fuzzing results to help construct the initial seed corpus and further develop a corpus minimization tool named MCM (multiple corpora minimization), which can analyze multiple fuzzing results and use information including edge appearance frequency to guide seed selection.</div></div><div><h3>Results:</h3><div>We implement a prototype of MCM and evaluate it on 10 open-source programs. Our experiments show that using historical fuzzing results to expand the initial seed corpus even modestly, e.g., from 20 to only 100 seeds, improves branch coverage by up to 14%. 
In addition, MCM achieves higher code coverage than existing corpus minimization tools, including AFL-CMIN and OPTIMIN.</div></div><div><h3>Conclusion:</h3><div>Our study shows that using historical results to generate a high-quality initial corpus is practical and can effectively improve fuzzing performance.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"179 ","pages":"Article 107651"},"PeriodicalIF":3.8,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143092443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DHG-BiGRU: Dual-attention based hierarchical gated BiGRU for software defect prediction","authors":"Ruchika Malhotra, Priya Singh","doi":"10.1016/j.infsof.2024.107646","DOIUrl":"10.1016/j.infsof.2024.107646","url":null,"abstract":"<div><h3>Context:</h3><div>Software defect prediction (SDP) is a prominent research area focused on anticipating defects early in the software lifecycle. Traditional machine learning models are based on static features, which are insufficient to capture contextual information in the source code. In recent years, researchers have also developed deep learning models that extract semantic information from source code using the abstract syntax tree (AST). These approaches often combine static and semantic features through a simple merge operation.</div></div><div><h3>Objective:</h3><div>The article aims to address the limitations of existing models by using advanced feature extraction and integration techniques. It develops a deep learning model that effectively prioritizes crucial features and intelligently combines static and semantic features to provide robust predictions.</div></div><div><h3>Method:</h3><div>The article proposes a novel model, namely the dual-attention-based hierarchical gated BiGRU (DHG-BiGRU). The model first employs a static feature extractor (StatFE) and a semantic feature extractor (SemFE) to capture static and semantic features, respectively. Next, the outputs from StatFE and SemFE are passed to individual BiGRUs. The BiGRU output associated with the semantic features is subsequently processed by a dual attention mechanism (DAM), which captures complex semantic information with emphasis on the most crucial features. Afterward, the hierarchical gated fusion (HGF) merges the static and semantic features. 
Finally, these integrated features are passed through a sigmoid function to predict defects.</div></div><div><h3>Results:</h3><div>Extensive experiments on widely used datasets from the PROMISE repository reveal that DHG-BiGRU performs significantly better than state-of-the-art models and consistently achieves higher precision, recall and F-measure, demonstrating reliable prediction capability.</div></div><div><h3>Conclusion:</h3><div>The results of the study underscore the potential of advanced feature extraction and integration techniques for SDP. By achieving considerable improvements over state-of-the-art techniques, the proposed approach paves the way for more sophisticated defect prediction models that improve software quality and reliability.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"179 ","pages":"Article 107646"},"PeriodicalIF":3.8,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143092445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}