IET Software | Pub Date: 2024-09-16 | DOI: 10.1049/2024/8027037
Xuanye Wang, Lu Lu, Qingyan Tian, Haishan Lin
{"title":"IC-GraF: An Improved Clustering with Graph-Embedding-Based Features for Software Defect Prediction","authors":"Xuanye Wang, Lu Lu, Qingyan Tian, Haishan Lin","doi":"10.1049/2024/8027037","DOIUrl":"https://doi.org/10.1049/2024/8027037","url":null,"abstract":"<div>\u0000 <p>Software defect prediction (SDP) has been a prominent area of research in software engineering. Previous SDP methods often struggled in industrial applications, primarily due to the need for sufficient historical data. Thus, clustering-based unsupervised defect prediction (CUDP) and cross-project defect prediction (CPDP) emerged to address this challenge. However, the former exhibited limitations in capturing semantic and structural features, while the latter encountered constraints due to differences in data distribution across projects. Therefore, we introduce a novel framework called improved clustering with graph-embedding-based features (IC-GraF) for SDP without the reliance on historical data. First, a preprocessing operation is performed to extract program dependence graphs (PDGs) and mark distinct dependency relationships within them. Second, the improved deep graph infomax (IDGI) model, an extension of the DGI model specifically for SDP, is designed to generate graph-level representations of PDGs. Finally, a heuristic-based k-means clustering algorithm is employed to classify the features generated by IDGI. To validate the efficacy of IC-GraF, we conduct experiments based on 24 releases of the PROMISE dataset, using F-measure and G-measure as evaluation criteria. The findings indicate that IC-GraF achieves 5.0%−42.7% higher F-measure, 5%−39.4% higher G-measure, and 2.5%−11.4% higher AUC over existing CUDP methods. 
Even when compared with eight supervised learning-based SDP methods, IC-GraF maintains a superior competitive edge.</p>\u0000 </div>","PeriodicalId":50378,"journal":{"name":"IET Software","volume":"2024 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/8027037","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142244994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
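Note on the evaluation criteria: the abstract names F-measure and G-measure but does not define them. The sketch below assumes the definitions commonly used in defect prediction studies (F as the harmonic mean of precision and recall; G as the harmonic mean of recall and one minus the false positive rate). The function name and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
def f_and_g_measure(tp, fp, tn, fn):
    """Return (F-measure, G-measure) from confusion-matrix counts.

    Assumed definitions (not stated in the abstract):
      F = 2 * precision * recall / (precision + recall)
      G = 2 * pd * (1 - pf) / (pd + (1 - pf))
    where pd is recall and pf is the false positive rate.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0

    f_den = precision + recall
    g_den = recall + (1.0 - pf)
    f = 2 * precision * recall / f_den if f_den else 0.0
    g = 2 * recall * (1.0 - pf) / g_den if g_den else 0.0
    return f, g


# Hypothetical counts: 30 true positives, 10 false positives,
# 50 true negatives, 10 false negatives.
f, g = f_and_g_measure(tp=30, fp=10, tn=50, fn=10)
```

Both measures fall in [0, 1]; G rewards a method that finds defective modules without flagging many clean ones.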
IET Software | Pub Date: 2024-09-03 | DOI: 10.1049/2024/5358773
Nana Zhang, Kun Zhu, Dandan Zhu
{"title":"IAPCP: An Effective Cross-Project Defect Prediction Model via Intra-Domain Alignment and Programming-Based Distribution Adaptation","authors":"Nana Zhang, Kun Zhu, Dandan Zhu","doi":"10.1049/2024/5358773","DOIUrl":"https://doi.org/10.1049/2024/5358773","url":null,"abstract":"<div>\u0000 <p>Cross-project defect prediction (CPDP) aims to identify defect-prone software instances in one project (target) using historical data collected from other software projects (source), which can help maintainers allocate limited testing resources reasonably. Unfortunately, the feature distribution discrepancy between the source and target projects makes it challenging to transfer the matching feature representation and severely hinders CPDP performance. Besides, existing CPDP models require an intensively expensive and time-consuming process to tune a lot of parameters. To address the above limitations, we propose an effective CPDP model named IAPCP based on distribution adaptation in this study, which consists of two stages: correlation alignment and intra-domain programming. Correlation alignment first calculates the covariance matrices of the source and target projects and then erases some features of the source project (i.e., whitening operation) and employs the features of the target project (i.e., target covariance) to fill the source project, thereby well aligning the source and target feature distributions and reducing the distribution discrepancy across projects. Intra-domain programming can directly learn a nonparametric linear transfer defect predictor with strong discriminative capacity by solving a probabilistic annotation matrix (PAM) based on the adjusted features of the source project. The model does not require model selection and parameter tuning. 
Extensive experiments on a total of 82 cross-project pairs from 16 software projects demonstrate that IAPCP can achieve competitive CPDP effectiveness and efficiency compared with multiple state-of-the-art baseline models.</p>\u0000 </div>","PeriodicalId":50378,"journal":{"name":"IET Software","volume":"2024 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/5358773","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
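Note on the correlation alignment stage: the abstract describes whitening the source features with the source covariance and then re-coloring them with the target covariance. A minimal sketch of that two-step transform follows, assuming NumPy and a small ridge term `eps` added to each covariance for numerical stability. The function names and the `eps` parameter are assumptions for illustration; this is not the authors' implementation and it does not cover the intra-domain programming stage.

```python
import numpy as np


def _sym_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 1e-12, None)  # guard against tiny negative eigenvalues
    return (vecs * vals**power) @ vecs.T


def coral_align(source, target, eps=1.0):
    """Whiten source features, then re-color them with the target covariance.

    source, target: arrays of shape (n_samples, n_features).
    eps: ridge term added to both covariances (an assumed default).
    """
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(d)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(d)
    whitened = source @ _sym_power(cov_s, -0.5)   # remove source correlations
    return whitened @ _sym_power(cov_t, 0.5)      # impose target correlations
```

After the transform, the covariance of the adjusted source features approximately matches that of the target, which is the distribution discrepancy the abstract says this stage reduces.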
IET Software | Pub Date: 2024-08-30 | DOI: 10.1049/2024/8846233
Jiayun Zhang, Qingyuan Gong, Yang Chen, Yu Xiao, Xin Wang, Aaron Yi Ding
{"title":"Understanding Work Rhythms in Software Development and Their Effects on Technical Performance","authors":"Jiayun Zhang, Qingyuan Gong, Yang Chen, Yu Xiao, Xin Wang, Aaron Yi Ding","doi":"10.1049/2024/8846233","DOIUrl":"https://doi.org/10.1049/2024/8846233","url":null,"abstract":"<div>\u0000 <p>The temporal patterns of code submissions, denoted as work rhythms, provide valuable insight into the work habits and productivity in software development. In this paper, we investigate the work rhythms in software development and their effects on technical performance by analyzing the profiles of developers and projects from 110 international organizations and their commit activities on GitHub. Using clustering, we identify four work rhythms among individual developers and three work rhythms among software projects. Strong correlations are found between work rhythms and work regions, seniority, and collaboration roles. We then define practical measures for technical performance and examine the effects of different work rhythms on them. Our findings suggest that moderate overtime is related to good technical performance, whereas fixed office hours are associated with receiving less attention. Furthermore, we survey 92 developers to understand their experience with working overtime and the reasons behind it. The survey reveals that developers often work longer than required. A positive attitude towards extended working hours is associated with situations that require addressing unexpected issues or when clear incentives are provided. 
In addition to the insights from our quantitative and qualitative studies, this work sheds light on tangible measures for both software companies and individual developers to improve the recruitment process, project planning, and productivity assessment.</p>\u0000 </div>","PeriodicalId":50378,"journal":{"name":"IET Software","volume":"2024 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/8846233","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142100088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IET Software | Pub Date: 2024-08-13 | DOI: 10.1049/2024/7060298
Ma Mingze
{"title":"Research and Application of Firewall Log and Intrusion Detection Log Data Visualization System","authors":"Ma Mingze","doi":"10.1049/2024/7060298","DOIUrl":"https://doi.org/10.1049/2024/7060298","url":null,"abstract":"<div>\u0000 <p>This paper tackles current challenges in network security analysis by proposing an innovative information gain-based feature selection algorithm and leveraging visualization techniques to develop a network security log data visualization system. The system’s key functions include raw data collection for firewall logs and intrusion detection logs, data preprocessing, database management, data manipulation, data logic processing, and data visualization. Through statistical analysis of log data and the construction of visualization models, the system presents analysis results in diverse graphical formats while offering interactive capabilities. Seamlessly integrating data generation, processing, analysis, and display processes, the system demonstrates high accuracy, precision, recall, F1 score, and real-time performance metrics, reaching 98.3%, 92.1%, 97.5%, 98.1%, and 91.2%, respectively, in experimental evaluations. 
The proposed method significantly enhances real-time prediction capabilities of network security status and monitoring efficiency of network devices, providing a robust security assurance tool.</p>\u0000 </div>","PeriodicalId":50378,"journal":{"name":"IET Software","volume":"2024 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/7060298","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141973646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
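Note on information gain: the abstract says only that the feature selection algorithm is based on information gain and gives no details of the paper's variant. The sketch below shows the textbook quantity for a discrete feature (label entropy minus the conditional entropy given the feature). The function names and the toy log fields are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of labels minus the entropy remaining after splitting on feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy log records (invented for illustration): the protocol separates the
# two classes perfectly, while the port parity tells us nothing.
labels = ["attack", "attack", "normal", "normal"]
protocol = ["tcp", "tcp", "udp", "udp"]
port_parity = ["even", "odd", "even", "odd"]
ranked = sorted(
    {"protocol": protocol, "port_parity": port_parity}.items(),
    key=lambda item: information_gain(item[1], labels),
    reverse=True,
)
```

Ranking features by this score and keeping the top few is the usual way such a criterion is turned into a selection step.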
{"title":"Segmented Frequency-Domain Correlation Prediction Model for Long-Term Time Series Forecasting Using Transformer","authors":"Haozhuo Tong, Lingyun Kong, Jie Liu, Shiyan Gao, Yilu Xu, Yuezhe Chen","doi":"10.1049/2024/2920167","DOIUrl":"https://doi.org/10.1049/2024/2920167","url":null,"abstract":"<div>\u0000 <p>Long-term time series forecasting has received significant attention from researchers in recent years. Transformer model-based approaches have emerged as promising solutions in this domain. Nevertheless, most existing methods rely on point-by-point self-attention mechanisms or employ transformations, decompositions, and reconstructions of the entire sequence to capture dependencies. The point-by-point self-attention mechanism becomes impractical for long-term time series forecasting due to its quadratic complexity with respect to the time series length. Decomposition and reconstruction methods may introduce information loss, leading to performance bottlenecks in the models. In this paper, we propose a Transformer-based forecasting model called NPformer. Our method introduces a novel multiscale segmented Fourier attention mechanism. By segmenting the long-term time series and performing discrete Fourier transforms on different segments, we aim to identify frequency-domain correlations between these segments. This allows us to capture dependencies more effectively. In addition, we incorporate a normalization module and a desmoothing factor into the model. These components address the problem of oversmoothing that arises in sequence decomposition methods. Furthermore, we introduce an isometry convolution method to enhance the prediction accuracy of the model. 
The experimental results demonstrate that NPformer outperforms other Transformer-based methods in long-term time series forecasting.</p>\u0000 </div>","PeriodicalId":50378,"journal":{"name":"IET Software","volume":"2024 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/2920167","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141565711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
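Note on the segmented Fourier idea: the abstract describes splitting a long series into segments, applying a discrete Fourier transform to each, and looking for frequency-domain correlations between segments. The sketch below illustrates only that idea, using cosine similarity between magnitude spectra as a stand-in for the correlation. It is not NPformer's attention mechanism, and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def segment(series, seg_len):
    """Split a series into consecutive, non-overlapping segments of seg_len."""
    return [series[i:i + seg_len] for i in range(0, len(series) - seg_len + 1, seg_len)]


def spectral_similarity(a, b):
    """Cosine similarity between the magnitude spectra of two equal-length segments."""
    ma = [abs(c) for c in dft(a)]
    mb = [abs(c) for c in dft(b)]
    dot = sum(p * q for p, q in zip(ma, mb))
    na = math.sqrt(sum(p * p for p in ma))
    nb = math.sqrt(sum(q * q for q in mb))
    return dot / (na * nb) if na and nb else 0.0


# A sine wave with period 8, cut into four segments of length 8.
series = [math.sin(2 * math.pi * t / 8) for t in range(32)]
segments = segment(series, 8)
```

Segments that share a dominant frequency score close to 1, while segments with disjoint frequency content score close to 0. Working per segment also keeps each transform short, which matters because the abstract cites the quadratic cost of point-by-point attention on long series.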
IET Software | Pub Date: 2024-06-18 | DOI: 10.1049/2024/8425877
Ruina Guo, Shu Wang, Guangsen Wei
{"title":"Accounting Management and Optimizing Production Based on Distributed Semantic Recognition","authors":"Ruina Guo, Shu Wang, Guangsen Wei","doi":"10.1049/2024/8425877","DOIUrl":"https://doi.org/10.1049/2024/8425877","url":null,"abstract":"<div>\u0000 <p>Accounting management and production optimization are vital aspects of enterprise management, serving as indispensable core components in the modern business landscape. However, conventional methods reliant on manual input exhibit drawbacks such as low recognition accuracy and excessive memory consumption. To address these challenges, semantic recognition technology utilizing voice signals has emerged as a pivotal solution across various industries. Building upon this premise, this paper introduces a distributed semantic recognition-based algorithm for accounting management and production optimization. The proposed algorithm encompasses multiple modules, including a front-end feature extraction module, a channel transmission module, and a voice quality vector quantization module. Additionally, a semantic recognition module is introduced to process the voice signals and generate prediction results. By leveraging extensive accounting management and production data for learning and analysis, the algorithm automatically uncovers patterns and laws within the data, extracting valuable information. To validate the proposed algorithm, this study utilizes the dataset from the UCI machine learning repository and applies it for analysis and processing. The experimental findings demonstrate that the algorithm introduced in this paper outperforms alternative methods. Specifically, it achieves a notable 9.3% improvement in comprehensive recognition accuracy and reduces memory usage by 34.4%. 
These results highlight the algorithm’s efficacy in enhancing the understanding and analysis of customer needs, market trends, competitors, and other pertinent information within the realm of commercial applications for companies.</p>\u0000 </div>","PeriodicalId":50378,"journal":{"name":"IET Software","volume":"2024 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/8425877","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141424921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Chandy–Lamport Distributed Snapshot Algorithm Using Colored Petri Net","authors":"Saeid Pashazadeh, Basheer Zuhair Jaafar Al-Basseer, Jafar Tanha","doi":"10.1049/2024/6582682","DOIUrl":"https://doi.org/10.1049/2024/6582682","url":null,"abstract":"<div>\u0000 <p>Distributed global snapshot (DGS) is one of the fundamental protocols in distributed systems. It is used for different applications like collecting information from a distributed system and taking checkpoints for process rollback. The Chandy–Lamport protocol (CLP) is famous and well-known for taking DGS. The main aim of this protocol was to generate consistent cuts without interrupting the regular operation of the distributed system. CLP was the origin of many future protocols and inspired them. The first aim of this paper is to propose a novel formal hierarchical parametric colored Petri net model of CLP. The number of constituting processes of the model is parametric. The second aim is to automatically generate a novel message sequence chart (MSC) to show detailed steps for each simulation run of the snapshot protocol. The third aim is model checking of the proposed formal model to verify the correctness of CLP and our proposed colored Petri net model. Having vital tools helps greatly to test the correct operation of the newly proposed distributed snapshot protocol. The proposed model of CLP can easily be used for visually testing the correct operation of the new future under-development DGS protocol. It also permits formal verification of the correct operation of the new proposed protocol. This model can be used as a simple, powerful, and visual tool for the step-by-step run of the CLP, model checking, and teaching it to postgraduate students. 
The same approach applies to similar complicated distributed protocols.</p>\u0000 </div>","PeriodicalId":50378,"journal":{"name":"IET Software","volume":"2024 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/6582682","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141286897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
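Note on the protocol itself: for readers who want to step through the Chandy-Lamport marker rules without a Petri net tool, the sketch below is a small plain-Python simulation. It is not the colored Petri net model the paper proposes. It assumes reliable FIFO channels, which the algorithm requires, and the class names, method names, and bank-transfer scenario are invented for illustration.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, pid, balance):
        self.pid = pid
        self.balance = balance
        self.recorded = None    # local state captured by the snapshot
        self.chan_state = {}    # sender pid -> messages recorded in transit
        self.recording = set()  # incoming channels still being recorded


class Network:
    def __init__(self, balances):
        self.procs = {p: Process(p, b) for p, b in balances.items()}
        # One FIFO channel per ordered pair (sender, receiver).
        self.chans = {(s, r): deque() for s in balances for r in balances if s != r}

    def send(self, src, dst, amount):
        """Application message: move `amount` from src to dst."""
        self.procs[src].balance -= amount
        self.chans[(src, dst)].append(amount)

    def start_snapshot(self, pid):
        """Initiator rule: record own state and send markers."""
        self._record(pid)

    def _record(self, pid):
        p = self.procs[pid]
        p.recorded = p.balance
        p.recording = {s for (s, r) in self.chans if r == pid}
        p.chan_state = {s: [] for s in p.recording}
        for (s, _r), queue in self.chans.items():
            if s == pid:
                queue.append(MARKER)  # marker on every outgoing channel

    def deliver(self, src, dst):
        """Deliver the oldest message on channel src -> dst."""
        msg = self.chans[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded is None:
                self._record(dst)    # first marker seen: record state now
            p.recording.discard(src)  # channel src -> dst is now closed
        else:
            p.balance += msg
            if p.recorded is not None and src in p.recording:
                p.chan_state[src].append(msg)  # message was in transit

    def snapshot_total(self):
        """Sum of recorded balances plus money recorded inside channels."""
        return sum(
            p.recorded + sum(sum(msgs) for msgs in p.chan_state.values())
            for p in self.procs.values()
        )
```

A consistent cut preserves the total amount of money. For example, start with balances A=100, B=50, C=25, send 10 from A to B, start the snapshot at A, send 5 from B to A, and then deliver every channel in FIFO order: the recorded states plus the recorded in-transit messages again sum to 175.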
IET Software | Pub Date: 2024-05-30 | DOI: 10.1049/2024/3946655
Qinhe Zhang, Jiachen Zhang, Tie Feng, Jialang Xue, Xinxin Zhu, Ningyang Zhu, Zhiheng Li
{"title":"Software Defect Prediction Using Deep Q-Learning Network-Based Feature Extraction","authors":"Qinhe Zhang, Jiachen Zhang, Tie Feng, Jialang Xue, Xinxin Zhu, Ningyang Zhu, Zhiheng Li","doi":"10.1049/2024/3946655","DOIUrl":"https://doi.org/10.1049/2024/3946655","url":null,"abstract":"<div>\u0000 <p>Machine learning-based software defect prediction (SDP) approaches have been commonly proposed to help to deliver high-quality software. Unfortunately, all the previous research conducted without effective feature reduction suffers from high-dimensional data, leading to unsatisfactory prediction performance measures. Moreover, without proper feature reduction, the interpretability and generalization ability of machine learning models in SDP may be compromised, hindering their practical utility in diverse software development environments. In this paper, an SDP approach using deep <i>Q</i>-learning network (DQN)-based feature extraction is proposed to eliminate irrelevant, redundant, and noisy features and improve the classification performance. In the data preprocessing phase, the undersampling method of BalanceCascade is applied to divide the original datasets. As the first step of feature extraction, the weight ranking of all the metric elements is calculated according to the expected cross-entropy. Then, the relation matrix is constructed by applying random matrix theory. After that, the reward principle is defined for computing the <i>Q</i> value of <i>Q</i>-learning based on weight ranking, relation matrix, and the number of errors, according to which a convolutional neural network model is trained on datasets until the sequences of metric pairs are generated for all datasets acting as the revised feature set. Various experiments have been conducted on 11 NASA and 11 PROMISE repository datasets. 
Sensitivity analysis experiments show that binary classification algorithms based on SDP approaches using the DQN-based feature extraction outperform those without using it. We also conducted experiments to compare our approach with four state-of-the-art approaches on common datasets, which show that our approach is superior to these methods in precision, <i>F</i>-measure, area under the receiver operating characteristic curve, and Matthews correlation coefficient values.</p>\u0000 </div>","PeriodicalId":50378,"journal":{"name":"IET Software","volume":"2024 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/3946655","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141246131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IET Software | Pub Date: 2024-05-16 | DOI: 10.1049/2024/1561351
Siyu Jiang, Jiapeng Zhang, Feng Guo, Teng Ouyang, Jing Li
{"title":"Balanced Adversarial Tight Matching for Cross-Project Defect Prediction","authors":"Siyu Jiang, Jiapeng Zhang, Feng Guo, Teng Ouyang, Jing Li","doi":"10.1049/2024/1561351","DOIUrl":"10.1049/2024/1561351","url":null,"abstract":"<div>\u0000 <p>Cross-project defect prediction (CPDP) is an attractive research area in software testing. It identifies defects in projects with limited labeled data (target projects) by utilizing predictive models from data-rich projects (source projects). Existing CPDP methods based on transfer learning mainly rely on the assumption of a unimodal distribution and consider the case where the feature distribution has one obvious peak. However, in actual situations, the feature distribution of project samples often exhibits multiple peaks that cannot be ignored. It manifests as a multimodal distribution, making it challenging to align distributions between different projects. To address this issue, we propose a balanced adversarial tight-matching model for CPDP. Specifically, this method employs multilinear conditioning to obtain the cross-covariance of both features and classifier predictions, capturing the multimodal distribution of the feature. When reducing the captured multimodal distribution differences, pseudo-labels are needed, but pseudo-labels have uncertainty. Therefore, we additionally add an auxiliary classifier and attempt to generate pseudo-labels using a pseudo-label strategy with less uncertainty. Finally, the feature generator and two classifiers undergo adversarial training to align the multimodal distributions of different projects. 
This method outperforms the state-of-the-art CPDP model used on the benchmark dataset.</p>\u0000 </div>","PeriodicalId":50378,"journal":{"name":"IET Software","volume":"2024 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2024-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/1561351","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140968219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IET Software | Pub Date: 2024-04-30 | DOI: 10.1049/2024/4488412
Qing Qi, Jian Cao
{"title":"An Empirical Study on Downstream Dependency Package Groups in Software Packaging Ecosystems","authors":"Qing Qi, Jian Cao","doi":"10.1049/2024/4488412","DOIUrl":"https://doi.org/10.1049/2024/4488412","url":null,"abstract":"<div>\u0000 <p>The role of focal packages in packaging ecosystems is crucial for the development of the entire ecosystem, as they are the packages on which other packages depend. However, the evolution of dependency groups in packaging ecosystems has not been systematically investigated. In this study, we examine the downstream dependency package groups (DDGs) in three typical packaging ecosystems—Cargo for Rust, Comprehensive Perl Archive Network for Perl, and RubyGems for Ruby—to identify their features and evolution. We also identify and analyze a special type of DDG, the collaborative downstream dependency package group (CDDG), which requires shared contributors. Our findings show that the overall development of DDGs, particularly CDDGs, is consistent with the status of the whole ecosystem, and the size of DDGs and CDDGs follows a power law distribution. Furthermore, the interaction mechanisms between focal packages and downstream packages differ between ecosystems, but focal packages always play a leading role in the development of DDGs and CDDGs. Finally, we investigate predictive models for the development of CDDGs in the next stage based on their features, and our results show that random forest and Gradient Boosting Regression Tree achieve acceptable prediction accuracy. 
We provide the raw data and scripts used for our analysis at https://github.com/onion616/DDG.</p>\u0000 </div>","PeriodicalId":50378,"journal":{"name":"IET Software","volume":"2024 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/2024/4488412","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141096478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}