{"title":"Understanding the Effect of Agile Practice Quality on Software Product Quality","authors":"Sherlock A. Licorish","doi":"10.1109/TSE.2025.3532502","DOIUrl":"10.1109/TSE.2025.3532502","url":null,"abstract":"Agile methods and associated practices have been held to deliver value to software developers and customers. Research studies have reported team productivity and software quality benefits. While such insights are helpful for understanding how agile methods add value during software development, there is a need to understand the intersection of useful practices and outcomes over a project's duration. This study addresses this opportunity by conducting an observational study of student projects, complemented by the analysis of demographic data and open responses about the challenges encountered during the use of agile practices. Data from 22 student teams comprising 85 responses were analyzed using quantitative and qualitative approaches. Among our findings, we observed that the use of good coding practices and quality management techniques was positively correlated with all dimensions of product quality (e.g., functionality scope and software packaging). Outcomes also reveal that software product quality was predicted by requirements scoping, team planning and communication, and coding practice. However, high levels of team planning and communication were not necessary for all software development activities. When examining project challenges, we observed that a lack of technical skills and poor time management presented the greatest challenges to project success. While these challenges may be mitigated by agile practices, such practices may themselves create unease, requiring balance during project implementation.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"650-662"},"PeriodicalIF":6.5,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143020699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mecha: A Neural-Symbolic Open-Set Homogeneous Decision Fusion Approach for Zero-Day Malware Similarity Detection","authors":"Christopher Molloy;Jeremy Banks;Steven H. H. Ding;Furkan Alaca;Philippe Charland;Andrew Walenstein","doi":"10.1109/TSE.2025.3531210","DOIUrl":"10.1109/TSE.2025.3531210","url":null,"abstract":"With the number of novel malware samples increasing each year, tools are required for efficient and accurate variant matching within the same family, for the purpose of effective proactive threat detection, retro-hunting, and attack campaign tracking. All of the state-of-the-art Deep Learning (DL) approaches assume that incoming samples originate from known families and therefore misidentify novel families. Additionally, most existing solutions that leverage the Siamese Neural Network architecture rely either on pair-wise comparisons or on computationally expensive preprocessing steps that do not scale to real-world malware triage volumes. We propose a different route, Mecha, a Neural-Symbolic Machine Learning (ML) system for malware variant matching and zero-day family detection. Mecha comprises an embedding network trained in two different scenarios for byte string embedding and an open-set approximate nearest neighbour algorithm for variant matching and zero-day detection. Our embedding network uses triplet loss for embedding generation and reinforcement-based Expectation Maximization (EM) learning for full deployment optimization. We conduct multiple in-sample and out-of-sample experiments to demonstrate the model's generalizability toward novel variants and families. We also show that Mecha can detect samples outside the known set of malware samples with an accuracy greater than 0.990.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"621-637"},"PeriodicalIF":6.5,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10847580","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142991509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Completeness and Consistency of Tabular Requirements: An SMT-Based Verification Approach","authors":"Claudio Menghi;Eugene Balai;Darren Valovcin;Christoph Sticksel;Akshay Rajhans","doi":"10.1109/TSE.2025.3530820","DOIUrl":"10.1109/TSE.2025.3530820","url":null,"abstract":"Tabular requirements assist with the specification of software requirements using an “if-then” paradigm and are supported by many tools. For example, the Requirements Table block in Simulink® supports writing executable specifications that can be used as test oracles to validate an implementation. But even before an implementation is developed, automatically checking the consistency and completeness of a Requirements Table can reveal errors in the specification. Fixing such errors early rather than in later development cycles avoids the costly rework and additional testing effort that would otherwise be required. As of version R2022a, Simulink® supports checking the completeness and consistency of Requirements Tables when the requirements are stateless, that is, when they do not constrain behaviors over time. We overcome this limitation by considering Requirements Tables with both stateless and stateful requirements. This paper (i) formally defines the syntax and semantics of Requirements Tables and their completeness and consistency, (ii) proposes eight encodings from two categories (namely, bounded and unbounded) that support stateful requirements, and (iii) implements Theano, a solution that checks completeness and consistency using these encodings. We empirically assess the effectiveness and efficiency of our encodings in checking completeness and consistency on a benchmark of 160 Requirements Tables with a timeout of two hours. Our results show that Theano can check the completeness of all the Requirements Tables in our benchmark and can detect the inconsistency of Requirements Tables, but it cannot confirm their consistency within the timeout. We also assessed the usefulness of Theano in checking the consistency and completeness of 14 versions of a Requirements Table for a practical example from the automotive domain. Across these 14 versions, Theano effectively detected two inconsistent and five incomplete Requirements Tables, reporting a problem (inconsistency or incompleteness) for 50% (7 out of 14) of the versions of the Requirements Table.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"595-620"},"PeriodicalIF":6.5,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10844918","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142989089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the Robustness of Transformer-Based Code Intelligence via Code Transformation: Challenges and Opportunities","authors":"Yaoxian Li;Shiyi Qi;Cuiyun Gao;Yun Peng;David Lo;Michael R. Lyu;Zenglin Xu","doi":"10.1109/TSE.2024.3524461","DOIUrl":"10.1109/TSE.2024.3524461","url":null,"abstract":"Transformer-based models have demonstrated state-of-the-art performance in various intelligent coding tasks such as code comment generation and code completion. Previous studies show that deep learning models are sensitive to input variations, but few have systematically studied the robustness of Transformers under perturbed input code. In this work, we empirically study the effect of semantic-preserving code transformations on the performance of Transformers. Specifically, 27 and 24 code transformation strategies are implemented for two popular programming languages, Java and Python, respectively. To facilitate analysis, the strategies are grouped into five categories: block transformation, insertion/deletion transformation, grammatical statement transformation, grammatical token transformation, and identifier transformation. Experiments on three popular code intelligence tasks, namely code completion, code summarization, and code search, demonstrate that insertion/deletion transformation and identifier transformation have the greatest impact on the performance of Transformers. Our results also suggest that Transformers based on abstract syntax trees (ASTs) are more robust than models based only on code sequences under most code transformations. In addition, the design of positional encoding can affect the robustness of Transformers under code transformations. We also investigate substantial code transformations at the strategy level to expand our study and explore other factors influencing the robustness of Transformers. Furthermore, we explore applications of code transformations. Based on our findings, we distill insights about the challenges and opportunities for Transformer-based code intelligence from various perspectives.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"521-547"},"PeriodicalIF":6.5,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142987459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anomaly Detection on Interleaved Log Data With Semantic Association Mining on Log-Entity Graph","authors":"Guojun Chu;Jingyu Wang;Qi Qi;Haifeng Sun;Zirui Zhuang;Bo He;Yuhan Jing;Lei Zhang;Jianxin Liao","doi":"10.1109/TSE.2025.3527856","DOIUrl":"10.1109/TSE.2025.3527856","url":null,"abstract":"Logs record crucial information about the runtime status of software systems, which can be utilized for anomaly detection and fault diagnosis. However, existing techniques struggle to perform effectively when dealing with interleaved logs and entities that influence each other. Although manually specifying a grouping field for each dataset can handle the single-grouping scenario, the problems of multiple and heterogeneous grouping remain unsolved. To overcome these limitations, we first design a log semantic association mining approach to convert log sequences into a Log-Entity Graph, and then propose a novel log anomaly detection model named Lograph. The semantic associations can be utilized to implicitly group the logs and sort out complex dependencies between entities, which have been overlooked in the existing literature. In addition, a Heterogeneous Graph Attention Network is utilized to effectively capture anomalous patterns of both logs and entities, where the Log-Entity Graph serves as a data management and feature engineering module. We evaluate our model on real-world log datasets, comparing it with nine baseline models. The experimental results demonstrate that Lograph can improve the accuracy of anomaly detection, especially on datasets where entity relationships are intricate and grouping strategies are not applicable.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"581-594"},"PeriodicalIF":6.5,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142974694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Improving the Performance of Comment Generation Models by Using Bytecode Information","authors":"Yuan Huang;Jinbo Huang;Xiangping Chen;Zibin Zheng","doi":"10.1109/TSE.2024.3523713","DOIUrl":"10.1109/TSE.2024.3523713","url":null,"abstract":"Code comments play an important role in program understanding, and a large number of automatic comment generation methods have been proposed in recent years. To generate better comments, many studies try to extract a variety of information (e.g., code tokens, AST traversal sequences, API call sequences) from source code as model input. In this study, we found that the bytecode compiled from source code can provide useful information for comment generation; hence, we propose using information from bytecode to assist comment generation. Specifically, we extract the control flow graph (CFG) from the bytecode and propose a serialization method to obtain a CFG sequence that preserves the program structure. Then, we discuss three methods for introducing bytecode information into different models. We collected 390,000 Java methods from the Maven repository and created a dataset of 101,124 samples after deduplication and preprocessing to evaluate our method. The results show that introducing the information extracted from bytecode can improve the BLEU-4 scores of seven comment generation models.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"503-520"},"PeriodicalIF":6.5,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142940444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evidence-based Software Engineering Guidelines Revisited","authors":"Shari Lawrence Pfleeger;Barbara Kitchenham","doi":"10.1109/tse.2025.3526730","DOIUrl":"https://doi.org/10.1109/tse.2025.3526730","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"8 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142936793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accuracy Can Lie: On the Impact of Surrogate Model in Configuration Tuning","authors":"Pengzhou Chen;Jingzhi Gong;Tao Chen","doi":"10.1109/TSE.2025.3525955","DOIUrl":"10.1109/TSE.2025.3525955","url":null,"abstract":"To ease the expensive measurements during configuration tuning, it is natural to build a surrogate model as a replacement for the system, so that configuration performance can be evaluated cheaply. Yet a stereotype therein is that the higher the model accuracy, the better the tuning result would be, or vice versa. This “accuracy is all” belief drives our research community to build more and more accurate models and to criticize a tuner for the inaccuracy of the model it uses. However, this practice raises some previously unaddressed questions, e.g., are the model and its accuracy really that important for the tuning result? Do the somewhat small accuracy improvements reported in existing work (e.g., a few % error reduction) really matter much to the tuners? What role does model accuracy play in tuning quality? To answer these related questions, in this paper we conduct one of the largest-scale empirical studies to date, running 24×7 over a period of 13 months, that covers 10 models, 17 tuners, and 29 systems from existing works under four commonly used metrics, leading to 13,612 cases of investigation. Surprisingly, our key findings reveal that accuracy can lie: there are a considerable number of cases where higher accuracy actually leads to no improvement in the tuning outcomes (up to 58% of cases under certain settings), or even worse, degrades the tuning quality (up to 24% of cases under certain settings). We also discover that the models chosen in most proposed tuners are sub-optimal and that the percentage of accuracy change required to significantly improve tuning quality varies with the range of model accuracy. Drawing on fitness landscape analysis, we provide in-depth discussions of the rationale behind these findings, offering several lessons learned as well as insights for future opportunities. Most importantly, this work sends a clear message to the community: we should take one step back from the natural “accuracy is all” belief for model-based configuration tuning.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"548-580"},"PeriodicalIF":6.5,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10832565","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142936245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}