{"title":"OpCodeBERT: A Method for Python Code Representation Learning by BERT with Opcode","authors":"Canyu Qiu, Jianxun Liu, Xiaocong Xiao, Yong Xiao","doi":"10.1109/tse.2025.3610244","DOIUrl":"https://doi.org/10.1109/tse.2025.3610244","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"79 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145083664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Study of Exploring the Capabilities of Large Language Models in Code Learning","authors":"Shangqing Liu, Daya Guo, Jian Zhang, Wei Ma, Yanzhou Li, Yang Liu","doi":"10.1109/tse.2025.3609876","DOIUrl":"https://doi.org/10.1109/tse.2025.3609876","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"55 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145072850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Reusable Services in Legacy Object-Oriented Systems: A Type-Sensitive Identification Approach","authors":"Manel Abdellatif;Naouel Moha;Yann-Gaël Guéhéneuc;Hafedh Mili;Ghizlane El Boussaidi","doi":"10.1109/TSE.2025.3603009","DOIUrl":"10.1109/TSE.2025.3603009","url":null,"abstract":"The migration of legacy software systems to a service-oriented architecture (SOA) is one of the main strategies for modernising such systems. The success of modernising a legacy system to a SOA highly depends on the service identification approach used, where the goal is to identify reusable functionalities that could become services. In this paper, we perform a comparative analysis of service identification approaches proposed by academia and industry. We show that there is a gap between academia and industry in the approaches used to identify services from legacy systems. We extract from the comparative analysis several recommendations about the inputs, processes, and outputs that a service identification approach should have. Based on these recommendations, we propose ServiceMiner, a bottom-up service identification approach, which relies on source-code analysis, because other sources of information may be unavailable or out of sync with the actual code. ServiceMiner relies on a categorisation of service types and code-level patterns characterising types of services. We evaluate ServiceMiner on four case studies. We also compare our results to those of three state-of-the-art approaches. We show that ServiceMiner identifies architecturally-significant services with, on average, 78% precision, 76% recall, and 77% F-measure.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 10","pages":"2879-2899"},"PeriodicalIF":5.6,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145035294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Study of Software Refactorings in Real-World Open-Source Java Projects","authors":"Bridget Nyirongo, Yanjie Jiang, Nan Niu, Hui Liu","doi":"10.1109/tse.2025.3604821","DOIUrl":"https://doi.org/10.1109/tse.2025.3604821","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"30 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145003110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient state identification for finite state machine-based testing","authors":"Uraz Cengiz Türker, Robert M. Hierons, Mohammad Reza Mousavi, Khaled El-Fakih","doi":"10.1109/tse.2025.3604472","DOIUrl":"https://doi.org/10.1109/tse.2025.3604472","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"62 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145003109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Post-Incorporating Code Structural Knowledge into Pretrained Models via ICL for Code Translation","authors":"Yali Du, Hui Sun, Ming Li","doi":"10.1109/tse.2025.3605768","DOIUrl":"https://doi.org/10.1109/tse.2025.3605768","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"59 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144987632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Explainable Vulnerability Detection With Large Language Models","authors":"Qiheng Mao;Zhenhao Li;Xing Hu;Kui Liu;Xin Xia;Jianling Sun","doi":"10.1109/TSE.2025.3605442","DOIUrl":"10.1109/TSE.2025.3605442","url":null,"abstract":"Software vulnerabilities pose significant risks to the security and integrity of software systems. Although prior studies have explored vulnerability detection using deep learning and pre-trained models, these approaches often fail to provide the detailed explanations necessary for developers to understand and remediate vulnerabilities effectively. The advent of large language models (LLMs) has introduced transformative potential due to their advanced generative capabilities and ability to comprehend complex contexts, offering new possibilities for addressing these challenges. In this paper, we propose LLMVulExp, an automated framework designed to specialize LLMs for the dual tasks of vulnerability detection and explanation. To address the challenges of acquiring high-quality annotated data and injecting domain-specific knowledge, LLMVulExp leverages prompt-based techniques for annotating vulnerability explanations and fine-tunes LLMs using instruction tuning with Low-Rank Adaptation (LoRA), enabling LLMVulExp to detect vulnerability types in code while generating detailed explanations, including the cause, location, and repair suggestions. Additionally, we employ a Chain-of-Thought (CoT) based key code extraction strategy to focus LLMs on analyzing vulnerability-prone code, further enhancing detection accuracy and explanatory depth. We conducted experiments across multiple vulnerability detection settings on three benchmark datasets, demonstrating the effectiveness of our method. This study highlights the feasibility of utilizing LLMs for real-world vulnerability detection and explanation tasks, providing critical insights into their adaptation and application in software security.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 10","pages":"2957-2971"},"PeriodicalIF":5.6,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144930573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Fine-Grained Vulnerability Detection With Reinforcement Learning","authors":"Yuan Jiang;Zhichen Qu;Christoph Treude;Xiaohong Su;Tiantian Wang","doi":"10.1109/TSE.2025.3603400","DOIUrl":"10.1109/TSE.2025.3603400","url":null,"abstract":"The rapid growth of vulnerabilities has significantly accelerated the development of automated vulnerability detection methods, especially those based on data-driven models. However, most of them primarily focus on extracting accurate code representations while overlooking the complex vulnerability patterns among vulnerable statements, thereby leaving room for improvement. To overcome this limitation, we present a novel reinforcement learning framework (RLFD) for detecting vulnerabilities at a fine-grained level. RLFD redefines the detection task as a sequential decision-making process and then employs reinforcement learning to automatically learn vulnerability-relevant structures from code snippets. Moreover, by designing reward functions aligned with fine-grained evaluation metrics, RLFD focuses on the co-existence relations among statements from a global perspective, enabling the model to capture complex interactions that lead to vulnerabilities. Additionally, the framework utilizes CodeBERT-HLS for code representation, ensuring consistency with the state-of-the-art method while highlighting the improvements brought by the proposed reinforcement learning-based approach. Comprehensive experiments show that our method achieves a locating precision (IoU) of 69.7% and a Top-5% Acc of 67.7% on the big_vul dataset, outperforming the state-of-the-art method by an overall 3.4% improvement in IoU. Notably, our method achieves up to a 19.7% increase in IoU for specific categories, e.g., CWE-416 (use-after-free).","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 10","pages":"2900-2920"},"PeriodicalIF":5.6,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144918989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ASTRA: Adversarial Sim-to-Real Transfer Reinforcement Learning for Autoscaling in Cloud Systems","authors":"Tiangang Li;Shi Ying;Xiangbo Tian;Ting Zhang;Yong Wang","doi":"10.1109/TSE.2025.3603995","DOIUrl":"10.1109/TSE.2025.3603995","url":null,"abstract":"With the widespread adoption of cloud computing, autoscaling has become crucial for efficient resource management and stable service provision in cloud systems. In recent years, autoscaling methods based on deep reinforcement learning (DRL) have gained significant attention due to their outstanding adaptability and flexibility. However, training a DRL-based autoscaler requires interactions with real cloud systems, incurring high interaction costs, low data collection efficiency, and potential operational impacts. To address these challenges, we propose ASTRA, a sim-to-real transfer reinforcement learning framework for autoscaling. ASTRA constructs a cloud system simulation environment based on a performance estimation model, enabling low-cost and high-efficiency training sample collection for policy learning. The learned policy is subsequently transferred to the real systems for scaling decisions. To address performance modeling inaccuracies caused by dynamic cloud state changes, we propose a performance modeling method based on a hybrid attentive state space model. By incorporating a state space model, it captures system dynamics and state evolution, effectively reducing simulation errors. Furthermore, to mitigate the performance degradation of the transferred policy due to the distribution shift, we propose an autoscaling method based on adversarial soft actor-critic. By introducing adversarial policy training with gradient regularization based on state perturbations, it significantly improves transferred policy performance. The results in the real system demonstrate that ASTRA achieves optimal overall performance in environment modeling, policy transfer and real-world autoscaling. Specifically, ASTRA outperforms all baselines in terms of instance number, response time, SLO violation rate, and CPU utilization under different workload patterns. More importantly, under limited interaction costs, ASTRA achieves a 616.94× improvement in interaction sample collection rate compared to the direct online training method.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 10","pages":"2921-2941"},"PeriodicalIF":5.6,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144918990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}