IEEE Transactions on Software Engineering: Latest Articles

A Systematic Study on Real-World Android App Bundles
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-04-11 DOI: 10.1109/TSE.2025.3560026
Yutian Tang; Xiapu Luo; Yuming Zhou
Abstract: Android app developers currently mainly attempt to merge all functions into one app to fit different types of devices. However, this "one-size-fits-all" strategy can introduce various problems for both developers and end-users, such as slower download speed and a larger attack surface. To resolve this issue, Google promotes the App Bundle framework and requires all new apps to adopt this framework after August 2021. The app bundle framework allows developers to organize their apps in modules. As a new framework, building an app bundle can be time-consuming and error-prone for developers. To fill this gap, in this paper, we discuss how developers build app bundles in practice. By investigating over 200,000 apps from Google Play, we find that 30% of apps have already adopted app bundles. The adoption ratio of large-size apps is even higher than 90%. We also find hands-on programming practices for building feature modules and dynamic assets in app bundles. This study also finds 12 common design practices, which assist developers in building app bundles.
Vol. 51, No. 5, pp. 1615-1628
Citations: 0
Retrieval-Augmented Fine-Tuning for Improving Retrieve-and-Edit Based Assertion Generation
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-04-07 DOI: 10.1109/TSE.2025.3558403
Hongyan Li; Weifeng Sun; Meng Yan; Ling Xu; Qiang Li; Xiaohong Zhang; Hongyu Zhang
Abstract: Unit testing is crucial in software development and maintenance, aiming to verify that the implemented functionality is consistent with the expected functionality. A unit test is composed of two parts: a test prefix, which drives the unit under test to a particular state, and a test assertion, which determines what the expected behavior is under that state. To reduce the effort of conducting unit tests manually, Yu et al. proposed an integrated approach (integration for short), combining information retrieval with a deep learning-based approach to generate assertions for test prefixes, and obtained promising results. In our previous work, we found that the overall performance of integration is mainly due to its success in retrieving assertions. Moreover, integration is limited to specific types of edit operations and struggles to understand the semantic differences between the retrieved focal-test (a focal-test includes a test prefix and a unit under test) and the input focal-test. Based on these insights, we then proposed a retrieve-and-edit approach named EditAS to learn the assertion edit patterns to improve the effectiveness of assertion generation in our prior study. Despite being promising, we find that the effectiveness of EditAS can be further improved. Our analysis shows that: ① The editing ability of EditAS still has ample room for improvement. Its performance degrades as the edit distance between the retrieved assertion and the ground truth increases. Specifically, the average accuracy of EditAS is 12.38% when the edit distance is greater than 5. ② EditAS lacks a fine-grained semantic understanding of both the retrieved focal-test and the input focal-test themselves, which leads to many inaccurate token modifications. In particular, an average of 25.57% of the incorrectly generated assertions that need to be modified are not modified, and an average of 6.45% of the assertions that match the ground truth are still modified. Because pre-trained models are trained on large-scale data, they tend to have good semantic comprehension and code generation abilities. In light of this, we propose EditAS², which improves retrieve-and-edit based assertion generation through retrieval-augmented fine-tuning. Specifically, EditAS² first retrieves a similar focal-test from a predefined corpus and treats its assertion as a prototype. Then, EditAS² uses a pre-trained model, CodeT5, to learn the semantics of the input and similar focal-tests as well as assertion editing patterns to automatically edit the prototype. We first evaluate the EditAS² for i…
Vol. 51, No. 5, pp. 1591-1614
Citations: 0
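The abstract above ties EditAS accuracy to the edit distance between the retrieved assertion and the ground truth. The listing does not say how that distance is computed, so the following is only a minimal sketch, assuming a Levenshtein distance over whitespace-separated tokens; the function name `token_edit_distance` and the sample assertions are hypothetical.

```python
def token_edit_distance(a, b):
    """Levenshtein distance between two token lists (assumed metric, not the paper's)."""
    # prev[j] holds the distance between the tokens of `a` seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty list needs i deletions
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


# Hypothetical retrieved prototype and ground-truth assertion.
retrieved = "assertEquals ( 1 , list . size ( ) )".split()
truth = "assertEquals ( 2 , list . size ( ) )".split()
distance = token_edit_distance(retrieved, truth)
```

Under this assumed metric the two assertions differ by a single token substitution, which is the kind of small edit a retrieve-and-edit approach handles most easily.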
Multimodal Fusion for Android Malware Detection Based on Large Pre-Trained Models
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-04-03 DOI: 10.1109/TSE.2025.3557577
Xun Li; Lei Liu; Yuzhou Liu; Yu Zhao; Peng Zhang; Huaxiao Liu
Abstract: Malware detection is a critical issue in software engineering, as malware directly threatens user information security. Existing approaches often focus on an individual modality (either source code or binary code) for detection, but fail to effectively exploit the complementary information between the two. This limits detection performance, especially in complex and evasive malware scenarios. In this paper, we take Android applications written in Java as our subjects and provide a novel fine-grained multimodal fusion method with large pre-trained models to combine the features from source and binary code for malware detection. For the source code modality, we employ the graphical user interface (GUI) as a framework to segment the source code into snippets, and use a pre-trained programming language model to extract feature representations. For the binary code modality, we convert binary code into grayscale images and fine-tune a pre-trained vision model to extract features indirectly. We then implement cross-modal attention and devise a contrastive loss to align features across modalities, supplementing this with a supervised classification loss to refine the multimodal fusion process specifically for malware detection. Our experiments, conducted using the Data-MD and Data-MC benchmarks, demonstrate that our approach achieves a precision of 0.977 and a recall of 0.984 in detecting malware. This underscores the advantages of using large pre-trained models for feature representation and the fusion of information across different modalities for effective malware detection.
Vol. 51, No. 5, pp. 1569-1590
Citations: 0
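The binary-code step in the abstract above (converting binary code into grayscale images) is commonly done by mapping each byte to one pixel intensity. The abstract does not give the image width or the padding rule, so the sketch below assumes a fixed row width and zero padding; `bytes_to_grayscale` is a hypothetical helper, not the authors' implementation.

```python
def bytes_to_grayscale(data: bytes, width: int = 16):
    """Lay raw bytes out as rows of 0-255 pixel intensities.

    Assumptions (not stated in the abstract): fixed row width,
    last row padded with zeros.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])  # each byte is already 0..255
        row += [0] * (width - len(row))        # zero-pad the final row
        rows.append(row)
    return rows


# Five hypothetical bytes laid out two pixels per row.
image = bytes_to_grayscale(bytes([0, 255, 16, 32, 7]), width=2)
```

The resulting nested list can be handed to any image library as a single-channel array before it is fed to a vision model.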
NumScout: Unveiling Numerical Defects in Smart Contracts Using LLM-Pruning Symbolic Execution
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-03-28 DOI: 10.1109/TSE.2025.3555622
Jiachi Chen; Zhenzhe Shao; Shuo Yang; Yiming Shen; Yanlin Wang; Ting Chen; Zhenyu Shan; Zibin Zheng
Abstract: In recent years, the Ethereum platform has witnessed a proliferation of smart contracts, accompanied by exponential growth in total value locked (TVL). High-TVL smart contracts often require complex numerical computations, particularly in mathematical financial models used by many decentralized applications (DApps). Improper calculations can introduce numerical defects, posing potential security risks. Existing research primarily focuses on traditional numerical defects like integer overflow, and there is currently a lack of systematic research and effective detection methods targeting new types of numerical defects. In this paper, we identify five new types of numerical defects through the analysis of 1,199 audit reports by utilizing the open card method. Each defect is defined and illustrated with a code example to highlight its features and potential consequences. We also propose NumScout, a symbolic execution-based tool designed to detect these five defects. Specifically, the tool combines information from source code and bytecode, analyzing key operations such as comparisons and transfers, to effectively locate defects and report them based on predefined detection patterns. Furthermore, NumScout uses a large language model (LLM) to prune functions that are unrelated to numerical operations. This step allows symbolic execution to quickly enter the target function and improves runtime speed by 28.4%. We ran NumScout on 6,617 real-world contracts and evaluated its performance based on manually labeled results. We find that 1,774 contracts contained at least one of the five defects, and the tool achieved an overall precision of 89.7%.
Vol. 51, No. 5, pp. 1538-1553
Citations: 0
SQLaw: Detecting Bugs in GPU Database Management Systems via Rule-Based Differential Execution
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-03-28 DOI: 10.1109/TSE.2025.3574328
Jiaxin Hu; Rongxin Wu
Abstract: Database Management Systems (DBMSs) are essential for managing structured data. To meet the increasing performance requirements for complex, large-scale data management and analysis, GPU DBMSs have been introduced to enhance processing and query execution speeds. Despite the growing interest in GPU DBMSs and the inherent presence of bugs, there has been no systematic effort, to our knowledge, to detect bugs in GPU DBMSs. To this end, we design SQLaw, an innovative and comprehensive framework that combines offline rule learning with an online interpreter incorporating mutation for efficient and general GPU-related bug detection. The offline rule learning component automatically extracts differential execution rules, which are used to guide the synthesis of configuration and query statements for testing. The online interpreter with mutation ensures the generalization of these statements. We evaluated SQLaw on three major GPU DBMSs. Our extensive evaluations demonstrate that SQLaw outperforms current state-of-the-art approaches by up to 2.22× in the number of bugs detected within 24 hours. Additionally, SQLaw detected 51 previously unknown GPU-related bugs, of which 37 have been confirmed or fixed by developers.
Vol. 51, No. 7, pp. 2144-2160
Citations: 0
Prioritizing Test Gaps by Risk in Industrial Practice: An Automated Approach and Multimethod Study
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-03-28 DOI: 10.1109/TSE.2025.3556248
Roman Haas; Michael Sailer; Mitchell Joblin; Elmar Juergens; Sven Apel
Abstract: Context. Untested code changes, called test gaps, pose a significant risk for software projects. Since test gaps increase the probability of defects, managing test gaps and their individual risk is important, especially for rapidly changing software systems. Objective. This study aims at gaining an understanding of test gaps in industrial practice and at establishing criteria for precise prioritization of test gaps by their risk, informing practitioners who need to manage, review, and act on larger sets of test gaps. Method. We propose an automated approach for prioritizing test gaps based on key risk criteria. By means of an analysis of 31 historical test gap reviews from 8 industrial software systems of our industrial partners Munich Re and LV 1871, and by conducting semi-structured interviews with the 6 quality engineers who authored the historical test gap reviews, we validate the transferability of the identified risk criteria, such as code criticality and complexity metrics. Results. Our automated approach exhibits a ranking performance equivalent to expert assessments, in that test gaps labelled as risky in historical test gap reviews are prioritized correctly, on average, at the 30th percentile. In some scenarios, our automated ranking system even outpaces expert assessments, especially for test gaps in central code, a code property that is opaque to non-developers. Conclusion. This research underscores the industrial need for test gap risk estimation techniques to assist test management and quality assurance teams in identifying and addressing critical test gaps. Our multimethod study shows that even a lightweight prioritization approach helps practitioners to identify high-risk test gaps efficiently and to filter out low-risk test gaps.
Vol. 51, No. 5, pp. 1554-1568
Citations: 0
SmartFL: Semantics Based Probabilistic Fault Localization
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-03-27 DOI: 10.1109/TSE.2025.3574487
Yiqian Wu; Yujie Liu; Yi Yin; Muhan Zeng; Zhentao Ye; Xin Zhang; Yingfei Xiong; Lu Zhang
Abstract: Testing-based fault localization has been a research focus in software engineering in the past decades. It localizes faulty program elements based on a set of passing and failing test executions. Since whether a fault could be triggered and detected by a test is related to program semantics, it is crucial to model program semantics in fault localization approaches. Existing approaches either consider the full semantics of the program (e.g., mutation-based fault localization and angelic debugging), leading to scalability issues, or ignore the semantics of the program (e.g., spectrum-based fault localization), leading to imprecise localization results. Our key idea is that, by modeling only the correctness of program values but not their full semantics, a balance could be reached between effectiveness and scalability. To realize this idea, we introduce a probabilistic model based on an efficient approximation of program semantics, together with several techniques to address scalability challenges. Our approach, SmartFL (SeMantics bAsed pRobabilisTic Fault Localization), is evaluated on a real-world dataset, Defects4J 2.0. The top-1 statement-level accuracy of our approach is 14%, a 130% improvement over the best SBFL and MBFL methods. The average time cost is 205 seconds per fault, which is half that of SBFL methods. After combining our approach with existing approaches using the CombineFL framework, the performance of the combined approach is significantly boosted by an average of 10% on top-1, top-3, and top-5 accuracy compared to state-of-the-art combination methods.
Vol. 51, No. 7, pp. 2161-2180
Citations: 0
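For readers unfamiliar with the spectrum-based fault localization (SBFL) baseline that the SmartFL abstract compares against, here is a minimal sketch. It uses the Ochiai formula, a widely used SBFL ranking; the abstract does not say which SBFL formulas were evaluated, so that choice is an assumption, and this is not SmartFL's probabilistic model. The coverage data and the name `ochiai_scores` are hypothetical.

```python
import math


def ochiai_scores(coverage, outcomes):
    """Rank statements by Ochiai suspiciousness.

    coverage: test name -> set of statement ids the test executed
    outcomes: test name -> True if the test passed, False if it failed
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        # ef / ep: failing / passing tests that executed this statement.
        ef = sum(1 for t, cov in coverage.items() if stmt in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if stmt in cov and outcomes[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores[stmt] = ef / denom if denom else 0.0
    return scores


# Hypothetical spectrum: only the failing test t2 executes statement 3.
coverage = {"t1": {1, 2}, "t2": {2, 3}, "t3": {1}}
outcomes = {"t1": True, "t2": False, "t3": True}
scores = ochiai_scores(coverage, outcomes)
ranking = sorted(scores, key=scores.get, reverse=True)
```

The score depends only on which tests touched a statement, not on what values it computed, which is the "ignores the semantics" limitation the abstract attributes to SBFL.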
BabelRTS: Polyglot Regression Test Selection
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-03-27 DOI: 10.1109/TSE.2025.3554403
Gabriele Maurina; Walter Cazzola; Sudipto Ghosh
Abstract: Regression test selection (RTS) approaches reduce the number of regression tests that need to be run. Current RTS approaches are typically monoglot, i.e., their implementations target a specific language. However, many subjects under test (SUTs) are polyglot, i.e., they use multiple languages. Running multiple monoglot RTS approaches separately on a polyglot SUT is unsafe because tests that involve inter-language dependencies can be missed. Moreover, a new language may require completely reimplementing an RTS approach, especially if the original implementation relies on language and runtime features that are not available in the new language. We propose a new static approach called BabelRTS, which is multilingual (supports multiple languages out of the box), polyglot (analyzes SUTs written in multiple languages), and extensible (allows adding support for new languages). A key contribution is the idea of encapsulating the language-specific aspects of RTS by using patterns and actions. A pattern specifies programming language constructs used in each file that indicate dependencies on other files written in the same or a different language. An action specifies how to identify these files in the codebase. Patterns and actions can be customized to support new languages without modifying the test selection algorithm. BabelRTS is not tied to a specific language run-time system or paradigm. BabelRTS currently supports 12 languages and 5 language combinations. We evaluated BabelRTS on 142 open-source monoglot and polyglot SUTs, analyzing a total of more than two billion LOC. The performance of BabelRTS was similar to the state-of-the-art monoglot approaches on monoglot SUTs. On polyglot SUTs, BabelRTS was safer in polyglot mode and selected more tests for 60% of the commits than in monoglot mode, which missed inter-language dependencies.
Vol. 51, No. 5, pp. 1487-1499
Citations: 0
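The pattern-and-action idea in the BabelRTS abstract above can be illustrated with a toy static analysis. This is only a sketch under stated assumptions, not BabelRTS itself: the regular expressions, the `PATTERNS` table, the `resolve` action, and the flat in-memory codebase are all hypothetical, and real patterns would need to be far more careful.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern table: file extension -> (regex, extension of the target file).
# Group 1 of each regex captures the name of the file depended upon.
PATTERNS = {
    ".py": [
        (re.compile(r"^\s*import\s+(\w+)", re.M), ".py"),  # Python -> Python
        (re.compile(r"[\"'](\w+)\.js[\"']"), ".js"),       # Python -> JavaScript
    ],
    ".js": [
        (re.compile(r"require\([\"']\./(\w+)[\"']\)"), ".js"),
    ],
}


def resolve(name, ext, files):
    """Hypothetical action: map a captured name to a file in the codebase."""
    candidate = name + ext
    return {candidate} if candidate in files else set()


def dependencies(files):
    """Build a file -> files-it-depends-on graph by applying patterns and actions."""
    graph = {}
    for path, source in files.items():
        deps = set()
        for pattern, target_ext in PATTERNS.get(PurePosixPath(path).suffix, []):
            for name in pattern.findall(source):
                deps |= resolve(name, target_ext, files)
        graph[path] = deps - {path}
    return graph


def select_tests(files, tests, changed):
    """Select every test that transitively depends on a changed file."""
    graph = dependencies(files)
    selected = set()
    for test in tests:
        seen, stack = set(), [test]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
        if seen & changed:
            selected.add(test)
    return selected


# Hypothetical polyglot codebase: a Python module shells out to a JavaScript tool.
files = {
    "test_app.py": "import app\n",
    "app.py": "import subprocess\nsubprocess.run(['node', 'tool.js'])\n",
    "tool.js": "const util = require('./util')\n",
    "util.js": "module.exports = {}\n",
    "test_other.py": "import other\n",
    "other.py": "x = 1\n",
}
selected = select_tests(files, {"test_app.py", "test_other.py"}, {"util.js"})
```

In this toy codebase a change to `util.js` selects `test_app.py` only through the Python-to-JavaScript edge, which is the kind of inter-language dependency that running separate monoglot analyses would miss. Supporting another language would mean adding entries to the table rather than changing `select_tests`.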
Programmer Visual Attention During Context-Aware Code Summarization
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-03-26 DOI: 10.1109/TSE.2025.3554990
Robert Wallace; Aakash Bansal; Zachary Karas; Ningzhi Tang; Yu Huang; Toby Jia-Jun Li; Collin McMillan
Abstract: Programmer attention represents the visual focus of programmers on parts of the source code in pursuit of programming tasks. The focus of current research in modeling this programmer attention has been on using mouse cursors, keystrokes, or eye tracking equipment to map areas in a snippet of code. These approaches have traditionally only mapped attention for a single method. However, there is a knowledge gap in the literature because programming tasks such as source code summarization require programmers to use contextual knowledge that can only be found in other parts of the project, not only in a single method. To address this knowledge gap, we conducted an in-depth human study with 10 Java programmers, where each programmer generated summaries for 40 methods from five large Java projects over five one-hour sessions. We used eye tracking equipment to map the visual attention of programmers while they wrote the summaries. We also rated the quality of each summary. We found eye-gaze patterns and metrics that define common behaviors between programmer attention during context-aware code summarization. Specifically, we found that programmers need to read up to 35% fewer words (p < 0.01) over the whole session, and revisit 13% fewer words (p < 0.03) as they summarize each method during a session, while maintaining the quality of summaries. We also found that the amount of source code a participant looks at correlates with a higher quality summary, but this trend follows a bell-shaped curve, such that after a threshold reading more source code leads to a significant decrease (p < 0.01) in the quality of summaries. We also gathered insight into the type of methods in the project that provide the most contextual information for code summarization based on programmer attention. Specifically, we observed that programmers spent a majority of their time looking at methods inside the same class as the target method to be summarized. Surprisingly, we found that programmers spent significantly less time looking at methods in the call graph of the target method. We discuss how our empirical observations may aid future studies towards modeling programmer attention and improving context-aware automatic source code summarization.
Vol. 51, No. 5, pp. 1524-1537
Citations: 0
LLM-Based Automation of COSMIC Functional Size Measurement From Use Cases
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date : 2025-03-26 DOI: 10.1109/TSE.2025.3554562
Gabriele De Vito; Sergio Di Martino; Filomena Ferrucci; Carmine Gravino; Fabio Palomba
Abstract: COmmon Software Measurement International Consortium (COSMIC) Functional Size Measurement is a method widely used in the software industry to quantify user functionality and measure software size, which is crucial for estimating development effort, cost, and resource allocation. COSMIC measurement is a manual task that requires qualified professionals and considerable effort. To support professionals in COSMIC measurement, we propose an automatic approach, CosMet, that leverages Large Language Models to measure software size starting from use cases specified in natural language. To evaluate the proposed approach, we developed a web tool that implements CosMet using GPT-4 and conducted two studies to assess the approach quantitatively and qualitatively. Initially, we experimented with CosMet on seven software systems, encompassing 123 use cases, and compared the generated results with the ground truth created by two certified professionals. Then, seven professional measurers evaluated the analysis produced by CosMet and the extent to which the approach reduces the measurement time. The first study's results revealed that CosMet is highly effective in analyzing and measuring use cases. The second study highlighted that CosMet offers a transparent and interpretable analysis, allowing practitioners to understand how the measurement is derived and make necessary adjustments. Additionally, it reduces the manual measurement time by 60-80%.
Vol. 51, No. 5, pp. 1500-1523
Citations: 0
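As background for the CosMet abstract above: in the COSMIC method, functional size is obtained by counting data movements (Entry, Exit, Read, Write) per functional process, each contributing one COSMIC Function Point (CFP). The sketch below shows only that final counting step, assuming the data movements have already been identified by a measurer or a tool; the `cosmic_size` helper and the sample use case are hypothetical and do not reproduce CosMet.

```python
from collections import Counter

# The four COSMIC data movement types; each counts as 1 CFP.
MOVEMENT_TYPES = {"E": "Entry", "X": "Exit", "R": "Read", "W": "Write"}


def cosmic_size(movements):
    """Return (total CFP, CFP per functional process).

    movements: iterable of (functional_process, movement_type) pairs,
    where movement_type is one of "E", "X", "R", "W".
    """
    per_process = Counter()
    for process, kind in movements:
        if kind not in MOVEMENT_TYPES:
            raise ValueError(f"unknown data movement type: {kind!r}")
        per_process[process] += 1
    return sum(per_process.values()), dict(per_process)


# Hypothetical use case "Register user" broken into data movements.
movements = [
    ("Register user", "E"),  # user submits registration data
    ("Register user", "R"),  # check for an existing account
    ("Register user", "W"),  # store the new account
    ("Register user", "X"),  # return a confirmation message
]
total_cfp, by_process = cosmic_size(movements)
```

The hard part of the measurement, and the part the abstract describes automating with an LLM, is identifying the functional processes and data movements from natural-language use cases; the arithmetic afterwards is a plain count.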