IEEE Transactions on Software Engineering: Latest Articles

Pearl: A Multi-Derivation Approach to Efficient CFL-Reachability Solving
IF 6.5 Zone 1 Computer Science
IEEE Transactions on Software Engineering Pub Date : 2024-08-05 DOI: 10.1109/TSE.2024.3437684
Chenghang Shi;Haofeng Li;Yulei Sui;Jie Lu;Lian Li;Jingling Xue
{"title":"Pearl: A Multi-Derivation Approach to Efficient CFL-Reachability Solving","authors":"Chenghang Shi;Haofeng Li;Yulei Sui;Jie Lu;Lian Li;Jingling Xue","doi":"10.1109/TSE.2024.3437684","DOIUrl":"10.1109/TSE.2024.3437684","url":null,"abstract":"Context-free language (CFL) reachability is a fundamental framework for formulating program analyses. CFL-reachability analysis works on top of an edge-labeled graph by deriving reachability relations and adding them as labeled edges to the graph. Existing CFL-reachability algorithms typically adopt a single-reachability relation derivation (SRD) strategy, i.e., one reachability relation is derived at a time. Unfortunately, this strategy can lead to redundancy, hindering the efficiency of the analysis. To address this problem, this paper proposes Pearl, a multi-derivation approach that reduces derivation redundancy for CFL-reachability solving, which significantly improves the efficiency of CFL-reachability analysis. Our key insight is that multiple edges can be simultaneously derived via batch propagation of reachability relations. We also tailor our multi-derivation approach to tackle transitive relations that frequently arise when solving CFL-reachability. Specifically, we present a highly efficient transitive-aware variant, Pearl^PG, which enhances Pearl with propagation graphs, a lightweight but effective graph representation, to further diminish redundant derivations. We evaluate the performance of our approach on two clients, i.e., context-sensitive value-flow analysis and field-sensitive alias analysis for C/C++. By eliminating a large amount of redundancy, our approach outperforms two baselines including the standard CFL-reachability algorithm and a state-of-the-art solver Pocr specialized for fast transitivity solving. 
In particular, the empirical results demonstrate that, for value-flow analysis and alias analysis respectively, Pearl^PG runs 3.09× faster on average (up to 4.44×) and 2.25× faster on average (up to 3.31×) than Pocr, while also consuming less memory.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 9","pages":"2379-2397"},"PeriodicalIF":6.5,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141895696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
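For orientation, the single-derivation (SRD) baseline that the Pearl abstract contrasts itself with can be sketched as a textbook worklist algorithm. This is neither Pearl nor Pocr; the function name `cfl_reach`, the grammar encoding as unary and binary productions, and the tiny parenthesis graph are all assumptions made for illustration.

```python
from collections import deque


def cfl_reach(edges, unary, binary):
    """Baseline worklist CFL-reachability: one popped edge per step.

    edges  : iterable of (src, label, dst) terminal edges
    unary  : dict B -> set of A, one entry per production A -> B
    binary : list of (A, B, C), one entry per production A -> B C
    Returns the set of all terminal and derived labeled edges.
    """
    facts = set()
    out_edges = {}  # node -> set of (label, dst)
    in_edges = {}   # node -> set of (label, src)
    work = deque()

    def add(u, label, v):
        # Record an edge once and schedule it for processing.
        if (u, label, v) not in facts:
            facts.add((u, label, v))
            out_edges.setdefault(u, set()).add((label, v))
            in_edges.setdefault(v, set()).add((label, u))
            work.append((u, label, v))

    for u, label, v in edges:
        add(u, label, v)

    while work:
        u, b, v = work.popleft()
        for a in unary.get(b, ()):
            add(u, a, v)
        for a, left, right in binary:
            if left == b:  # u -B-> v followed by v -C-> w gives u -A-> w
                for label, w in list(out_edges.get(v, ())):
                    if label == right:
                        add(u, a, w)
            if right == b:  # t -B-> u followed by u -C-> v gives t -A-> v
                for label, t in list(in_edges.get(u, ())):
                    if label == left:
                        add(t, a, v)
    return facts


# Hypothetical grammar for nested parentheses:
#   S -> open close | open T,   T -> S close
GRAMMAR = [("S", "open", "close"), ("S", "open", "T"), ("T", "S", "close")]
GRAPH = [(0, "open", 1), (1, "open", 2), (2, "close", 3), (3, "close", 4)]
result = cfl_reach(GRAPH, {}, GRAMMAR)
```

Each loop iteration processes exactly one edge, which is the behavior the abstract describes as deriving one reachability relation at a time; the batch propagation that Pearl proposes is not modeled here.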
Citations: 0
AddressWatcher: Sanitizer-Based Localization of Memory Leak Fixes
IF 6.5 Zone 1 Computer Science
IEEE Transactions on Software Engineering Pub Date : 2024-08-05 DOI: 10.1109/TSE.2024.3438119
Aniruddhan Murali;Mahmoud Alfadel;Meiyappan Nagappan;Meng Xu;Chengnian Sun
{"title":"AddressWatcher: Sanitizer-Based Localization of Memory Leak Fixes","authors":"Aniruddhan Murali;Mahmoud Alfadel;Meiyappan Nagappan;Meng Xu;Chengnian Sun","doi":"10.1109/TSE.2024.3438119","DOIUrl":"10.1109/TSE.2024.3438119","url":null,"abstract":"Memory leak bugs are a major problem in C/C++ programs. They occur when memory objects are not deallocated. Developers need to manually deallocate these objects to prevent memory leaks. As such, several techniques have been proposed to automatically fix memory leaks. Although proposed approaches have merit in automatically fixing memory leaks, they present limitations. Static-based approaches attempt to trace the complete semantics of a memory object across all paths. However, they have scalability-related challenges when the target program has a large number of paths (path explosion). On the other hand, dynamic approaches can spell out precise semantics of a memory object only on a single execution path (they do not consider multiple execution paths). In this paper, we complement prior approaches by designing and implementing a novel framework named AddressWatcher. AddressWatcher allows the semantics of a memory object to be tracked on multiple execution paths. AddressWatcher accomplishes this by using a leak database that allows one to store and compare different execution paths of a leak over several test cases. Also, AddressWatcher performs lightweight instrumentation during compile time that is utilized during the program execution to watch and track memory leak read/writes. We conduct an evaluation of AddressWatcher over five popular packages, namely binutils, openssh, tmux, openssl and git. In 23 out of 50 real-world memory leak bugs, AddressWatcher correctly points to a free location to fix memory leaks. Finally, we submit 25 Pull Requests across 12 popular OSS repositories using AddressWatcher suggestions. Among these, 21 were merged leading to 5 open issues being addressed. 
In fact, our critical fix prompted a new version release for the calc repository, a program used to find large primes. Furthermore, our contributions through these PRs sparked intense discussions and appreciation in various repositories such as coturn, h2o, and radare2.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 9","pages":"2398-2411"},"PeriodicalIF":6.5,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141895639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Controlled Experiment in Age and Gender Bias When Reading Technical Articles in Software Engineering
IF 6.5 Zone 1 Computer Science
IEEE Transactions on Software Engineering Pub Date : 2024-08-05 DOI: 10.1109/TSE.2024.3437355
Anda Liang;Emerson Murphy-Hill;Westley Weimer;Yu Huang
{"title":"A Controlled Experiment in Age and Gender Bias When Reading Technical Articles in Software Engineering","authors":"Anda Liang;Emerson Murphy-Hill;Westley Weimer;Yu Huang","doi":"10.1109/TSE.2024.3437355","DOIUrl":"10.1109/TSE.2024.3437355","url":null,"abstract":"Online platforms and communities are a critical part of modern software engineering, yet are often affected by human biases. While previous studies investigated human biases and their potential harms against the efficiency and fairness of online communities, they have mainly focused on the open source and Q&A platforms, such as GitHub and Stack Overflow, but overlooked the audience-focused online platforms for delivering programming and SE-related technical articles, where millions of software engineering practitioners share, seek for, and learn from high-quality software engineering articles (i.e., technical articles for SE). Furthermore, most of the previous work has revealed gender and race bias, but we have little knowledge about the effect of age on software engineering practice. In this paper, we propose to investigate the effect of authors’ demographic information (gender and age) on the evaluation of technical articles on software engineering and potential behavioral differences among participants. We conducted a survey-based and controlled human study and collected responses from 540 participants to investigate developers’ evaluation of technical articles for software engineering. By controlling the gender and age of the author profiles of technical articles for SE, we found that raters tend to have more positive content depth evaluations for younger male authors when compared to older male authors and that male participants conduct technical article evaluations faster than female participants, consistent with prior study findings. 
Surprisingly, different from other software engineering evaluation activities (e.g., code review, pull request, etc.), we did not find a significant difference in the genders of authors on the evaluation outcome of technical articles in SE.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 10","pages":"2498-2511"},"PeriodicalIF":6.5,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10623245","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141895697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Long Live the Image: On Enabling Resilient Production Database Containers for Microservice Applications
IF 6.5 Zone 1 Computer Science
IEEE Transactions on Software Engineering Pub Date : 2024-08-01 DOI: 10.1109/TSE.2024.3436623
Zheng Li;Nicolás Saldías-Vallejos;Diego Seco;María Andrea Rodríguez;Rajiv Ranjan
{"title":"Long Live the Image: On Enabling Resilient Production Database Containers for Microservice Applications","authors":"Zheng Li;Nicolás Saldías-Vallejos;Diego Seco;María Andrea Rodríguez;Rajiv Ranjan","doi":"10.1109/TSE.2024.3436623","DOIUrl":"10.1109/TSE.2024.3436623","url":null,"abstract":"Microservices architecture advocates decentralized data ownership for building software systems. Particularly, in the Database per Service pattern, each microservice is supposed to maintain its own database and to handle the data related to its functionality. When implementing microservices in practice, however, there seems to be a paradox: The de facto technology (i.e., containerization) for microservice implementation is claimed to be unsuitable for the microservice component (i.e., database) in production environments, mainly due to the data persistence issues (e.g., dangling volumes) and security concerns. As a result, the existing discussions generally suggest replacing database containers with cloud database services, while leaving the on-premises microservice implementation out of consideration. After identifying three statelessness-dominant application scenarios, we proposed container-native data persistence as a conditional solution to enable resilient database containers in production. In essence, this data persistence solution distinguishes stateless data access (i.e., reading) from stateful data processing (i.e., creating, updating, and deleting), and thus it aims at the development of stateless microservices for suitable applications. In addition to developing our proposal, this research is particularly focused on its validation, via prototyping the solution and evaluating its performance, and via applying this solution to two real-world microservice applications. From the industrial perspective, the validation results have proved the feasibility, usability, and efficiency of fully containerized microservices for production in applicable situations. 
From the academic perspective, this research has shed light on the operation-side micro-optimization of individual microservices, which fundamentally expands the scope of “software micro-optimization” and reveals new research opportunities.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 9","pages":"2363-2378"},"PeriodicalIF":6.5,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141877537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Mitigating the Uncertainty and Imprecision of Log-Based Code Coverage Without Requiring Additional Logging Statements
IF 6.5 Zone 1 Computer Science
IEEE Transactions on Software Engineering Pub Date : 2024-07-29 DOI: 10.1109/TSE.2024.3435067
Xiaoyan Xu;Filipe R. Cogo;Shane McIntosh
{"title":"Mitigating the Uncertainty and Imprecision of Log-Based Code Coverage Without Requiring Additional Logging Statements","authors":"Xiaoyan Xu;Filipe R. Cogo;Shane McIntosh","doi":"10.1109/TSE.2024.3435067","DOIUrl":"10.1109/TSE.2024.3435067","url":null,"abstract":"Understanding code coverage is an important precursor to software maintenance activities (e.g., better testing). Although modern code coverage tools provide key insights, they typically rely on code instrumentation, resulting in significant performance overhead. An alternative approach to code instrumentation is to process an application's source code and the associated log traces in tandem. This so-called “log-based code coverage” approach does not impose the same performance overhead as code instrumentation. Chen et al. proposed LogCoCo, a tool that implements log-based code coverage for Java. While LogCoCo breaks important new ground, it has fundamental limitations, namely: uncertainty due to the lack of logging statements in conditional branches, and imprecision caused by dependency injection. In this study, we propose Log2Cov, a tool that generates log-based code coverage for programs written in Python and addresses uncertainty and imprecision issues. We evaluate Log2Cov on three large and active open-source systems. More specifically, we compare the performance of Log2Cov to that of Coverage.py, an instrumentation-based coverage tool for Python. Our results indicate that 1) Log2Cov achieves high precision without introducing runtime overhead; and 2) uncertainty and imprecision can be reduced by up to 11% by statically analyzing the program's source code and execution logs, without requiring additional logging instrumentation from developers. 
While our enhancements make substantial improvements, we find that future work is needed to handle conditional statements and exception handling blocks to achieve parity with instrumentation-based approaches. We conclude the paper by drawing attention to these promising directions for future work.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 9","pages":"2350-2362"},"PeriodicalIF":6.5,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141794547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Assessing Evaluation Metrics for Neural Test Oracle Generation
IF 6.5 Zone 1 Computer Science
IEEE Transactions on Software Engineering Pub Date : 2024-07-25 DOI: 10.1109/TSE.2024.3433463
Jiho Shin;Hadi Hemmati;Moshi Wei;Song Wang
{"title":"Assessing Evaluation Metrics for Neural Test Oracle Generation","authors":"Jiho Shin;Hadi Hemmati;Moshi Wei;Song Wang","doi":"10.1109/TSE.2024.3433463","DOIUrl":"10.1109/TSE.2024.3433463","url":null,"abstract":"Recently, deep learning models have shown promising results in test oracle generation. Neural Oracle Generation (NOG) models are commonly evaluated using static (automatic) metrics which are mainly based on textual similarity of the output, e.g. BLEU, ROUGE-L, METEOR, and Accuracy. However, these textual similarity metrics may not reflect the testing effectiveness of the generated oracle within a test suite, which is often measured by dynamic (execution-based) test adequacy metrics such as code coverage and mutation score. In this work, we revisit existing oracle generation studies plus gpt-3.5 to empirically investigate the current standing of their performance in textual similarity and test adequacy metrics. Specifically, we train and run four state-of-the-art test oracle generation models on seven textual similarity and two test adequacy metrics for our analysis. We apply two different correlation analyses between these two different sets of metrics. Surprisingly, we found no significant correlation between the textual similarity metrics and test adequacy metrics. For instance, gpt-3.5 on the jackrabbit-oak project had the highest performance on all seven textual similarity metrics among the studied NOGs. However, it had the lowest test adequacy metrics compared to all the studied NOGs. We further conducted a qualitative analysis to explore the reasons behind our observations. We found that oracles with high textual similarity metrics but low test adequacy metrics tend to have complex or multiple chained method invocations within the oracle's parameters, making them hard for the model to generate completely, affecting the test adequacy metrics. 
On the other hand, oracles with low textual similarity metrics but high test adequacy metrics tend to have to call different assertion types or a different method that functions similarly to the ones in the ground truth. Overall, this work complements prior studies on test oracle generation with an extensive performance evaluation on textual similarity and test adequacy metrics and provides guidelines for better assessment of deep learning applications in software test generation in the future.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 9","pages":"2337-2349"},"PeriodicalIF":6.5,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141764147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
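The abstract above mentions two correlation analyses between textual similarity and test adequacy metrics without naming them. As one plausible illustration, a rank correlation (Spearman) can be sketched in plain Python. The choice of Spearman, the helper names, and the idea of comparing one score list per metric are assumptions, not details taken from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

In use, `xs` would hold a textual similarity score (for example BLEU) and `ys` a test adequacy score (for example mutation score) for the same set of generated oracles; a value near zero would match the "no significant correlation" finding, although significance itself needs a separate test.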
Citations: 0
Supersonic: Learning to Generate Source Code Optimizations in C/C++
IF 6.5 Zone 1 Computer Science
IEEE Transactions on Software Engineering Pub Date : 2024-07-22 DOI: 10.1109/TSE.2024.3423769
Zimin Chen;Sen Fang;Martin Monperrus
{"title":"Supersonic: Learning to Generate Source Code Optimizations in C/C++","authors":"Zimin Chen;Sen Fang;Martin Monperrus","doi":"10.1109/TSE.2024.3423769","DOIUrl":"10.1109/TSE.2024.3423769","url":null,"abstract":"Software optimization refines programs for resource efficiency while preserving functionality. Traditionally, it is a process done by developers and compilers. This paper introduces a third option, automated optimization at the source code level. We present Supersonic, a neural approach targeting minor source code modifications for optimization. Using a seq2seq model, Supersonic is trained on C/C++ program pairs ($x_{t}$, $x_{t+1}$), where $x_{t+1}$ is an optimized version of $x_{t}$, and outputs a diff. Supersonic's performance is benchmarked against OpenAI's GPT-3.5-Turbo and GPT-4 on competitive programming tasks. 
The experiments show that Supersonic not only outperforms both models on the code optimization task but also minimizes the extent of the change with a model more than 600x smaller than GPT-3.5-Turbo and 3700x smaller than GPT-4.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 11","pages":"2849-2864"},"PeriodicalIF":6.5,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10606318","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141754778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
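To make the "program pair in, diff out" framing of the Supersonic abstract concrete, the sketch below builds a unified diff between a program and an optimized version using Python's standard `difflib`. The exact diff format Supersonic emits is not stated in the abstract, so the unified format, the file names, and the small C fragment are assumptions for illustration.

```python
import difflib


def source_diff(before, after, name_before="x_t.c", name_after="x_t_plus_1.c"):
    """Return a unified diff that turns `before` into `after`."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=name_before,
            tofile=name_after,
        )
    )


# Hypothetical program pair: hoist strlen() out of the loop condition.
X_T = (
    "for (size_t i = 0; i < strlen(s); i++)\n"
    "    count += (s[i] == 'a');\n"
)
X_T_PLUS_1 = (
    "size_t n = strlen(s);\n"
    "for (size_t i = 0; i < n; i++)\n"
    "    count += (s[i] == 'a');\n"
)
diff_text = source_diff(X_T, X_T_PLUS_1)
print(diff_text)
```

A diff target keeps the output short when the optimization touches only a few lines, which fits the abstract's emphasis on minor source code modifications.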
Citations: 0
Enforcing Correctness of Collaborative Business Processes Using Plans
IF 6.5 Zone 1 Computer Science
IEEE Transactions on Software Engineering Pub Date : 2024-07-22 DOI: 10.1109/TSE.2024.3431585
Qi Mo;Jianeng Wang;Zhongwen Xie;Cong Liu;Fei Dai
{"title":"Enforcing Correctness of Collaborative Business Processes Using Plans","authors":"Qi Mo;Jianeng Wang;Zhongwen Xie;Cong Liu;Fei Dai","doi":"10.1109/TSE.2024.3431585","DOIUrl":"10.1109/TSE.2024.3431585","url":null,"abstract":"Generally, a collaborative business process is a distributed process, in which a set of parallel business processes are involved. These business processes have complementary competencies and knowledge, and cooperate with each other to achieve their common business goals. To ensure the correctness of collaborative business processes, we propose a novel plan-based correctness enforcement approach in this article, which is privacy-preserving, available and efficient. This approach first requires participating organizations to define their business processes. Then, each participating organization employs a set of reduction rules to build the public process of its business process, in which all internal private activities and the flows formed by them are removed. Next, a set of correct plans is generated from these public processes. A plan is essentially a process fragment without alternative routings. From the external perspective (i.e., ignoring all internal private activities and the flows formed by them), a parallel execution of the business processes corresponding to these public processes follows only one such plan. Lastly, each participating organization independently refactors its business process using these resulting correct plans. Using the message places (corresponding to the actual communication interfaces), these refactored processes are composed in parallel. Thus, a correct and loosely coupled enforced process is constructed. This approach is evaluated on actual collaborative business processes, and the experimental results show that compared with state-of-the-art enforcement proposals, it can achieve correctness enforcement while protecting the business privacy of organizations and is available. 
Meanwhile, it is also more efficient and scalable: even a collaborative business process with tens of millions of states can be enforced within a few seconds.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 9","pages":"2313-2336"},"PeriodicalIF":6.5,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141754780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation
IF 6.5 Zone 1 Computer Science
IEEE Transactions on Software Engineering Pub Date : 2024-07-22 DOI: 10.1109/TSE.2024.3428972
Sarah Fakhoury;Aaditya Naik;Georgios Sakkas;Saikat Chakraborty;Shuvendu K. Lahiri
{"title":"LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation","authors":"Sarah Fakhoury;Aaditya Naik;Georgios Sakkas;Saikat Chakraborty;Shuvendu K. Lahiri","doi":"10.1109/TSE.2024.3428972","DOIUrl":"10.1109/TSE.2024.3428972","url":null,"abstract":"Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, given NL is informal, it does not lend itself easily to checking that the generated code correctly satisfies the user intent. In this paper, we propose a novel interactive workflow TiCoder for guided intent clarification (i.e., partial formalization) through tests to support the generation of more accurate code suggestions. Through a mixed methods user study with 15 programmers, we present an empirical evaluation of the effectiveness of the workflow to improve code generation accuracy. We find that participants using the proposed workflow are significantly more likely to correctly evaluate AI generated code, and report significantly less task-induced cognitive load. Furthermore, we test the potential of the workflow at scale with four different state-of-the-art LLMs on two Python datasets, using an idealized proxy for user feedback. 
We observe an average absolute improvement of 45.97% in the pass@1 code generation accuracy for both datasets and across all LLMs within 5 user interactions, in addition to the automatic generation of accompanying unit tests.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 9","pages":"2254-2268"},"PeriodicalIF":6.5,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141754777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
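The abstract reports pass@1 accuracy but does not say how it is computed. A common way to estimate pass@k from n sampled completions of which c pass the tests is the unbiased estimator 1 - C(n-c, k) / C(n, k); the sketch below implements that formula. Whether the TiCoder evaluation uses this estimator is an assumption, and the function name is hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k given n samples, c of which pass all tests.

    Computes 1 - C(n - c, k) / C(n, k): the probability that at least
    one of k samples drawn without replacement from the n is correct.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For k = 1 the estimator reduces to c / n, the fraction of sampled completions that pass. An "absolute improvement" in this metric is a difference in percentage points between two such values, not a relative ratio.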
Citations: 0
FairBalance: How to Achieve Equalized Odds With Data Pre-Processing
IF 6.5 Zone 1 Computer Science
IEEE Transactions on Software Engineering Pub Date : 2024-07-22 DOI: 10.1109/TSE.2024.3431445
Zhe Yu;Joymallya Chakraborty;Tim Menzies
{"title":"FairBalance: How to Achieve Equalized Odds With Data Pre-Processing","authors":"Zhe Yu;Joymallya Chakraborty;Tim Menzies","doi":"10.1109/TSE.2024.3431445","DOIUrl":"10.1109/TSE.2024.3431445","url":null,"abstract":"This research seeks to benefit the software engineering society by providing a simple yet effective pre-processing approach to achieve equalized odds fairness in machine learning software. Fairness issues have attracted increasing attention since machine learning software is increasingly used for high-stakes and high-risk decisions. It is the responsibility of all software developers to make their software accountable by ensuring that the machine learning software does not perform differently on different sensitive demographic groups, thus satisfying equalized odds. Different from prior works which either optimize for an equalized odds related metric during the learning process like a black-box, or manipulate the training data following some intuition, this work studies the root cause of the violation of equalized odds and how to tackle it. We found that equalizing the class distribution in each demographic group with sample weights is a necessary condition for achieving equalized odds without modifying the normal training process. In addition, an important partial condition for equalized odds (zero average odds difference) can be guaranteed when the class distributions are weighted to be not only equal but also balanced (1:1). Based on these analyses, we proposed FairBalance, a pre-processing algorithm which balances the class distribution in each demographic group by assigning calculated weights to the training data. On eight real-world datasets, our empirical results show that, at low computational overhead, the proposed pre-processing algorithm FairBalance can significantly improve equalized odds without much, if any, damage to the utility. FairBalance also outperforms existing state-of-the-art approaches in terms of equalized odds. 
To facilitate reuse, reproduction, and validation, we made our scripts available at https://github.com/hil-se/FairBalance.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 9","pages":"2294-2312"},"PeriodicalIF":6.5,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141754779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
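The idea of balancing the class distribution inside each demographic group with sample weights can be sketched as follows. This is one weighting scheme that gives every class the same total weight within a group; it is an assumption for illustration and may differ from the exact formula in the FairBalance paper or its repository. The function name `balance_weights` is hypothetical.

```python
from collections import Counter


def balance_weights(groups, labels):
    """Per-sample weights that equalize class totals within each group.

    A sample in group g with class y gets weight
        |g| / (num_classes * |g and y|),
    so each class present in g carries total weight |g| / num_classes.
    """
    cell_sizes = Counter(zip(groups, labels))
    group_sizes = Counter(groups)
    num_classes = len(set(labels))
    return [
        group_sizes[g] / (num_classes * cell_sizes[(g, y)])
        for g, y in zip(groups, labels)
    ]


def class_totals(groups, labels, weights):
    """Total weight per (group, class) cell, used to check the balance."""
    totals = Counter()
    for g, y, w in zip(groups, labels, weights):
        totals[(g, y)] += w
    return totals
```

Weights of this kind are typically passed to a learner that accepts per-sample weights during training, which leaves the training procedure itself unchanged, as the abstract describes. The balance holds only for groups in which every class occurs at least once.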
Citations: 0