{"title":"Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models","authors":"Momoko Shiraishi, Takahiro Shinagawa","doi":"arxiv-2409.10506","DOIUrl":"https://doi.org/arxiv-2409.10506","url":null,"abstract":"There is strong motivation to translate C code into Rust code due to the\u0000continuing threat of memory safety vulnerabilities in existing C programs and\u0000the significant attention paid to Rust as an alternative to the C language.\u0000While large language models (LLMs) show promise for automating this translation\u0000by generating more natural and safer code than rule-based methods, previous\u0000studies have shown that LLM-generated Rust code often fails to compile, even\u0000for relatively small C programs, due to significant differences between the two\u0000languages and context window limitations. We propose an LLM-based translation\u0000scheme that improves the success rate of translating large-scale C code into\u0000compilable Rust code. Our approach involves three key techniques: (1)\u0000pre-processing the C code to better align its structure and expressions with\u0000Rust, (2) segmenting the code into optimally sized translation units to avoid\u0000exceeding the LLM's context window limits, and (3) iteratively compiling and\u0000repairing errors while maintaining consistency between translation units using\u0000context-supplementing prompts. Compilation success is an essential first step\u0000in achieving functional equivalence, as only compilable code can be further\u0000tested. In experiments with 20 benchmark C programs, including those exceeding\u00004 kilo lines of code, we successfully translated all programs into compilable\u0000Rust code without losing corresponding parts of the original code.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NaviQAte: Functionality-Guided Web Application Navigation","authors":"Mobina Shahbandeh, Parsa Alian, Noor Nashid, Ali Mesbah","doi":"arxiv-2409.10741","DOIUrl":"https://doi.org/arxiv-2409.10741","url":null,"abstract":"End-to-end web testing is challenging due to the need to explore diverse web\u0000application functionalities. Current state-of-the-art methods, such as\u0000WebCanvas, are not designed for broad functionality exploration; they rely on\u0000specific, detailed task descriptions, limiting their adaptability in dynamic\u0000web environments. We introduce NaviQAte, which frames web application\u0000exploration as a question-and-answer task, generating action sequences for\u0000functionalities without requiring detailed parameters. Our three-phase approach\u0000utilizes advanced large language models like GPT-4o for complex decision-making\u0000and cost-effective models, such as GPT-4o mini, for simpler tasks. NaviQAte\u0000focuses on functionality-guided web application navigation, integrating\u0000multi-modal inputs such as text and images to enhance contextual understanding.\u0000Evaluations on the Mind2Web-Live and Mind2Web-Live-Abstracted datasets show\u0000that NaviQAte achieves a 44.23% success rate in user task navigation and a\u000038.46% success rate in functionality navigation, representing a 15% and 33%\u0000improvement over WebCanvas. These results underscore the effectiveness of our\u0000approach in advancing automated web application testing.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing","authors":"Ana Nunez, Nafis Tanveer Islam, Sumit Kumar Jha, Peyman Najafirad","doi":"arxiv-2409.10737","DOIUrl":"https://doi.org/arxiv-2409.10737","url":null,"abstract":"Recent advancements in automatic code generation using large language models\u0000(LLMs) have brought us closer to fully automated secure software development.\u0000However, existing approaches often rely on a single agent for code generation,\u0000which struggles to produce secure, vulnerability-free code. Traditional program\u0000synthesis with LLMs has primarily focused on functional correctness, often\u0000neglecting critical dynamic security implications that happen during runtime.\u0000To address these challenges, we propose AutoSafeCoder, a multi-agent framework\u0000that leverages LLM-driven agents for code generation, vulnerability analysis,\u0000and security enhancement through continuous collaboration. The framework\u0000consists of three agents: a Coding Agent responsible for code generation, a\u0000Static Analyzer Agent identifying vulnerabilities, and a Fuzzing Agent\u0000performing dynamic testing using a mutation-based fuzzing approach to detect\u0000runtime errors. Our contribution focuses on ensuring the safety of multi-agent\u0000code generation by integrating dynamic and static testing in an iterative\u0000process during code generation by LLM that improves security. Experiments using\u0000the SecurityEval dataset demonstrate a 13% reduction in code vulnerabilities\u0000compared to baseline LLMs, with no compromise in functionality.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"99 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation","authors":"Qingyao Li, Wei Xia, Kounianhua Du, Xinyi Dai, Ruiming Tang, Yasheng Wang, Yong Yu, Weinan Zhang","doi":"arxiv-2409.09584","DOIUrl":"https://doi.org/arxiv-2409.09584","url":null,"abstract":"LLM agents enhanced by tree search algorithms have yielded notable\u0000performances in code generation. However, current search algorithms in this\u0000domain suffer from low search quality due to several reasons: 1) Ineffective\u0000design of the search space for the high-reasoning demands of code generation\u0000tasks, 2) Inadequate integration of code feedback with the search algorithm,\u0000and 3) Poor handling of negative feedback during the search, leading to reduced\u0000search efficiency and quality. To address these challenges, we propose to\u0000search for the reasoning process of the code and use the detailed feedback of\u0000code execution to refine erroneous thoughts during the search. In this paper,\u0000we introduce RethinkMCTS, which employs the Monte Carlo Tree Search (MCTS)\u0000algorithm to conduct thought-level searches before generating code, thereby\u0000exploring a wider range of strategies. More importantly, we construct verbal\u0000feedback from fine-grained code execution feedback to refine erroneous thoughts\u0000during the search. This ensures that the search progresses along the correct\u0000reasoning paths, thus improving the overall search quality of the tree by\u0000leveraging execution feedback. Through extensive experiments, we demonstrate\u0000that RethinkMCTS outperforms previous search-based and feedback-based code\u0000generation baselines. On the HumanEval dataset, it improves the pass@1 of\u0000GPT-3.5-turbo from 70.12 to 89.02 and GPT-4o-mini from 87.20 to 94.51. It\u0000effectively conducts more thorough exploration through thought-level searches\u0000and enhances the search quality of the entire tree by incorporating rethink\u0000operation.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"211 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ContractTinker: LLM-Empowered Vulnerability Repair for Real-World Smart Contracts","authors":"Che Wang, Jiashuo Zhang, Jianbo Gao, Libin Xia, Zhi Guan, Zhong Chen","doi":"arxiv-2409.09661","DOIUrl":"https://doi.org/arxiv-2409.09661","url":null,"abstract":"Smart contracts are susceptible to being exploited by attackers, especially\u0000when facing real-world vulnerabilities. To mitigate this risk, developers often\u0000rely on third-party audit services to identify potential vulnerabilities before\u0000project deployment. Nevertheless, repairing the identified vulnerabilities is\u0000still complex and labor-intensive, particularly for developers lacking security\u0000expertise. Moreover, existing pattern-based repair tools mostly fail to address\u0000real-world vulnerabilities due to their lack of high-level semantic\u0000understanding. To fill this gap, we propose ContractTinker, a Large Language\u0000Models (LLMs)-empowered tool for real-world vulnerability repair. The key\u0000insight is our adoption of the Chain-of-Thought approach to break down the\u0000entire generation task into sub-tasks. Additionally, to reduce hallucination,\u0000we integrate program static analysis to guide the LLM. We evaluate\u0000ContractTinker on 48 high-risk vulnerabilities. The experimental results show\u0000that among the patches generated by ContractTinker, 23 (48%) are valid patches\u0000that fix the vulnerabilities, while 10 (21%) require only minor modifications.\u0000A video of ContractTinker is available at https://youtu.be/HWFVi-YHcPE.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Large Language Models for Predicting Cost and Duration in Software Engineering Projects","authors":"Justin Carpenter, Chia-Ying Wu, Nasir U. Eisty","doi":"arxiv-2409.09617","DOIUrl":"https://doi.org/arxiv-2409.09617","url":null,"abstract":"Accurate estimation of project costs and durations remains a pivotal\u0000challenge in software engineering, directly impacting budgeting and resource\u0000management. Traditional estimation techniques, although widely utilized, often\u0000fall short due to their complexity and the dynamic nature of software\u0000development projects. This study introduces an innovative approach using Large\u0000Language Models (LLMs) to enhance the accuracy and usability of project cost\u0000predictions. We explore the efficacy of LLMs against traditional methods and\u0000contemporary machine learning techniques, focusing on their potential to\u0000simplify the estimation process and provide higher accuracy. Our research is\u0000structured around critical inquiries into whether LLMs can outperform existing\u0000models, the ease of their integration into current practices, outperform\u0000traditional estimation, and why traditional methods still prevail in industry\u0000settings. By applying LLMs to a range of real-world datasets and comparing\u0000their performance to both state-of-the-art and conventional methods, this study\u0000aims to demonstrate that LLMs not only yield more accurate estimates but also\u0000offer a user-friendly alternative to complex predictive models, potentially\u0000transforming project management strategies within the software industry.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"118 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overcoming linguistic barriers in code assistants: creating a QLoRA adapter to improve support for Russian-language code writing instructions","authors":"C. B. Pronin, A. V. Volosova, A. V. Ostroukh, Yu. N. Strogov","doi":"arxiv-2409.09353","DOIUrl":"https://doi.org/arxiv-2409.09353","url":null,"abstract":"In this paper, an approach to training and evaluating an adapter model for\u0000the popular language model \"zephyr-7b-beta\" is described. The adapter was\u0000developed to improve the performance of the base model in tasks related to\u0000programming and understanding the Russian language. Considering the high\u0000quality of the original model in tasks in the English language, the goal of the\u0000research was to expand its linguistic and technical spectrum. The proposed\u0000adapter was trained using a large and diverse dataset, including\u0000question-answer pairs related to programming, as well code-related texts in\u0000Russian language. The applied training methodology ensures an improvement in\u0000the model's quality of answers in understanding and generating Python code\u0000based on Russian instructions. We evaluated the performance of the base model\u0000with the installed adapter using various metrics, comparing it to the base\u0000model as well as other state-of-the-art models in this field. The obtained\u0000results showed significant improvement, both in tasks related to writing Python\u0000code and in processing the Russian language, confirming the effectiveness of\u0000the proposed adapter.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing","authors":"Chenyang Yang, Yining Hong, Grace A. Lewis, Tongshuang Wu, Christian Kästner","doi":"arxiv-2409.09261","DOIUrl":"https://doi.org/arxiv-2409.09261","url":null,"abstract":"Machine learning models make mistakes, yet sometimes it is difficult to\u0000identify the systematic problems behind the mistakes. Practitioners engage in\u0000various activities, including error analysis, testing, auditing, and\u0000red-teaming, to form hypotheses of what can go (or has gone) wrong with their\u0000models. To validate these hypotheses, practitioners employ data slicing to\u0000identify relevant examples. However, traditional data slicing is limited by\u0000available features and programmatic slicing functions. In this work, we propose\u0000SemSlicer, a framework that supports semantic data slicing, which identifies a\u0000semantically coherent slice, without the need for existing features. SemSlicer\u0000uses Large Language Models to annotate datasets and generate slices from any\u0000user-defined slicing criteria. We show that SemSlicer generates accurate slices\u0000with low cost, allows flexible trade-offs between different design dimensions,\u0000reliably identifies under-performing data slices, and helps practitioners\u0000identify useful data slices that reflect systematic problems.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Robust Detection of Open Source Software Supply Chain Poisoning Attacks in Industry Environments","authors":"Xinyi Zheng, Chen Wei, Shenao Wang, Yanjie Zhao, Peiming Gao, Yuanchao Zhang, Kailong Wang, Haoyu Wang","doi":"arxiv-2409.09356","DOIUrl":"https://doi.org/arxiv-2409.09356","url":null,"abstract":"The exponential growth of open-source package ecosystems, particularly NPM\u0000and PyPI, has led to an alarming increase in software supply chain poisoning\u0000attacks. Existing static analysis methods struggle with high false positive\u0000rates and are easily thwarted by obfuscation and dynamic code execution\u0000techniques. While dynamic analysis approaches offer improvements, they often\u0000suffer from capturing non-package behaviors and employing simplistic testing\u0000strategies that fail to trigger sophisticated malicious behaviors. To address\u0000these challenges, we present OSCAR, a robust dynamic code poisoning detection\u0000pipeline for NPM and PyPI ecosystems. OSCAR fully executes packages in a\u0000sandbox environment, employs fuzz testing on exported functions and classes,\u0000and implements aspect-based behavior monitoring with tailored API hook points.\u0000We evaluate OSCAR against six existing tools using a comprehensive benchmark\u0000dataset of real-world malicious and benign packages. OSCAR achieves an F1 score\u0000of 0.95 in NPM and 0.91 in PyPI, confirming that OSCAR is as effective as the\u0000current state-of-the-art technologies. Furthermore, for benign packages\u0000exhibiting characteristics typical of malicious packages, OSCAR reduces the\u0000false positive rate by an average of 32.06% in NPM (from 34.63% to 2.57%) and\u000039.87% in PyPI (from 41.10% to 1.23%), compared to other tools, significantly\u0000reducing the workload of manual reviews in real-world deployments. In\u0000cooperation with Ant Group, a leading financial technology company, we have\u0000deployed OSCAR on its NPM and PyPI mirrors since January 2023, identifying\u000010,404 malicious NPM packages and 1,235 malicious PyPI packages over 18 months.\u0000This work not only bridges the gap between academic research and industrial\u0000application in code poisoning detection but also provides a robust and\u0000practical solution that has been thoroughly tested in a real-world industrial\u0000setting.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking the Influence of Source Code on Test Case Generation","authors":"Dong Huang, Jie M. Zhang, Mingzhe Du, Mark Harman, Heming Cui","doi":"arxiv-2409.09464","DOIUrl":"https://doi.org/arxiv-2409.09464","url":null,"abstract":"Large language models (LLMs) have been widely applied to assist test\u0000generation with the source code under test provided as the context. This paper\u0000aims to answer the question: If the source code under test is incorrect, will\u0000LLMs be misguided when generating tests? The effectiveness of test cases is\u0000measured by their accuracy, coverage, and bug detection effectiveness. Our\u0000evaluation results with five open- and six closed-source LLMs on four datasets\u0000demonstrate that incorrect code can significantly mislead LLMs in generating\u0000correct, high-coverage, and bug-revealing tests. For instance, in the HumanEval\u0000dataset, LLMs achieve 80.45% test accuracy when provided with task descriptions\u0000and correct code, but only 57.12% when given task descriptions and incorrect\u0000code. For the APPS dataset, prompts with correct code yield tests that detect\u000039.85% of the bugs, while prompts with incorrect code detect only 19.61%. These\u0000findings have important implications for the deployment of LLM-based testing:\u0000using it on mature code may help protect against future regression, but on\u0000early-stage immature code, it may simply bake in errors. Our findings also\u0000underscore the need for further research to improve LLMs resilience against\u0000incorrect code in generating reliable and bug-revealing tests.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}