arXiv - CS - Software Engineering最新文献_第3页

Can GPT-O1 Kill All Bugs? GPT-O1 能杀死所有虫子吗？

arXiv - CS - Software Engineering Pub Date : 2024-09-16 DOI: arxiv-2409.10033

Haichuan Hu, Ye Shang, Guolin Xu, Congqing He, Quanjun Zhang

引用次数: 0

Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA 测试和环境复杂性会增加缺陷吗？SAP HANA 的实证研究

arXiv - CS - Software Engineering Pub Date : 2024-09-16 DOI: arxiv-2409.10062

Alexander Berndt, Thomas Bach, Sebastian Baltes

{"title":"Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA","authors":"Alexander Berndt, Thomas Bach, Sebastian Baltes","doi":"arxiv-2409.10062","DOIUrl":"https://doi.org/arxiv-2409.10062","url":null,"abstract":"Background: Test flakiness is a major problem in the software industry. Flaky\u0000tests fail seemingly at random without changes to the code and thus impede\u0000continuous integration (CI). Some researchers argue that all tests can be\u0000considered flaky and that tests only differ in their frequency of flaky\u0000failures. Aims: With the goal of developing mitigation strategies to reduce the\u0000negative impact of test flakiness, we study characteristics of tests and the\u0000test environment that potentially impact test flakiness. Method: We construct two datasets based on SAP HANA's test results over a\u000012-week period: one based on production data, the other based on targeted test\u0000executions from a dedicated flakiness experiment. We conduct correlation\u0000analysis for test and test environment characteristics with respect to their\u0000influence on the frequency of flaky test failures. Results: In our study, the average test execution time had the strongest\u0000positive correlation with the test flakiness rate (r = 0.79), which confirms\u0000previous studies. Potential reasons for higher flakiness include the larger\u0000test scope of long-running tests or test executions on a slower test\u0000infrastructure. Interestingly, the load on the testing infrastructure was not\u0000correlated with test flakiness. The relationship between test flakiness and\u0000required resources for test execution is inconclusive. Conclusions: Based on our findings, we conclude that splitting long-running\u0000tests can be an important measure for practitioners to cope with test\u0000flakiness, as it enables parallelization of test executions and also reduces\u0000the cost of re-executions. This effectively decreases the negative effects of\u0000test flakiness in complex testing environments. However, when splitting\u0000long-running tests, practitioners need to consider the potential test setup\u0000overhead of test splits.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

LeGEND: A Top-Down Approach to Scenario Generation of Autonomous Driving Systems Assisted by Large Language Models LeGEND：在大型语言模型辅助下自上而下生成自动驾驶系统场景的方法

arXiv - CS - Software Engineering Pub Date : 2024-09-16 DOI: arxiv-2409.10066

Shuncheng Tang, Zhenya Zhang, Jixiang Zhou, Lei Lei, Yuan Zhou, Yinxing Xue

{"title":"LeGEND: A Top-Down Approach to Scenario Generation of Autonomous Driving Systems Assisted by Large Language Models","authors":"Shuncheng Tang, Zhenya Zhang, Jixiang Zhou, Lei Lei, Yuan Zhou, Yinxing Xue","doi":"arxiv-2409.10066","DOIUrl":"https://doi.org/arxiv-2409.10066","url":null,"abstract":"Autonomous driving systems (ADS) are safety-critical and require\u0000comprehensive testing before their deployment on public roads. While existing\u0000testing approaches primarily aim at the criticality of scenarios, they often\u0000overlook the diversity of the generated scenarios that is also important to\u0000reflect system defects in different aspects. To bridge the gap, we propose\u0000LeGEND, that features a top-down fashion of scenario generation: it starts with\u0000abstract functional scenarios, and then steps downwards to logical and concrete\u0000scenarios, such that scenario diversity can be controlled at the functional\u0000level. However, unlike logical scenarios that can be formally described,\u0000functional scenarios are often documented in natural languages (e.g., accident\u0000reports) and thus cannot be precisely parsed and processed by computers. To\u0000tackle that issue, LeGEND leverages the recent advances of large language\u0000models (LLMs) to transform textual functional scenarios to formal logical\u0000scenarios. To mitigate the distraction of useless information in functional\u0000scenario description, we devise a two-phase transformation that features the\u0000use of an intermediate language; consequently, we adopt two LLMs in LeGEND, one\u0000for extracting information from functional scenarios, the other for converting\u0000the extracted information to formal logical scenarios. We experimentally\u0000evaluate LeGEND on Apollo, an industry-grade ADS from Baidu. Evaluation results\u0000show that LeGEND can effectively identify critical scenarios, and compared to\u0000baseline approaches, LeGEND exhibits evident superiority in diversity of\u0000generated scenarios. Moreover, we also demonstrate the advantages of our\u0000two-phase transformation framework, and the accuracy of the adopted LLMs.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"93 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Investigating the Impact of Code Comment Inconsistency on Bug Introducing 调查代码注释不一致对错误引入的影响

arXiv - CS - Software Engineering Pub Date : 2024-09-16 DOI: arxiv-2409.10781

Shiva Radmanesh, Aaron Imani, Iftekhar Ahmed, Mohammad Moshirpour

{"title":"Investigating the Impact of Code Comment Inconsistency on Bug Introducing","authors":"Shiva Radmanesh, Aaron Imani, Iftekhar Ahmed, Mohammad Moshirpour","doi":"arxiv-2409.10781","DOIUrl":"https://doi.org/arxiv-2409.10781","url":null,"abstract":"Code comments are essential for clarifying code functionality, improving\u0000readability, and facilitating collaboration among developers. Despite their\u0000importance, comments often become outdated, leading to inconsistencies with the\u0000corresponding code. This can mislead developers and potentially introduce bugs.\u0000Our research investigates the impact of code-comment inconsistency on bug\u0000introduction using large language models, specifically GPT-3.5. We first\u0000compare the performance of the GPT-3.5 model with other state-of-the-art\u0000methods in detecting these inconsistencies, demonstrating the superiority of\u0000GPT-3.5 in this domain. Additionally, we analyze the temporal evolution of\u0000code-comment inconsistencies and their effect on bug proneness over various\u0000timeframes using GPT-3.5 and Odds ratio analysis. Our findings reveal that\u0000inconsistent changes are around 1.5 times more likely to lead to a\u0000bug-introducing commit than consistent changes, highlighting the necessity of\u0000maintaining consistent and up-to-date comments in software development. This\u0000study provides new insights into the relationship between code-comment\u0000inconsistency and software quality, offering a comprehensive analysis of its\u0000impact over time, demonstrating that the impact of code-comment inconsistency\u0000on bug introduction is highest immediately after the inconsistency is\u0000introduced and diminishes over time.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Large-Scale Privacy Assessment of Android Third-Party SDKs 大规模安卓第三方 SDK 隐私评估

arXiv - CS - Software Engineering Pub Date : 2024-09-16 DOI: arxiv-2409.10411

Mark Huasong Meng, Chuan Yan, Yun Hao, Qing Zhang, Zeyu Wang, Kailong Wang, Sin Gee Teo, Guangdong Bai, Jin Song Dong

{"title":"A Large-Scale Privacy Assessment of Android Third-Party SDKs","authors":"Mark Huasong Meng, Chuan Yan, Yun Hao, Qing Zhang, Zeyu Wang, Kailong Wang, Sin Gee Teo, Guangdong Bai, Jin Song Dong","doi":"arxiv-2409.10411","DOIUrl":"https://doi.org/arxiv-2409.10411","url":null,"abstract":"Third-party Software Development Kits (SDKs) are widely adopted in Android\u0000app development, to effortlessly accelerate development pipelines and enhance\u0000app functionality. However, this convenience raises substantial concerns about\u0000unauthorized access to users' privacy-sensitive information, which could be\u0000further abused for illegitimate purposes like user tracking or monetization.\u0000Our study offers a targeted analysis of user privacy protection among Android\u0000third-party SDKs, filling a critical gap in the Android software supply chain.\u0000It focuses on two aspects of their privacy practices, including data\u0000exfiltration and behavior-policy compliance (or privacy compliance), utilizing\u0000techniques of taint analysis and large language models. It covers 158\u0000widely-used SDKs from two key SDK release platforms, the official one and a\u0000large alternative one. From them, we identified 338 instances of privacy data\u0000exfiltration. On the privacy compliance, our study reveals that more than 30%\u0000of the examined SDKs fail to provide a privacy policy to disclose their data\u0000handling practices. Among those that provide privacy policies, 37% of them\u0000over-collect user data, and 88% falsely claim access to sensitive data. We\u0000revisit the latest versions of the SDKs after 12 months. Our analysis\u0000demonstrates a persistent lack of improvement in these concerning trends. Based\u0000on our findings, we propose three actionable recommendations to mitigate the\u0000privacy leakage risks and enhance privacy protection for Android users. Our\u0000research not only serves as an urgent call for industry attention but also\u0000provides crucial insights for future regulatory interventions.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code ComplexCodeEval：在更复杂代码上评估大型代码模型的基准

arXiv - CS - Software Engineering Pub Date : 2024-09-16 DOI: arxiv-2409.10280

Jia Feng, Jiachen Liu, Cuiyun Gao, Chun Yong Chong, Chaozheng Wang, Shan Gao, Xin Xia

引用次数: 0

Centralization potential of automotive E/E architectures 汽车 E/E 架构的集中化潜力

arXiv - CS - Software Engineering Pub Date : 2024-09-16 DOI: arxiv-2409.10690

Lucas Mauser, Stefan Wagner

{"title":"Centralization potential of automotive E/E architectures","authors":"Lucas Mauser, Stefan Wagner","doi":"arxiv-2409.10690","DOIUrl":"https://doi.org/arxiv-2409.10690","url":null,"abstract":"Current automotive E/E architectures are subject to significant\u0000transformations: Computing-power-intensive advanced driver-assistance systems,\u0000bandwidth-hungry infotainment systems, the connection of the vehicle with the\u0000internet and the consequential need for cyber-security drives the\u0000centralization of E/E architectures. A centralized architecture is often seen\u0000as a key enabler to master those challenges. Available research focuses mostly\u0000on the different types of E/E architectures and contrasts their advantages and\u0000disadvantages. There is a research gap on guidelines for system designers and\u0000function developers to analyze the potential of their systems for\u0000centralization. The present paper aims to quantify centralization potential\u0000reviewing relevant literature and conducting qualitative interviews with\u0000industry practitioners. In literature, we identified seven key automotive\u0000system properties reaching limitations in current automotive architectures:\u0000busload, functional safety, computing power, feature dependencies, development\u0000and maintenance costs, error rate, modularity and flexibility. These properties\u0000serve as quantitative evaluation criteria to estimate whether centralization\u0000would enhance overall system performance. In the interviews, we have validated\u0000centralization and its fundament - the conceptual systems engineering - as\u0000capabilities to mitigate these limitations. By focusing on practical insights\u0000and lessons learned, this research provides system designers with actionable\u0000guidance to optimize their systems, addressing the outlined challenges while\u0000avoiding monolithic architecture. This paper bridges the gap between\u0000theoretical research and practical application, offering valuable takeaways for\u0000practitioners.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Towards Semantic Versioning of Open Pre-trained Language Model Releases on Hugging Face 基于拥抱表情的开放式预训练语言模型的语义版本发布

arXiv - CS - Software Engineering Pub Date : 2024-09-16 DOI: arxiv-2409.10472

Adekunle Ajibode, Abdul Ali Bangash, Filipe Roseiro Cogo, Bram Adams, Ahmed E. Hassan

{"title":"Towards Semantic Versioning of Open Pre-trained Language Model Releases on Hugging Face","authors":"Adekunle Ajibode, Abdul Ali Bangash, Filipe Roseiro Cogo, Bram Adams, Ahmed E. Hassan","doi":"arxiv-2409.10472","DOIUrl":"https://doi.org/arxiv-2409.10472","url":null,"abstract":"The proliferation of open Pre-trained Language Models (PTLMs) on model\u0000registry platforms like Hugging Face (HF) presents both opportunities and\u0000challenges for companies building products around them. Similar to traditional\u0000software dependencies, PTLMs continue to evolve after a release. However, the\u0000current state of release practices of PTLMs on model registry platforms are\u0000plagued by a variety of inconsistencies, such as ambiguous naming conventions\u0000and inaccessible model training documentation. Given the knowledge gap on\u0000current PTLM release practices, our empirical study uses a mixed-methods\u0000approach to analyze the releases of 52,227 PTLMs on the most well-known model\u0000registry, HF. Our results reveal 148 different naming practices for PTLM\u0000releases, with 40.87% of changes to model weight files not represented in the\u0000adopted name-based versioning practice or their documentation. In addition, we\u0000identified that the 52,227 PTLMs are derived from only 299 different base\u0000models (the modified original models used to create 52,227 PTLMs), with\u0000Fine-tuning and Quantization being the most prevalent modification methods\u0000applied to these base models. Significant gaps in release transparency, in\u0000terms of training dataset specifications and model card availability, still\u0000exist, highlighting the need for standardized documentation. While we\u0000identified a model naming practice explicitly differentiating between major and\u0000minor PTLM releases, we did not find any significant difference in the types of\u0000changes that went into either type of releases, suggesting that major/minor\u0000version numbers for PTLMs often are chosen arbitrarily. Our findings provide\u0000valuable insights to improve PTLM release practices, nudging the field towards\u0000more formal semantic versioning practices.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Understanding Code Change with Micro-Changes 通过微变化了解代码变化

arXiv - CS - Software Engineering Pub Date : 2024-09-16 DOI: arxiv-2409.09923

Lei Chen, Michele Lanza, Shinpei Hayashi

{"title":"Understanding Code Change with Micro-Changes","authors":"Lei Chen, Michele Lanza, Shinpei Hayashi","doi":"arxiv-2409.09923","DOIUrl":"https://doi.org/arxiv-2409.09923","url":null,"abstract":"A crucial activity in software maintenance and evolution is the comprehension\u0000of the changes performed by developers, when they submit a pull request and/or\u0000perform a commit on the repository. Typically, code changes are represented in\u0000the form of code diffs, textual representations highlighting the differences\u0000between two file versions, depicting the added, removed, and changed lines.\u0000This simplistic representation must be interpreted by developers, and mentally\u0000lifted to a higher abstraction level, that more closely resembles natural\u0000language descriptions, and eases the creation of a mental model of the changes.\u0000However, the textual diff-based representation is cumbersome, and the lifting\u0000requires considerable domain knowledge and programming skills. We present an\u0000approach, based on the concept of micro-change, to overcome these difficulties,\u0000translating code diffs into a series of pre-defined change operations, which\u0000can be described in natural language. We present a catalog of micro-changes,\u0000together with an automated micro-change detector. To evaluate our approach, we\u0000performed an empirical study on a large set of open-source repositories,\u0000focusing on a subset of our micro-change catalog, namely those related to\u0000changes affecting the conditional logic. We found that our detector is capable\u0000of explaining more than 67% of the changes taking place in the systems under\u0000study.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models 使用大型语言模型进行上下文感知代码分割以实现 C 到 Rust 翻译

arXiv - CS - Software Engineering Pub Date : 2024-09-16 DOI: arxiv-2409.10506

Momoko Shiraishi, Takahiro Shinagawa

{"title":"Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models","authors":"Momoko Shiraishi, Takahiro Shinagawa","doi":"arxiv-2409.10506","DOIUrl":"https://doi.org/arxiv-2409.10506","url":null,"abstract":"There is strong motivation to translate C code into Rust code due to the\u0000continuing threat of memory safety vulnerabilities in existing C programs and\u0000the significant attention paid to Rust as an alternative to the C language.\u0000While large language models (LLMs) show promise for automating this translation\u0000by generating more natural and safer code than rule-based methods, previous\u0000studies have shown that LLM-generated Rust code often fails to compile, even\u0000for relatively small C programs, due to significant differences between the two\u0000languages and context window limitations. We propose an LLM-based translation\u0000scheme that improves the success rate of translating large-scale C code into\u0000compilable Rust code. Our approach involves three key techniques: (1)\u0000pre-processing the C code to better align its structure and expressions with\u0000Rust, (2) segmenting the code into optimally sized translation units to avoid\u0000exceeding the LLM's context window limits, and (3) iteratively compiling and\u0000repairing errors while maintaining consistency between translation units using\u0000context-supplementing prompts. Compilation success is an essential first step\u0000in achieving functional equivalence, as only compilable code can be further\u0000tested. In experiments with 20 benchmark C programs, including those exceeding\u00004 kilo lines of code, we successfully translated all programs into compilable\u0000Rust code without losing corresponding parts of the original code.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142261338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0