{"title":"Building a Cybersecurity Risk Metamodel for Improved Method and Tool Integration","authors":"Christophe Ponsard","doi":"arxiv-2409.07906","DOIUrl":"https://doi.org/arxiv-2409.07906","url":null,"abstract":"Nowadays, companies are highly exposed to cybersecurity threats. In many industrial domains, protective measures are being deployed and actively supported by standards. However, the overall process remains largely dependent on document-driven approaches or partial modelling, which impacts both the efficiency and effectiveness of the cybersecurity process from the risk analysis step onwards. In this paper, we report on our experience in applying a model-driven approach to the initial risk analysis step in connection with later security testing. Our work relies on a common metamodel which is used to map, synchronise and ensure information traceability across different tools. We validate our approach using different scenarios relying on domain modelling, system modelling, risk assessment and security testing tools.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat","authors":"Sidong Feng, Haochuan Lu, Jianqin Jiang, Ting Xiong, Likun Huang, Yinglin Liang, Xiaoqin Li, Yuetang Deng, Aldeida Aleti","doi":"arxiv-2409.07829","DOIUrl":"https://doi.org/arxiv-2409.07829","url":null,"abstract":"UI automation tests play a crucial role in ensuring the quality of mobile applications. Despite the growing popularity of machine learning techniques to generate these tests, they still face several challenges, such as the mismatch of UI elements. Recent advances in Large Language Models (LLMs) have addressed these issues by leveraging their semantic understanding capabilities. However, a significant gap remains in applying these models to industrial-level app testing, particularly in terms of cost optimization and knowledge limitations. To address this, we introduce CAT to create cost-effective UI automation tests for industry apps by combining machine learning and LLMs with best practices. Given the task description, CAT employs Retrieval Augmented Generation (RAG) to source examples of industrial app usage as the few-shot learning context, assisting LLMs in generating the specific sequence of actions. CAT then employs machine learning techniques, with LLMs serving as a complementary optimizer, to map the target element on the UI screen. Our evaluations on the WeChat testing dataset demonstrate CAT's performance and cost-effectiveness, achieving 90% UI automation at a cost of $0.34, outperforming the state-of-the-art. We have also integrated our approach into the real-world WeChat testing platform, demonstrating its usefulness in detecting 141 bugs and enhancing the developers' testing process.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dividable Configuration Performance Learning","authors":"Jingzhi Gong, Tao Chen, Rami Bahsoon","doi":"arxiv-2409.07629","DOIUrl":"https://doi.org/arxiv-2409.07629","url":null,"abstract":"Machine/deep learning models have been widely adopted for predicting the configuration performance of software systems. However, a crucial yet unaddressed challenge is how to cater for the sparsity inherited from the configuration landscape: the influence of configuration options (features) and the distribution of data samples are highly sparse. In this paper, we propose a model-agnostic and sparsity-robust framework for predicting configuration performance, dubbed DaL, based on the new paradigm of dividable learning that builds a model via \"divide-and-learn\". To handle sample sparsity, the samples from the configuration landscape are divided into distant divisions, for each of which we build a sparse local model, e.g., a regularized Hierarchical Interaction Neural Network, to deal with the feature sparsity. A newly given configuration would then be assigned to the right model of division for the final prediction. Further, DaL adaptively determines the optimal number of divisions required for a system and sample size without any extra training or profiling. Experiment results from 12 real-world systems and five sets of training data reveal that, compared with the state-of-the-art approaches, DaL performs no worse than the best counterpart on 44 out of 60 cases with up to 1.61x improvement in accuracy; requires fewer samples to reach the same or better accuracy; and produces acceptable training overhead. In particular, the mechanism that adapts the parameter d reaches the optimal value in 76.43% of the individual runs. The results also confirm that the paradigm of dividable learning is more suitable than other similar paradigms, such as ensemble learning, for predicting configuration performance. Practically, DaL considerably improves different global models when using them as the underlying local models, which further strengthens its flexibility.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reusability and Modifiability in Robotics Software (Extended Version)","authors":"Laura Pomponio, Maximiliano Cristiá, Estanislao Ruiz Sorazábal, Maximiliano García","doi":"arxiv-2409.07228","DOIUrl":"https://doi.org/arxiv-2409.07228","url":null,"abstract":"We show the design of the software of the microcontroller unit of a weeding robot based on the Process Control architectural style and design patterns. The design consists of 133 modules resulting from applying 8 design patterns to a total of 30 problems. As a result, the design yields more reusable components and an easily modifiable and extensible program. Design documentation is also presented. Finally, the implementation (12 KLOC of C++ code) is empirically evaluated to show that the design does not produce an inefficient implementation.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories","authors":"Ben Bogin, Kejuan Yang, Shashank Gupta, Kyle Richardson, Erin Bransom, Peter Clark, Ashish Sabharwal, Tushar Khot","doi":"arxiv-2409.07440","DOIUrl":"https://doi.org/arxiv-2409.07440","url":null,"abstract":"Given that Large Language Models (LLMs) have made significant progress in writing code, can they now be used to autonomously reproduce results from research repositories? Such a capability would be a boon to the research community, helping researchers validate, understand, and extend prior work. To advance towards this goal, we introduce SUPER, the first benchmark designed to evaluate the capability of LLMs in setting up and executing tasks from research repositories. SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories. Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 subproblems derived from the expert set that focus on specific challenges (e.g., configuring a trainer), and 602 automatically generated problems for larger-scale development. We introduce various evaluation measures to assess both task success and progress, utilizing gold solutions when available or approximations otherwise. We show that state-of-the-art approaches struggle to solve these problems, with the best model (GPT-4o) solving only 16.3% of the end-to-end set and 46.1% of the scenarios. This illustrates the challenge of this task and suggests that SUPER can serve as a valuable resource for the community to make and measure progress.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fine-grained Sentiment Analysis of App Reviews using Large Language Models: An Evaluation Study","authors":"Faiz Ali Shah, Ahmed Sabir, Rajesh Sharma","doi":"arxiv-2409.07162","DOIUrl":"https://doi.org/arxiv-2409.07162","url":null,"abstract":"Analyzing user reviews for sentiment towards app features can provide valuable insights into users' perceptions of app functionality and their evolving needs. Given the volume of user reviews received daily, an automated mechanism to generate feature-level sentiment summaries of user reviews is needed. Recent advances in Large Language Models (LLMs) such as ChatGPT have shown impressive performance on several new tasks without updating the model's parameters, i.e., using zero or a few labeled examples. Despite these advancements, LLMs' capabilities to perform feature-specific sentiment analysis of user reviews remain unexplored. This study compares the performance of state-of-the-art LLMs, including GPT-4, ChatGPT, and LLama-2-chat variants, for extracting app features and associated sentiments under 0-shot, 1-shot, and 5-shot scenarios. Results indicate that the best-performing GPT-4 model outperforms rule-based approaches by 23.6% in F1-score with zero-shot feature extraction, with 5-shot further improving it by 6%. GPT-4 achieves a 74% F1-score for predicting positive sentiment towards correctly predicted app features, with 5-shot enhancing it by 7%. Our study suggests that LLMs are promising for generating feature-specific sentiment summaries of user reviews.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GitSEED: A Git-backed Automated Assessment Tool for Software Engineering and Programming Education","authors":"Pedro Orvalho, Mikoláš Janota, Vasco Manquinho","doi":"arxiv-2409.07362","DOIUrl":"https://doi.org/arxiv-2409.07362","url":null,"abstract":"Due to the substantial number of enrollments in programming courses, a key challenge is delivering personalized feedback to students. The nature of this feedback varies significantly, contingent on the subject and the chosen evaluation method. However, tailoring current Automated Assessment Tools (AATs) to integrate other program analysis tools is not straightforward. Moreover, AATs usually support only specific programming languages, providing feedback exclusively through dedicated websites based on test suites. This paper introduces GitSEED, a language-agnostic automated assessment tool designed for Programming Education and Software Engineering (SE) and backed by GitLab. Students interact with GitSEED through GitLab. Using GitSEED, students in Computer Science (CS) and SE can master the fundamentals of git while receiving personalized feedback on their programming assignments and projects. Furthermore, faculty members can easily tailor GitSEED's pipeline by integrating various code evaluation tools (e.g., memory leak detection, fault localization, program repair, etc.) to offer personalized feedback that aligns with the needs of each CS/SE course. Our experiments assess GitSEED's efficacy via a comprehensive user evaluation, examining the impact of feedback mechanisms and features on student learning outcomes. Findings reveal positive correlations between GitSEED usage and student engagement.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Mature is Requirements Engineering for AI-based Systems? A Systematic Mapping Study on Practices, Challenges, and Future Research Directions","authors":"Umm-e-Habiba, Markus Haug, Justus Bogner, Stefan Wagner","doi":"arxiv-2409.07192","DOIUrl":"https://doi.org/arxiv-2409.07192","url":null,"abstract":"Artificial intelligence (AI) permeates all fields of life, which has resulted in new challenges in requirements engineering for artificial intelligence (RE4AI), e.g., the difficulty of specifying and validating requirements for AI or of considering new quality requirements due to emerging ethical implications. It is currently unclear if existing RE methods are sufficient or if new ones are needed to address these challenges. Therefore, our goal is to provide a comprehensive overview of RE4AI to researchers and practitioners: what has been achieved so far, i.e., what practices are available, and what research gaps and challenges still need to be addressed? To achieve this, we conducted a systematic mapping study combining query string search and extensive snowballing. The extracted data was aggregated, and results were synthesized using thematic analysis. Our selection process led to the inclusion of 126 primary studies. Existing RE4AI research focuses mainly on requirements analysis and elicitation, with most practices applied in these areas. Furthermore, we identified requirements specification, explainability, and the gap between machine learning engineers and end-users as the most prevalent challenges, along with a few others. Additionally, we proposed seven potential research directions to address these challenges. Practitioners can use our results to identify and select suitable RE methods for working on their AI-based systems, while researchers can build on the identified gaps and research directions to push the field forward.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"235 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regulatory Requirements Engineering in Large Enterprises: An Interview Study on the European Accessibility Act","authors":"Oleksandr Kosenkov, Michael Unterkalmsteiner, Daniel Mendez, Jannik Fischbach","doi":"arxiv-2409.07313","DOIUrl":"https://doi.org/arxiv-2409.07313","url":null,"abstract":"Context: Regulations, such as the European Accessibility Act (EAA), impact the engineering of software products and services. Managing that impact while providing meaningful inputs to development teams is one of the emerging requirements engineering (RE) challenges. Problem: Enterprises conduct Regulatory Impact Analysis (RIA) to consider the effects of regulations on software products offered and formulate requirements at an enterprise level. Despite its practical relevance, we are unaware of any studies on this large-scale regulatory RE process. Methodology: We conducted an exploratory interview study of RIA in three large enterprises. We focused on how they conduct RIA, emphasizing cross-functional interactions, and using the EAA as an example. Results: RIA, as a regulatory RE process, is conducted to address the needs of executive management and central functions. It involves coordination between different functions and levels of enterprise hierarchy. Enterprises use artifacts to support interpretation and communication of the results of RIA. Challenges to RIA are mainly related to the execution of such coordination and managing the knowledge involved. Conclusion: RIA in large enterprises demands close coordination of multiple stakeholders and roles. Applying interpretation and compliance artifacts is one approach to support such coordination. However, there are no established practices for creating and managing such artifacts.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Choosing the Right Communication Protocol for your Web Application","authors":"Mohamed Hassan","doi":"arxiv-2409.07360","DOIUrl":"https://doi.org/arxiv-2409.07360","url":null,"abstract":"Selecting the appropriate communication protocol is crucial for optimizing the performance, scalability, and user experience of web applications. In the diverse ecosystem of web technologies, various protocols like RESTful APIs, gRPC, WebSockets, and others serve distinct purposes. RESTful APIs are widely favored for their simplicity and stateless nature, making them ideal for standard CRUD operations. They offer a straightforward approach to interacting with resources over HTTP/1.1, providing broad compatibility and ease of integration across different platforms. However, in scenarios where applications require high efficiency and real-time communication, gRPC and WebSockets emerge as powerful alternatives. Each protocol comes with its strengths and limitations, influencing factors such as ease of implementation, performance under load, and support for complex data structures. RESTful APIs, while easy to use and widely supported, may introduce overhead due to their stateless nature and reliance on multiple HTTP/1.1 requests. In contrast, gRPC's advanced features, while powerful, require a steeper learning curve and more sophisticated infrastructure. Similarly, WebSockets, while excellent for real-time applications, require careful management of persistent connections and security considerations. This paper explores the key considerations in choosing the right communication protocol, emphasizing the need to align technical choices with application requirements and user expectations. By understanding the unique attributes of each protocol, developers can make informed decisions that enhance the responsiveness and reliability of their web applications. The choice of protocol can significantly impact the user experience, scalability, and maintainability of the application, making it a critical decision in the web development process.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}