{"title":"Building a Cybersecurity Risk Metamodel for Improved Method and Tool Integration","authors":"Christophe Ponsard","doi":"arxiv-2409.07906","DOIUrl":"https://doi.org/arxiv-2409.07906","url":null,"abstract":"Nowadays, companies are highly exposed to cybersecurity threats. In many industrial domains, protective measures are being deployed and actively supported by standards. However, the overall process remains largely dependent on document-driven approaches or partial modelling, which impacts both the efficiency and effectiveness of the cybersecurity process from the risk analysis step onwards. In this paper, we report on our experience in applying a model-driven approach to the initial risk analysis step in connection with later security testing. Our work relies on a common metamodel which is used to map, synchronise and ensure information traceability across different tools. We validate our approach using different scenarios relying on domain modelling, system modelling, risk assessment and security testing tools.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat","authors":"Sidong Feng, Haochuan Lu, Jianqin Jiang, Ting Xiong, Likun Huang, Yinglin Liang, Xiaoqin Li, Yuetang Deng, Aldeida Aleti","doi":"arxiv-2409.07829","DOIUrl":"https://doi.org/arxiv-2409.07829","url":null,"abstract":"UI automation tests play a crucial role in ensuring the quality of mobile applications. Despite the growing popularity of machine learning techniques to generate these tests, they still face several challenges, such as the mismatch of UI elements. Recent advances in Large Language Models (LLMs) have addressed these issues by leveraging their semantic understanding capabilities. However, a significant gap remains in applying these models to industrial-level app testing, particularly in terms of cost optimization and knowledge limitations. To address this, we introduce CAT to create cost-effective UI automation tests for industry apps by combining machine learning and LLMs with best practices. Given the task description, CAT employs Retrieval Augmented Generation (RAG) to source examples of industrial app usage as the few-shot learning context, assisting LLMs in generating the specific sequence of actions. CAT then employs machine learning techniques, with LLMs serving as a complementary optimizer, to map the target element on the UI screen. Our evaluations on the WeChat testing dataset demonstrate CAT's performance and cost-effectiveness, achieving 90% UI automation at a cost of $0.34, outperforming the state-of-the-art. We have also integrated our approach into the real-world WeChat testing platform, demonstrating its usefulness in detecting 141 bugs and enhancing the developers' testing process.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dividable Configuration Performance Learning","authors":"Jingzhi Gong, Tao Chen, Rami Bahsoon","doi":"arxiv-2409.07629","DOIUrl":"https://doi.org/arxiv-2409.07629","url":null,"abstract":"Machine/deep learning models have been widely adopted for predicting the configuration performance of software systems. However, a crucial yet unaddressed challenge is how to cater for the sparsity inherited from the configuration landscape: the influence of configuration options (features) and the distribution of data samples are highly sparse. In this paper, we propose a model-agnostic and sparsity-robust framework for predicting configuration performance, dubbed DaL, based on the new paradigm of dividable learning that builds a model via \"divide-and-learn\". To handle sample sparsity, the samples from the configuration landscape are divided into distant divisions, for each of which we build a sparse local model, e.g., a regularized Hierarchical Interaction Neural Network, to deal with the feature sparsity. A newly given configuration would then be assigned to the right model of division for the final prediction. Further, DaL adaptively determines the optimal number of divisions required for a system and sample size without any extra training or profiling. Experiment results from 12 real-world systems and five sets of training data reveal that, compared with the state-of-the-art approaches, DaL performs no worse than the best counterpart on 44 out of 60 cases with up to 1.61x improvement in accuracy; requires fewer samples to reach the same or better accuracy; and produces acceptable training overhead. In particular, the mechanism that adapts the parameter d reaches the optimal value in 76.43% of the individual runs. The results also confirm that the paradigm of dividable learning is more suitable than other similar paradigms, such as ensemble learning, for predicting configuration performance. Practically, DaL considerably improves different global models when using them as the underlying local models, which further strengthens its flexibility.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reusability and Modifiability in Robotics Software (Extended Version)","authors":"Laura Pomponio, Maximiliano Cristiá, Estanislao Ruiz Sorazábal, Maximiliano García","doi":"arxiv-2409.07228","DOIUrl":"https://doi.org/arxiv-2409.07228","url":null,"abstract":"We show the design of the software of the microcontroller unit of a weeding robot based on the Process Control architectural style and design patterns. The design consists of 133 modules resulting from applying 8 design patterns to a total of 30 problems. As a result, the design yields more reusable components and an easily modifiable and extensible program. Design documentation is also presented. Finally, the implementation (12 KLOC of C++ code) is empirically evaluated to show that the design does not produce an inefficient implementation.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories","authors":"Ben Bogin, Kejuan Yang, Shashank Gupta, Kyle Richardson, Erin Bransom, Peter Clark, Ashish Sabharwal, Tushar Khot","doi":"arxiv-2409.07440","DOIUrl":"https://doi.org/arxiv-2409.07440","url":null,"abstract":"Given that Large Language Models (LLMs) have made significant progress in writing code, can they now be used to autonomously reproduce results from research repositories? Such a capability would be a boon to the research community, helping researchers validate, understand, and extend prior work. To advance towards this goal, we introduce SUPER, the first benchmark designed to evaluate the capability of LLMs in setting up and executing tasks from research repositories. SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories. Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 subproblems derived from the expert set that focus on specific challenges (e.g., configuring a trainer), and 602 automatically generated problems for larger-scale development. We introduce various evaluation measures to assess both task success and progress, utilizing gold solutions when available or approximations otherwise. We show that state-of-the-art approaches struggle to solve these problems, with the best model (GPT-4o) solving only 16.3% of the end-to-end set and 46.1% of the scenarios. This illustrates the challenge of this task and suggests that SUPER can serve as a valuable resource for the community to make and measure progress.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fine-grained Sentiment Analysis of App Reviews using Large Language Models: An Evaluation Study","authors":"Faiz Ali Shah, Ahmed Sabir, Rajesh Sharma","doi":"arxiv-2409.07162","DOIUrl":"https://doi.org/arxiv-2409.07162","url":null,"abstract":"Analyzing user reviews for sentiment towards app features can provide valuable insights into users' perceptions of app functionality and their evolving needs. Given the volume of user reviews received daily, an automated mechanism to generate feature-level sentiment summaries of user reviews is needed. Recent advances in Large Language Models (LLMs) such as ChatGPT have shown impressive performance on several new tasks without updating the model's parameters, i.e., using zero or a few labeled examples. Despite these advancements, LLMs' capabilities to perform feature-specific sentiment analysis of user reviews remain unexplored. This study compares the performance of state-of-the-art LLMs, including GPT-4, ChatGPT, and LLama-2-chat variants, for extracting app features and associated sentiments under 0-shot, 1-shot, and 5-shot scenarios. Results indicate that the best-performing GPT-4 model outperforms rule-based approaches by 23.6% in F1-score with zero-shot feature extraction, with 5-shot further improving it by 6%. GPT-4 achieves a 74% F1-score for predicting positive sentiment towards correctly predicted app features, with 5-shot enhancing it by 7%. Our study suggests that LLMs are promising for generating feature-specific sentiment summaries of user reviews.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GitSEED: A Git-backed Automated Assessment Tool for Software Engineering and Programming Education","authors":"Pedro Orvalho, Mikoláš Janota, Vasco Manquinho","doi":"arxiv-2409.07362","DOIUrl":"https://doi.org/arxiv-2409.07362","url":null,"abstract":"Due to the substantial number of enrollments in programming courses, a key challenge is delivering personalized feedback to students. The nature of this feedback varies significantly, contingent on the subject and the chosen evaluation method. However, tailoring current Automated Assessment Tools (AATs) to integrate other program analysis tools is not straightforward. Moreover, AATs usually support only specific programming languages, providing feedback exclusively through dedicated websites based on test suites. This paper introduces GitSEED, a language-agnostic automated assessment tool designed for Programming Education and Software Engineering (SE) and backed by GitLab. Students interact with GitSEED through GitLab. Using GitSEED, students in Computer Science (CS) and SE can master the fundamentals of git while receiving personalized feedback on their programming assignments and projects. Furthermore, faculty members can easily tailor GitSEED's pipeline by integrating various code evaluation tools (e.g., memory leak detection, fault localization, program repair, etc.) to offer personalized feedback that aligns with the needs of each CS/SE course. Our experiments assess GitSEED's efficacy via a comprehensive user evaluation, examining the impact of feedback mechanisms and features on student learning outcomes. Findings reveal positive correlations between GitSEED usage and student engagement.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Mature is Requirements Engineering for AI-based Systems? A Systematic Mapping Study on Practices, Challenges, and Future Research Directions","authors":"Umm-e-Habiba, Markus Haug, Justus Bogner, Stefan Wagner","doi":"arxiv-2409.07192","DOIUrl":"https://doi.org/arxiv-2409.07192","url":null,"abstract":"Artificial intelligence (AI) permeates all fields of life, which has resulted in new challenges in requirements engineering for artificial intelligence (RE4AI), e.g., the difficulty of specifying and validating requirements for AI or of considering new quality requirements due to emerging ethical implications. It is currently unclear if existing RE methods are sufficient or if new ones are needed to address these challenges. Therefore, our goal is to provide a comprehensive overview of RE4AI to researchers and practitioners: what has been achieved so far, i.e., what practices are available, and what research gaps and challenges still need to be addressed? To achieve this, we conducted a systematic mapping study combining query string search and extensive snowballing. The extracted data was aggregated, and results were synthesized using thematic analysis. Our selection process led to the inclusion of 126 primary studies. Existing RE4AI research focuses mainly on requirements analysis and elicitation, with most practices applied in these areas. Furthermore, we identified requirements specification, explainability, and the gap between machine learning engineers and end-users as the most prevalent challenges, along with a few others. Additionally, we proposed seven potential research directions to address these challenges. Practitioners can use our results to identify and select suitable RE methods for working on their AI-based systems, while researchers can build on the identified gaps and research directions to push the field forward.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"235 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regulatory Requirements Engineering in Large Enterprises: An Interview Study on the European Accessibility Act","authors":"Oleksandr Kosenkov, Michael Unterkalmsteiner, Daniel Mendez, Jannik Fischbach","doi":"arxiv-2409.07313","DOIUrl":"https://doi.org/arxiv-2409.07313","url":null,"abstract":"Context: Regulations, such as the European Accessibility Act (EAA), impact the engineering of software products and services. Managing that impact while providing meaningful inputs to development teams is one of the emerging requirements engineering (RE) challenges. Problem: Enterprises conduct Regulatory Impact Analysis (RIA) to consider the effects of regulations on software products offered and formulate requirements at an enterprise level. Despite its practical relevance, we are unaware of any studies on this large-scale regulatory RE process. Methodology: We conducted an exploratory interview study of RIA in three large enterprises. We focused on how they conduct RIA, emphasizing cross-functional interactions, and using the EAA as an example. Results: RIA, as a regulatory RE process, is conducted to address the needs of executive management and central functions. It involves coordination between different functions and levels of enterprise hierarchy. Enterprises use artifacts to support interpretation and communication of the results of RIA. Challenges to RIA are mainly related to the execution of such coordination and managing the knowledge involved. Conclusion: RIA in large enterprises demands close coordination of multiple stakeholders and roles. Applying interpretation and compliance artifacts is one approach to support such coordination. However, there are no established practices for creating and managing such artifacts.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Choosing the Right Communication Protocol for your Web Application","authors":"Mohamed Hassan","doi":"arxiv-2409.07360","DOIUrl":"https://doi.org/arxiv-2409.07360","url":null,"abstract":"Selecting the appropriate communication protocol is crucial for optimizing the performance, scalability, and user experience of web applications. In the diverse ecosystem of web technologies, various protocols like RESTful APIs, gRPC, WebSockets, and others serve distinct purposes. RESTful APIs are widely favored for their simplicity and stateless nature, making them ideal for standard CRUD operations. They offer a straightforward approach to interacting with resources over HTTP/1.1, providing broad compatibility and ease of integration across different platforms. However, in scenarios where applications require high efficiency and real-time communication, gRPC and WebSockets emerge as powerful alternatives. Each protocol comes with its strengths and limitations, influencing factors such as ease of implementation, performance under load, and support for complex data structures. RESTful APIs, while easy to use and widely supported, may introduce overhead due to their stateless nature and reliance on multiple HTTP/1.1 requests. In contrast, gRPC's advanced features, while powerful, require a steeper learning curve and more sophisticated infrastructure. Similarly, WebSockets, while excellent for real-time applications, require careful management of persistent connections and security considerations. This paper explores the key considerations in choosing the right communication protocol, emphasizing the need to align technical choices with application requirements and user expectations. By understanding the unique attributes of each protocol, developers can make informed decisions that enhance the responsiveness and reliability of their web applications. The choice of protocol can significantly impact the user experience, scalability, and maintainability of the application, making it a critical decision in the web development process.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}