{"title":"Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning","authors":"Shengcheng Yu, Chunrong Fang, Xin Li, Yuchen Ling, Zhenyu Chen, Zhendong Su","doi":"10.1145/3674728","DOIUrl":"https://doi.org/10.1145/3674728","url":null,"abstract":"<p>Software applications (apps) have been playing an increasingly important role in various aspects of society. In particular, mobile apps and web apps are the most prevalent among all applications and are widely used in various industries as well as in people’s daily lives. To help ensure mobile and web app quality, many approaches have been introduced to improve app GUI testing via automated exploration, including random testing, model-based testing, learning-based testing, <i>etc.</i> Despite the extensive effort, existing approaches are still limited in reaching high code coverage, constructing high-quality models, and being generally applicable. Reinforcement learning-based approaches, as a group of representative and advanced approaches for automated GUI exploration testing, are faced with difficult challenges, including effective app state abstraction, reward function design, <i>etc.</i> Moreover, they heavily depend on the specific execution platforms (<i>i.e.,</i> Android or Web), thus leading to poor generalizability and being unable to adapt to different platforms.</p><p>This work specifically tackles these challenges based on the high-level observation that apps from distinct platforms share commonalities in GUI design. Indeed, we propose PIRLT<sub>EST</sub>, an effective platform-independent approach for app testing. Specifically, PIRLT<sub>EST</sub> utilizes computer vision and reinforcement learning techniques in a novel, synergistic manner for automated testing. It extracts the GUI widgets from GUI pages and characterizes the corresponding GUI layouts, embedding the GUI pages as states. The app GUI state combines the macroscopic perspective (app GUI layout) and the microscopic perspective (app GUI widget), and attaches the critical semantic information from GUI images. This enables PIRLT<sub>EST</sub> to be platform-independent and makes the testing approach generally applicable on different platforms. PIRLT<sub>EST</sub> explores apps with the guidance of a curiosity-driven strategy, which uses a Q-network to estimate the values of specific state-action pairs to encourage more exploration in uncovered pages without platform dependency. The exploration will be assigned with rewards for all actions, which are designed considering both the app GUI states and the concrete widgets, to help the framework explore more uncovered pages. We conduct an empirical study on 20 mobile apps and 5 web apps, and the results show that PIRLT<sub>EST</sub> is zero-cost when being adapted to different platforms, and can perform better than the baselines, covering 6.3–41.4% more code on mobile apps and 1.5–51.1% more code on web apps. PIRLT<sub>EST</sub> is capable of detecting 128 unique bugs on mobile and web apps, including 100 bugs that cannot be detected by the baselines.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"51 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141502796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anni Peng, Dongliang Fang, Le Guan, Erik van der Kouwe, Yin Li, Wenwen Wang, Limin Sun, Yuqing Zhang
{"title":"Bitmap-Based Security Monitoring for Deeply Embedded Systems","authors":"Anni Peng, Dongliang Fang, Le Guan, Erik van der Kouwe, Yin Li, Wenwen Wang, Limin Sun, Yuqing Zhang","doi":"10.1145/3672460","DOIUrl":"https://doi.org/10.1145/3672460","url":null,"abstract":"<p>Deeply embedded systems powered by microcontrollers are becoming popular with the emergence of Internet of Things (IoT) technology. However, these devices primarily run C/C++ code and are susceptible to memory bugs, which can potentially lead to both control data attacks and non-control data attacks. Existing defense mechanisms (such as control flow integrity (CFI), data flow integrity (DFI) and write integrity testing (WIT), etc.) consume a massive amount of resources, making them less practical in real products. To make it lightweight, we design a bitmap-based allowlist mechanism to unify the storage of the runtime data for protecting both control data and non-control data. The memory requirements are constant and small, regardless of the number of deployed defense mechanisms. We store the allowlist in the TrustZone to ensure its integrity and confidentiality. Meanwhile, we perform an offline analysis to detect potential collisions and make corresponding adjustments when if happens. We have implemented our idea on an ARM Cortex-M based development board. Our evaluation results show a substantial reduction in memory consumption when deploying the proposed CFI and DFI mechanisms, without compromising runtime performance. Specifically, our prototype enforces CFI and DFI at a cost of just 2.09% performance overhead and 32.56% memory overhead on average.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"19 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141502747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elijah Zolduoarrati, Sherlock A. Licorish, Nigel Stanger
{"title":"Harmonising Contributions: Exploring Diversity in Software Engineering through CQA Mining on Stack Overflow","authors":"Elijah Zolduoarrati, Sherlock A. Licorish, Nigel Stanger","doi":"10.1145/3672453","DOIUrl":"https://doi.org/10.1145/3672453","url":null,"abstract":"<p>The need for collective intelligence in technology means that online Q&A platforms, such as Stack Overflow and Reddit, have become invaluable in building the global knowledge ecosystem. Despite literature demonstrating a prevalence of inclusion and contribution disparities in online communities, studies investigating the underlying reasons behind such fluctuations remain scarce. The current study examines Stack Overflow users’ contribution profiles, both in isolation and relative to various diversity metrics, including GDP and access to electricity. This study also examines whether such profiles propagate to the city and state levels, supplemented by granular data such as per capita income and education, before validating quantitative findings using content analysis. We selected 143 countries and compared the profiles of their respective users to assess implicit diversity-related complications that impact how users contribute. Results show that countries with high GDP, prominent R&D presence, less wealth inequality, and sufficient access to infrastructure tend to have more users, regardless of their development status. Similarly, cities and states where technology is more prevalent (e.g., San Francisco and New York) have more users who tend to contribute more often. Qualitative analysis reveals distinct communication styles based on users’ locations. Urban users exhibited assertive, solution-oriented behaviour, actively sharing information. Conversely, rural users engaged through inquiries and discussions, incorporating personal anecdotes, gratitude, and conciliatory language. Findings from this study may benefit scholars and practitioners, allowing them to develop sustainable mechanisms to bridge the inclusion and diversity gaps.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"25 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141502748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GIST: Generated Inputs Sets Transferability in Deep Learning","authors":"Florian Tambon, Foutse Khomh, Giuliano Antoniol","doi":"10.1145/3672457","DOIUrl":"https://doi.org/10.1145/3672457","url":null,"abstract":"<p>To foster the verifiability and testability of Deep Neural Networks (DNN), an increasing number of methods for test case generation techniques are being developed.</p><p>When confronted with testing DNN models, the user can apply any existing test generation technique. However, it needs to do so for each technique and each DNN model under test, which can be expensive. Therefore, a paradigm shift could benefit this testing process: rather than regenerating the test set independently for each DNN model under test, we could transfer from existing DNN models.</p><p>This paper introduces GIST (Generated Inputs Sets Transferability), a novel approach for the efficient transfer of test sets. Given a property selected by a user (e.g., neurons covered, faults), GIST enables the selection of good test sets from the point of view of this property among available test sets. This allows the user to recover similar properties on the transferred test sets as he would have obtained by generating the test set from scratch with a test cases generation technique. Experimental results show that GIST can select effective test sets for the given property to transfer. Moreover, GIST scales better than reapplying test case generation techniques from scratch on DNN models under test.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"131 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Study on the Characteristics of Database Access Bugs in Java Applications","authors":"Wei Liu, Shouvick Mondal, Tse-Hsun (Peter) Chen","doi":"10.1145/3672449","DOIUrl":"https://doi.org/10.1145/3672449","url":null,"abstract":"<p>Database-backed applications rely on the database access code to interact with the underlying database management systems (DBMSs). Although many prior studies aim at database access issues like SQL anti-patterns or SQL code smells, there is a lack of study of database access bugs during the maintenance of database-backed applications. In this paper, we empirically investigate 423 database access bugs collected from seven large-scale Java open source applications that use relational database management systems (e.g., MySQL or PostgreSQL). We study the characteristics (e.g., occurrence and root causes) of the bugs by manually examining the bug reports and commit histories. We find that the number of reported database and non-database access bugs share a similar trend but their modified files in bug fixing commits are different. Additionally, we generalize categories of the root causes of database access bugs, containing five main categories (SQL queries, Schema, API, Configuration, SQL query result) and 25 unique root causes. We find that the bugs pertaining to SQL queries, Schema, and API cover 84.2% of database access bugs across all studied applications. In particular, SQL queries bug (54%) and API bug (38.7%) are the most frequent issues when using JDBC and Hibernate, respectively. Finally, we provide a discussion on the implications of our findings for developers and researchers.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"264 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-planning Code Generation with Large Language Models","authors":"Xue Jiang, Yihong Dong, Lecheng Wang, Fang Zheng, Qiwei Shang, Ge Li, Zhi Jin, Wenpin Jiao","doi":"10.1145/3672456","DOIUrl":"https://doi.org/10.1145/3672456","url":null,"abstract":"<p>Although large language models (LLMs) have demonstrated impressive ability in code generation, they are still struggling to address the complicated intent provided by humans. It is widely acknowledged that humans typically employ planning to decompose complex problems and schedule solution steps prior to implementation. To this end, we introduce planning into code generation to help the model understand complex intent and reduce the difficulty of problem-solving. This paper proposes a self-planning code generation approach with large language models, which consists of two phases, namely planning phase and implementation phase. Specifically, in the planning phase, LLM plans out concise solution steps from the intent combined with few-shot prompting. Subsequently, in the implementation phase, the model generates code step by step, guided by the preceding solution steps. We conduct extensive experiments on various code-generation benchmarks across multiple programming languages. Experimental results show that self-planning code generation achieves a relative improvement of up to 25.4% in Pass@1 compared to direct code generation, and up to 11.9% compared to Chain-of-Thought of code generation. Moreover, our self-planning approach also enhances the quality of the generated code with respect to correctness, readability, and robustness, as assessed by humans.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"18 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dong Huang, Qingwen Bu, Yichao Fu, Yuhao Qing, Xiaofei Xie, Junjie Chen, Heming Cui
{"title":"Neuron Sensitivity Guided Test Case Selection","authors":"Dong Huang, Qingwen Bu, Yichao Fu, Yuhao Qing, Xiaofei Xie, Junjie Chen, Heming Cui","doi":"10.1145/3672454","DOIUrl":"https://doi.org/10.1145/3672454","url":null,"abstract":"<p>Deep Neural Networks (DNNs) have been widely deployed in software to address various tasks (e.g., autonomous driving, medical diagnosis). However, they can also produce incorrect behaviors that result in financial losses and even threaten human safety. To reveal and repair incorrect behaviors in DNNs, developers often collect rich, unlabeled datasets from the natural world and label them to test DNN models. However, properly labeling a large number of datasets is a highly expensive and time-consuming task.</p><p>To address the above-mentioned problem, we propose NSS, Neuron Sensitivity Guided Test Case Selection, which can reduce the labeling time by selecting valuable test cases from unlabeled datasets. NSS leverages the information of the internal neuron induced by the test cases to select valuable test cases, which have high confidence in causing the model to behave incorrectly. We evaluated NSS with four widely used datasets and four well-designed DNN models compared to the state-of-the-art (SOTA) baseline methods. The results show that NSS performs well in assessing the probability of failure triggering in test cases and in the improvement capabilities of the model. Specifically, compared to the baseline approaches, NSS achieves a higher fault detection rate (e.g., when selecting 5% of the test cases from the unlabeled dataset in the MNIST&LeNet1 experiment, NSS can obtain an 81.8% fault detection rate, which is a 20% increase compared with SOTA baseline strategies).</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"34 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-collaboration Code Generation via ChatGPT","authors":"Yihong Dong, Xue Jiang, Zhi Jin, Ge Li","doi":"10.1145/3672459","DOIUrl":"https://doi.org/10.1145/3672459","url":null,"abstract":"<p>Although Large Language Models (LLMs) have demonstrated remarkable code-generation ability, they still struggle with complex tasks. In real-world software development, humans usually tackle complex tasks through collaborative teamwork, a strategy that significantly controls development complexity and enhances software quality. Inspired by this, we present a self-collaboration framework for code generation employing LLMs, exemplified by ChatGPT. Specifically, through role instructions, 1) Multiple LLM agents act as distinct ‘experts’, each responsible for a specific subtask within a complex task; 2) Specify the way to collaborate and interact, so that different roles form a virtual team to facilitate each other’s work, ultimately the virtual team addresses code generation tasks collaboratively without the need for human intervention. To effectively organize and manage this virtual team, we incorporate software-development methodology into the framework. Thus, we assemble an elementary team consisting of three LLM roles (i.e., analyst, coder, and tester) responsible for software development’s analysis, coding, and testing stages. We conduct comprehensive experiments on various code-generation benchmarks. Experimental results indicate that self-collaboration code generation relatively improves 29.9%-47.1% Pass@1 compared to the base LLM agent. Moreover, we showcase that self-collaboration could potentially enable LLMs to efficiently handle complex repository-level tasks that are not readily solved by the single LLM agent.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"196 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nicolas Dejon, Chrystel Gaber, Gilles Grimaud, Narjes Jomaa
{"title":"Code to Qed, the Project Manager's Guide to Proof Engineering","authors":"Nicolas Dejon, Chrystel Gaber, Gilles Grimaud, Narjes Jomaa","doi":"10.1145/3664807","DOIUrl":"https://doi.org/10.1145/3664807","url":null,"abstract":"<p>Despite growing efforts and encouraging successes in the last decades, fully formally-verified projects are still rare in the industrial landscape. The industry often lacks the tools and methodologies to efficiently scale the proof development process. In this work, we give a comprehensible overview of the proof development process for proof developers and project managers. The goal is to support proof developers by rationalizing the proof development process, which currently relies heavily on their intuition and expertise, and by facilitating communication with the management line. To this end, we concentrate on the aspect of proof manufacturing and highlight the most significant sources of proof effort. We propose means to mitigate the latter through proof practices (proof structuring, proof strategies, and proof planning), proof metrics, and tools. Our approach is project-agnostic, independent of specific proof expertise, and computed estimations do not assume prior similar developments. We evaluate our guidelines using a separation kernel undergoing formal verification, driving the proof process in an optimised way. Feedback from a project manager unfamiliar with proof development confirms the benefits of detailed planning of the proof development steps, clear progress communication to the hierarchy line, and alignment with established practices in the software industry.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"25 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141255746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Suwichak Fungprasertkul, Rami Bahsoon, Rick Kazman
{"title":"Technical Debt Monitoring Decision Making with Skin in the Game","authors":"Suwichak Fungprasertkul, Rami Bahsoon, Rick Kazman","doi":"10.1145/3664805","DOIUrl":"https://doi.org/10.1145/3664805","url":null,"abstract":"<p>Technical Debt Management (TDM) can suffer from unpredictability, communication gaps and the inaccessibility of relevant information, which hamper the effectiveness of its decision making. These issues can stem from division among decision-makers which takes root in unfair consequences of decisions among different decision-makers. One mitigation route is Skin in the Game thinking, which enforces transparency, fairness and shared responsibility during collective decision-making under uncertainty. This paper illustrates characteristics which require Skin in the Game thinking in Technical Debt (TD) identification, measurement, prioritisation and monitoring. We point out crucial problems in TD monitoring rooted in asymmetric information and asymmetric payoff between different factions of decision-makers. A systematic TD monitoring method is presented to mitigate the said problems. The method leverages Replicator Dynamics and Behavioural Learning. The method supports decision-makers with automated TD monitoring decisions; it informs decision-makers when human interventions are required. Two publicly available industrial projects with a non-trivial number of TD and timestamps are utilised to evaluate the application of our method. Mann-Whitney U hypothesis tests are conducted on samples of decisions from our method and the baseline. The statistical evidence indicates that our method can produce cost-effective and contextual TD monitoring decisions.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"33 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141196672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}