Automated Software Engineering: Latest Articles

Decomposition then watermarking: Enhancing code traceability with dual-channel code watermarking
IF 3.1, Zone 2, Computer Science
Automated Software Engineering, vol. 33, no. 1. Pub Date: 2025-10-10. DOI: 10.1007/s10515-025-00561-1
Haibo Lin, Zhong Li, Ruihua Ji, Minxue Pan, Tian Zhang, Nan Wu, Xuandong Li
Abstract: Code watermarking has gained increasing attention for tracing the provenance of code with the rapid growth of the open-source community. Existing work on code watermarking has shown promising results yet still falls short, especially when a multi-bit watermark for encoding diverse information is required. In this paper, we propose DWC, a novel code watermarking method with high watermark capacity. The key idea of DWC is to first decompose the code into natural and formal channels, then embed the watermark separately into each channel based solely on its respective information. As such, DWC reduces the mutual interference between these two channels and the impact of irrelevant information within the code, thus enabling more effective transformations for embedding watermarks with higher capacity and robustness. Our extensive experiments on source code snippets in four programming languages (C, C++, Java, and Python) demonstrate the effectiveness, efficiency, and capability of DWC in embedding multi-bit watermarks, as well as the utility and robustness of the watermarked code it generates.
Citations: 0
A sign language to SQL query translation system for enhancing database accessibility
IF 3.1, Zone 2, Computer Science
Automated Software Engineering, vol. 33, no. 1. Pub Date: 2025-10-07. DOI: 10.1007/s10515-025-00558-w
Guocang Yang, Dawei Yuan, Tao Zhang, Zhenghan Chen
Abstract: Structured Query Language (SQL) is a standard language for interacting with relational databases and is widely used across various information systems, either through direct query execution or via object-relational mapping (ORM) frameworks. Recent approaches have focused on converting natural language into SQL to simplify database development for users without programming expertise. However, these methods overlook direct translation from sign language, an essential modality for users such as the deaf community who may lack experience with SQL syntax. In this paper, we present SIGN2SQL, an innovative end-to-end framework that generates SQL queries from signed input. The system first employs a dedicated gesture recognition module to interpret the visual signals, followed by a convolutional neural network (CNN)-based model that produces the corresponding SQL statements. Trained on a well-annotated dataset, SIGN2SQL is evaluated against multiple pipeline-based baselines. Experimental results demonstrate that SIGN2SQL outperforms existing methods in both effectiveness and efficiency, particularly for SELECT statements with WHERE clauses. It achieves an execution accuracy of 89.8%, highlighting its potential as an accessible and inclusive database interaction interface.
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00558-w.pdf
Citations: 0
Toward efficient testing of graph neural networks via test input prioritization
IF 3.1, Zone 2, Computer Science
Automated Software Engineering, vol. 33, no. 1. Pub Date: 2025-10-07. DOI: 10.1007/s10515-025-00554-0
Lichen Yang, Qiang Wang, Zhonghao Yang, Daojing He, Yu Li
Abstract: Graph Neural Networks (GNNs) have demonstrated remarkable efficacy in handling graph-structured data; however, they exhibit failures after deployment, which can cause severe consequences. Hence, conducting thorough testing before deployment becomes imperative to ensure the reliability of GNNs. However, thorough testing requires large amounts of manually annotated test data. To mitigate the annotation cost, strategically prioritizing and labeling high-quality unlabeled inputs for testing becomes crucial, which facilitates uncovering more model failures with a limited labeling budget. Unfortunately, existing test input prioritization techniques either overlook the valuable information contained in graph structures or are overly reliant on attributes extracted from the target model, i.e., model-aware attributes, whose quality can vary significantly. To address these issues, we propose a novel test input prioritization framework, named GraphRank, for GNNs. GraphRank introduces model-agnostic attributes to compensate for the limitations of the model-aware ones. It also leverages the graph structure information to aggregate attributes from neighboring nodes, thereby enhancing the model-aware and model-agnostic attributes. Furthermore, GraphRank combines the above attributes with a binary classifier, using it as a ranking model to prioritize inputs. This classifier undergoes iterative training, which enables it to learn from each round’s feedback and improve its performance accordingly. Extensive experiments demonstrate GraphRank’s superiority over existing techniques.
Citations: 0
Graph based transfer learning with orthogonal tunning for functionality size insights
IF 3.1, Zone 2, Computer Science
Automated Software Engineering, vol. 33, no. 1. Pub Date: 2025-10-06. DOI: 10.1007/s10515-025-00562-0
Nevena Ranković, Dragica Ranković, Gonzalo Nápoles, Federico Zamberlan
Abstract: Function Point Analysis (FPA) is a method in software engineering that focuses on identifying the functions provided by a software system to users, such as data input, processing, output, and database management. These functions are classified according to complexity to quantify the system’s size in functional point units. In this paper, we propose two graph neural networks: a Graph-based Similarity Detection Neural Network (GSDNN) and a Prior-Structural Information Graph Neural Network (PSI-GNN) with a pre-trained layer using transfer learning, to define the best model for functional size prediction and uncover patterns and trends in data. Additionally, we focus on the NESMA (Netherlands Software Metrics Users Association) method, from the functional families approach, and use the ISBSG (International Software Benchmarking Standards Group) dataset, which provides standardized and relevant data for comparing software performance, to analyze 1704 industrial software projects. The goal was to identify the graph architecture with the smallest number of experiments to be performed and the lowest Mean Magnitude Relative Error (MMRE) using orthogonal-array tuning optimization via Latin Square extraction. In the proposed approach, the number of experiments is fewer than 8 for each dataset, and a minimum MMRE value of 0.97% was obtained using PSI-GNN. Additionally, the impact of five input features on the change in MMRE value was analyzed with the top-performing model, employing the SHAP (SHapley Additive exPlanations) feature importance method, visualized through GraphExplainer. The frequency of user-initiated transactions, quantified technically, emerged as the most significant determinant within the NESMA framework.
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00562-0.pdf
Citations: 0
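Note on the MMRE metric reported in the entry above: the following is a minimal sketch of how a Mean Magnitude of Relative Error is conventionally computed, namely the mean of |actual - predicted| / |actual| over all projects. The function name `mmre` and the sample sizes are hypothetical and chosen only for illustration; they are not taken from the paper or from the ISBSG dataset.

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / |actual|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Hypothetical functional sizes (in function points) for three projects.
actual_sizes = [100.0, 200.0, 400.0]
predicted_sizes = [110.0, 190.0, 400.0]

score = mmre(actual_sizes, predicted_sizes)
print(f"MMRE = {score:.2%}")  # prints "MMRE = 5.00%"
```

Read this way, the reported minimum of 0.97% would mean the predictions deviate from the actual functional size by under one percent on average.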
Improving anomaly detection in software logs through hybrid language modeling and reduced reliance on parser
IF 3.1, Zone 2, Computer Science
Automated Software Engineering, vol. 33, no. 1. Pub Date: 2025-09-29. DOI: 10.1007/s10515-025-00548-y
Yicheng Sun, Jacky Keung, Zhen Yang, Shuo Liu, Hi Kuen Yu
Abstract: Anomaly detection in software logs is crucial for development and maintenance, allowing timely identification of system failures and ensuring normal operations. Although recent deep learning advancements in log anomaly detection have shown exceptional performance, the reliance on time-consuming log parsers raises concerns about their necessity for quickly identifying anomalies. Standardized preprocessing methods can mishandle or lose important information. Additionally, the significant imbalance between normal and anomalous log data, along with the scarcity of labeled data, presents a persistent challenge in anomaly detection. We first evaluated the impact of omitting a log parser on anomaly detection models. Subsequently, we propose LogRoBERTa, an innovative anomaly detection model that eliminates the need for a parser. LogRoBERTa creates a stable and diverse labeled training set using the Determinantal Point Process (DPP) method, needing only a small amount of labeled data. The hybrid language model is based on RoBERTa’s architecture, combined with an attention-based BiLSTM. This setup leverages RoBERTa’s strong contextual understanding and BiLSTM’s capability to capture sequential dependencies, enhancing performance in complex log sequences. Experiments on four widely used datasets demonstrate that LogRoBERTa outperforms state-of-the-art benchmark models, including three fully supervised approaches, without relying on a dedicated log parser. Furthermore, its consistently strong performance on low-resource datasets highlights its robustness and generalizability across varying data conditions. These results validate the overall effectiveness of LogRoBERTa’s design and offer a thorough evaluation of the implications of bypassing a log parser. Additionally, our ablation studies and training set construction experiments further confirm the contributions of each individual component to the model’s performance. The study empirically validated that a RoBERTa-based approach effectively handles software log anomaly detection in long and complex log sequences, providing a more efficient and robust parser-free solution than existing models.
Citations: 0
BRMDS: an LLM-based multi-dimensional summary generation approach for bug reports
IF 3.1, Zone 2, Computer Science
Automated Software Engineering, vol. 33, no. 1. Pub Date: 2025-09-23. DOI: 10.1007/s10515-025-00553-1
Yayun Zhang, Yuying Li, Minying Fang, Xing Yuan, Junwei Du
Abstract: Bug report summarization aims to generate concise and accurate descriptions to help developers understand and maintain software. Existing methodologies prioritize simplifying report content but fail to provide a structured and well-rounded description of bugs, limiting how efficiently developers can understand them. In this paper, we leverage large language models (LLMs) to generate detailed, multi-dimensional summaries. Our intuition is based on the following facts: (1) LLMs establish robust semantic connections through extensive pre-training on paired data; (2) real-world bug reports contain multi-dimensional information. We propose the Bug Report Multi-Dimensional Summary (BRMDS) approach, defining five dimensions: environment, actual behavior, expected behavior, bug category, and solution suggestions, and use specific instructions for each dimension to guide the LLM in Parameter Efficient Fine-Tuning (PEFT). We construct a dataset of multi-dimensional information for PEFT and experimental evaluation, thereby addressing the gaps in existing datasets within this domain. The experimental results show that multi-dimensional summaries enhance developers’ understanding of bug reports. The BRMDS approach outperforms baseline approaches in both automatic and human evaluations. Our datasets are publicly available at https://github.com/yunjua/bug-reports-multi-dimensional.
Citations: 0
PIONEER: improving the robustness of student models when compressing pre-trained models of code
IF 3.1, Zone 2, Computer Science
Automated Software Engineering, vol. 33, no. 1. Pub Date: 2025-09-23. DOI: 10.1007/s10515-025-00560-2
Xiangyue Liu, Xinwei Liu, Lili Bo, Xiaoxue Wu, Yun Yang, Xiaobing Sun, Feng Zhou
Abstract: Pre-trained models of code have shown significant effectiveness in a variety of software engineering tasks, but they are difficult to deploy locally due to their large size. Existing works mainly focus on compressing these large models into small models to achieve similar performance and efficient inference. However, they overlook that the small models should be robust enough to deal with adversarial examples that cause incorrect predictions. Knowledge distillation techniques typically transform the model compression problem into a combinatorial optimization problem over the student architecture space to achieve the best student model performance, but they can only improve the robustness of the student model to a limited extent through traditional adversarial training. This paper proposes PIONEER (ImProvIng the RObustness of StudeNt ModEls WhEn CompRessing Code Models), a novel knowledge distillation technique that enhances the robustness of the student model without requiring adversarial training. PIONEER incorporates robustness evaluation during distillation to guide the optimization of the student model architecture. By using the probability distributions of original examples and adversarial examples as soft labels, the student model learns the features of both the original samples and adversarial examples during training. We conduct experimental evaluations on two downstream tasks (vulnerability prediction and clone detection) for three models (CodeBERT, GraphCodeBERT, and CodeT5). We utilize PIONEER to compress six downstream task models to small (3 MB) models that are 206× smaller than the original size. The results show that compressed models reduce the inference latency (76×) and improve the robustness of the model (87.54%) with negligible loss of effectiveness (1.67%).
Citations: 0
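Note on the soft-label idea in the entry above: the sketch below shows one plausible reading of "using the probability distributions of original examples and adversarial examples as soft labels", assuming the soft labels are a teacher model's temperature-softened outputs and the student is trained with a KL-divergence term on each. The loss form, the `temperature` and `adv_weight` parameters, and all logits are assumptions made for illustration; the abstract does not specify PIONEER's actual objective.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def soft_label_loss(teacher_clean, teacher_adv, student_clean, student_adv,
                    temperature=2.0, adv_weight=0.5):
    """Weighted sum of KL terms on the original and the adversarial example."""
    clean_term = kl_divergence(softmax(teacher_clean, temperature),
                               softmax(student_clean, temperature))
    adv_term = kl_divergence(softmax(teacher_adv, temperature),
                             softmax(student_adv, temperature))
    return (1.0 - adv_weight) * clean_term + adv_weight * adv_term


# Hypothetical logits for a binary task such as vulnerability prediction.
teacher_clean, teacher_adv = [2.0, -1.0], [1.5, -0.5]
matching_loss = soft_label_loss(teacher_clean, teacher_adv,
                                teacher_clean, teacher_adv)
mismatched_loss = soft_label_loss(teacher_clean, teacher_adv,
                                  [0.0, 0.0], [-1.0, 1.0])
print(matching_loss, mismatched_loss)
```

The loss is zero when the student reproduces the teacher's distributions on both inputs and grows as the student diverges on either one, which is how a single objective can cover original and adversarial examples without a separate adversarial-training phase.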
Investigating the bugs in reinforcement learning programs: Insights from Stack Overflow and GitHub
IF 3.1, Zone 2, Computer Science
Automated Software Engineering, vol. 33, no. 1. Pub Date: 2025-09-23. DOI: 10.1007/s10515-025-00555-z
Jiayin Song, Yike Li, Yunzhe Tian, Haoxuan Ma, Honglei Li, Jie Zuo, Jiqiang Liu, Wenjia Niu
Abstract: Reinforcement learning (RL) is increasingly applied in areas such as gaming, robotic control, and autonomous driving. Like deep learning, RL systems also encounter failures during operation. However, RL differs from deep learning in terms of its error causes and symptom manifestations. What are the differences in error causes and symptoms between RL and deep learning? How are RL errors and their symptoms related? Understanding the symptoms and causes of RL failures can advance research on RL failure detection and repair. In this paper, we conducted a comprehensive empirical study by collecting 1,155 error reports from the popular Q&A forum Stack Overflow and four GitHub repositories: baselines, stable-baselines3, tianshou and keras-rl. We analyzed the root causes and symptoms of these failures and examined the differences in resolution times across various root causes. Additionally, we analyzed the correlations between causes and symptoms. Our study yielded 14 key findings and six implications for developing RL failure detection and repair tools. Our work is the first to integrate LLM-based analysis with manual validation for RL bug studies, providing actionable insights for tool development and testing strategies.
Citations: 0
Causes and effects of fitness landscapes in system test generation: a replication study
IF 3.1, Zone 2, Computer Science
Automated Software Engineering, vol. 33, no. 1. Pub Date: 2025-09-18. DOI: 10.1007/s10515-025-00539-z
Omur Sahin, Man Zhang, Andrea Arcuri
Abstract: Search-Based Software Testing (SBST) has seen several success stories in academia and industry. The effectiveness of a search algorithm at solving a software engineering problem strongly depends on how such an algorithm can navigate the fitness landscape of the addressed problem. The fitness landscape depends on the fitness function used. Understanding the properties of a fitness landscape can help to provide insight into how a search algorithm behaves on it. Such insight can help researchers design novel, more effective search algorithms and fitness functions tailored to a specific problem. Due to its importance, a few fitness landscape analyses have been carried out in the scientific literature of SBST. However, those have focused on the problem of unit test generation, e.g., with state-of-the-art tools such as EvoSuite. In this paper, we replicate one such existing study. However, in our work we focus on system test generation, with the state-of-the-art tool EvoMaster. Based on an empirical study involving the testing of 23 web services, this enables us to provide valuable insight into this important testing domain of practical industrial relevance. Our results indicate that fitness landscapes are largely dominated by neutral regions (e.g., plateaus), which make the search process challenging. We observe that the presence of information content in the landscape can improve search guidance, while boolean flags are a primary contributor to neutrality. These findings confirm prior results in unit testing but also reveal system-level differences, particularly in how branch types impact search effectiveness. These insights suggest the need for improved fitness functions, testability transformations, and search operators tailored to system-level testing.
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00539-z.pdf
Citations: 0
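Note on neutrality in the entry above: the toy sketch below illustrates why a boolean flag tends to produce a plateau while a branch-distance objective gives the search a gradient. It measures neutrality as the fraction of steps along a walk whose fitness does not change, which is one common landscape-analysis measure. The target value, both fitness functions, and the deterministic walk are hypothetical; this is not the paper's measurement procedure and does not use EvoMaster.

```python
def neutrality(fitness, walk):
    """Fraction of consecutive steps in the walk whose fitness does not change."""
    steps = list(zip(walk, walk[1:]))
    neutral = sum(1 for a, b in steps if fitness(a) == fitness(b))
    return neutral / len(steps)


TARGET = 50


def flag_fitness(x):
    """Boolean-flag style objective: 0 if the target is hit, 1 otherwise."""
    return 0 if x == TARGET else 1


def branch_distance_fitness(x):
    """Branch-distance style objective: how far x is from satisfying x == TARGET."""
    return abs(x - TARGET)


walk = list(range(100))  # deterministic walk over neighbouring inputs 0..99
flag_neutrality = neutrality(flag_fitness, walk)
distance_neutrality = neutrality(branch_distance_fitness, walk)
print(f"flag: {flag_neutrality:.3f}, branch distance: {distance_neutrality:.3f}")
# prints "flag: 0.980, branch distance: 0.000"
```

With the flag objective, 97 of the 99 steps leave the fitness unchanged, so a search algorithm gets almost no guidance; with the distance objective every step changes the fitness.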
Harnessing large language models for virtual reality exploration testing: a case study
IF 3.1, Zone 2, Computer Science
Automated Software Engineering, vol. 33, no. 1. Pub Date: 2025-09-18. DOI: 10.1007/s10515-025-00535-3
Zhenyu Qi, Haotang Li, Hao Qin, Kebin Peng, Sen He, Xue Qin
Abstract: As the Virtual Reality (VR) industry expands, the need for automated GUI testing is growing rapidly. Large Language Models (LLMs), capable of retaining information long-term and analyzing both visual and textual data, are emerging as a potential key to deciphering the complexities of VR’s evolving user interfaces. In this paper, we conduct a case study to investigate the capability of using LLMs, particularly GPT-4o, for field of view (FOV) analysis in VR exploration testing. Specifically, we validate that LLMs can identify test entities in FOVs and that prompt engineering can effectively enhance the accuracy of test entity identification from 41.67% to 71.30%. Our study also shows that LLMs can accurately describe identified entities’ features with at least a 90% accuracy rate. We further find that the core features that effectively represent an entity are color, placement, and shape. Furthermore, the combination of the three features can be used to improve the accuracy of determining identical entities in multiple FOVs, with the highest F1-score of 0.70. Additionally, our study demonstrates that LLMs are capable of scene recognition and spatial understanding in VR with precisely designed structured prompts. Finally, we find that LLMs fail to label the identified test entities, and we discuss potential solutions as future research directions.
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00535-3.pdf
Citations: 0