Harnessing large language models for virtual reality exploration testing: a case study

IF 3.1 · CAS Zone 2 (Computer Science) · JCR Q3 (Computer Science, Software Engineering)

Automated Software Engineering Pub Date : 2025-09-18 DOI:10.1007/s10515-025-00535-3

Zhenyu Qi, Haotang Li, Hao Qin, Kebin Peng, Sen He, Xue Qin

Article page: https://link.springer.com/article/10.1007/s10515-025-00535-3
PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00535-3.pdf

Citations: 0

Abstract

As the Virtual Reality (VR) industry expands, the need for automated GUI testing is growing rapidly. Large Language Models (LLMs), capable of retaining information long-term and analyzing both visual and textual data, are emerging as a potential key to deciphering the complexities of VR’s evolving user interfaces. In this paper, we conduct a case study to investigate the capability of LLMs, particularly GPT-4o, for field of view (FOV) analysis in VR exploration testing. Specifically, we validate that LLMs can identify test entities in FOVs and that prompt engineering can raise the accuracy of test entity identification from 41.67% to 71.30%. Our study also shows that LLMs can describe identified entities’ features with an accuracy of at least 90%. We further find that the core features that effectively represent an entity are color, placement, and shape. Combining these three features improves the accuracy of determining identical entities across multiple FOVs, with the highest F1-score reaching 0.70. Additionally, our study demonstrates that LLMs are capable of scene recognition and spatial understanding in VR when given precisely designed structured prompts. Finally, we find that LLMs fail to label the identified test entities, and we discuss potential solutions as future research directions.
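
To make the cross-FOV matching idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each FOV has already been reduced to textual entity descriptions with the three features named in the abstract (color, placement, shape), treats two descriptions as the same entity only when all three agree, and scores the predicted pairs against hand-labelled ground truth with the F1-score. The entity ids, feature values, and function names are all hypothetical.

```python
from itertools import product

# The three core features named in the abstract.
FEATURES = ("color", "placement", "shape")


def same_entity(a, b, features=FEATURES):
    """Assumed rule: two descriptions match when every feature agrees (case-insensitive)."""
    return all(a[f].strip().lower() == b[f].strip().lower() for f in features)


def match_across_fovs(fov_a, fov_b, features=FEATURES):
    """Return the set of (id_a, id_b) pairs predicted to be the same entity."""
    return {
        (id_a, id_b)
        for (id_a, desc_a), (id_b, desc_b) in product(fov_a.items(), fov_b.items())
        if same_entity(desc_a, desc_b, features)
    }


def f1_score(predicted, truth):
    """F1 = harmonic mean of precision and recall over predicted pairs."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical entity descriptions for two FOVs (illustrative data only).
fov_a = {
    "a1": {"color": "red", "placement": "on table", "shape": "cube"},
    "a2": {"color": "blue", "placement": "on floor", "shape": "sphere"},
}
fov_b = {
    "b1": {"color": "Red", "placement": "on table", "shape": "cube"},
    "b2": {"color": "blue", "placement": "on shelf", "shape": "sphere"},
}
truth = {("a1", "b1"), ("a2", "b2")}  # hand-labelled identical pairs

predicted = match_across_fovs(fov_a, fov_b)
score = f1_score(predicted, truth)
print(predicted, round(score, 2))
```

In this toy data the second pair is missed because the placement descriptions differ, so precision is 1.0, recall is 0.5, and the F1-score is about 0.67. Exact string equality is a deliberately strict simplification chosen for the sketch.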

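Source journal

Automated Software Engineering (Engineering & Technology: Computer Science, Software Engineering)

CiteScore: 4.80

Self-citation rate: 11.80%

Articles published:

Review time: >12 weeks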

About the journal: This journal publishes research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes. Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences, and workshops.