VOICE: Visual Oracle for Interaction, Conversation, and Explanation
Donggang Jia, Alexandra Irger, Lonni Besancon, Ondrej Strnad, Deng Luo, Johanna Bjorklund, Alexandre Kouyoumdjian, Anders Ynnerman, Ivan Viola
IEEE Transactions on Visualization and Computer Graphics, published 2025-06-16. DOI: 10.1109/TVCG.2025.3579956
Citations: 0
Abstract
We present VOICE, a novel approach to science communication that connects large language models' conversational capabilities with interactive exploratory visualization. VOICE introduces several technical contributions that drive our conversational visualization framework. Based on the collected design requirements, we introduce a two-layer agent architecture that performs task assignment, instruction extraction, and coherent content generation. We employ fine-tuning and prompt-engineering techniques to tailor each agent's performance to its specific role and to respond accurately to user queries. Our interactive text-to-visualization method generates a flythrough sequence matching the content explanation. In addition, natural language interaction provides capabilities to navigate and manipulate 3D models in real time. The VOICE framework can receive arbitrary voice commands from the user and respond verbally, tightly coupled with a corresponding visual representation, with low latency and high accuracy. We demonstrate the effectiveness of our approach by implementing a proof-of-concept prototype and applying it to the molecular visualization domain: analyzing three 3D molecular models with multiscale and multi-instance attributes. Finally, we conduct a comprehensive evaluation of the system, including quantitative and qualitative analyses on our collected dataset, along with a detailed public user study and expert interviews. The results confirm that our framework and prototype effectively meet the design requirements and cater to the needs of diverse target users. All supplemental materials are available at https://osf.io/g7fbr.
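To make the two-layer architecture concrete, the sketch below illustrates one plausible shape for such a pipeline: a first-layer agent performs task assignment and extracts a machine-readable instruction, and second-layer agents either generate explanatory content or emit a viewport command. This is a hypothetical illustration, not the paper's implementation; the keyword router stands in for the fine-tuned LLM agents, and names such as `Instruction` and `camera.flythrough` are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Instruction:
    task: str    # "explain" or "navigate"
    target: str  # entity the user asked about

# Stand-in vocabulary for the fine-tuned task-assignment agent.
NAVIGATION_VERBS = {"show", "zoom", "rotate", "fly"}

def first_layer(query: str) -> Instruction:
    """Layer 1: task assignment + instruction extraction."""
    words = query.lower().rstrip("?.!").split()
    task = "navigate" if words[0] in NAVIGATION_VERBS else "explain"
    return Instruction(task=task, target=words[-1])

def second_layer(instr: Instruction) -> str:
    """Layer 2: content generation or a viewport command."""
    if instr.task == "navigate":
        return f"camera.flythrough('{instr.target}')"
    return f"Generating verbal explanation of {instr.target}..."

print(second_layer(first_layer("Zoom into the capsid")))
# -> camera.flythrough('capsid')
```

In the actual system, both layers would be LLM-backed and the second layer would be tightly coupled to the renderer so that the spoken answer and the flythrough stay synchronized.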