Title: Path Modeling of Visual Attention, User Perceptions, and Behavior Change Intentions in Conversations With Embodied Agents in VR
Authors: Sagar A. Vankit, Vivian Genaro Motti, Tiffany D. Do, Samaneh Zamanifard, Deyrel Diaz, Andrew T. Duchowski, Bart P. Knijnenburg, Matias Volonte
Computer Animation and Virtual Worlds 36(3), published 2025-06-21. DOI: 10.1002/cav.70028. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cav.70028

Abstract: This study examines how subtitles and image visualizations influence gaze behavior, working alliance, and behavior change intentions in virtual health conversations with embodied conversational agents (ECAs). Visualizations refer to images on a 3D-model TV and text on a virtual whiteboard, both reinforcing key content conveyed by the ECA. Using a 2 × 2 factorial design, participants were randomly assigned to one of four conditions: no subtitles or visualizations (Control), subtitles only (SUB), visualizations only (VIS), or both subtitles and visualizations (VISSUB). Structural equation path modeling showed that SUB and VIS individually reduced gaze toward the ECA, whereas VISSUB moderated this reduction, producing less gaze loss than the two conditions' individual effects would predict. Gaze behavior was positively associated with working alliance, and perceptions of enjoyment and appropriateness influenced engagement, which in turn predicted behavior change intentions. VIS was negatively associated with behavior change intentions, suggesting that excessive visual input may introduce cognitive trade-offs.
Title: Exploring the Impact of Multimodal Long Conversations in VR on Attitudes Toward Behavior Change, Memory Retention, and Cognitive Load
Authors: Sagar A. Vankit, Samaneh Zamanifard, Deyrel Diaz, Christos Mousas, Kelly Richardson, Andrew T. Duchowski, Matias Volonte
Computer Animation and Virtual Worlds 36(3), published 2025-06-19. DOI: 10.1002/cav.70023. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cav.70023

Abstract: This study examines how multimodal communication strategies (subtitles, visualizations, and their combination) affect memory retention, attitudes toward behavior change, and cognitive load during long conversations (over 20 min) in immersive virtual reality (VR). Using embodied conversational agents to educate participants about diabetes and healthy eating, we found that all conditions effectively improved memory retention and behavior change attitudes. However, combining the multimodal strategies increased cognitive load, suggesting a trade-off between engagement and cognitive demands. These findings highlight the potential of long VR conversations for healthcare education while emphasizing the importance of balancing cognitive demands and exploring personalization for diverse users.
{"title":"AU-Guided Feature Aggregation for Micro-Expression Recognition","authors":"Xiaohui Tan, Weiqi Xu, Jiazheng Wu, Hao Geng, Qichuan Geng","doi":"10.1002/cav.70041","DOIUrl":"https://doi.org/10.1002/cav.70041","url":null,"abstract":"<div>\u0000 \u0000 <p>Micro-expressions (MEs) are spontaneous and transient facial movements that reflect real internal emotions and have been widely applied in various fields. Recent deep learning-based methods have been rapidly developing in micro-expression recognition (MER).Still, it is typical to focus on the one-sided nature of MEs, covering only representational features or low-ranking Action Unit (AU) features. The subtle changes in MEs characterize its feature representation weak and inconspicuous, making it tough to analyze MEs only from a single piece or a small amount of information to achieve a considerable recognition effect. In addition, the lower-order information can only distinguish MEs from a single low-dimensional perspective and neglects the potential of corresponding MEs and AU combinations to each other. To address these issues, we first explore how the higher-order relations of different AU combinations correspond with MEs through statistical analysis. Afterward, based on this attribute, we propose an end-to-end multi-stream model that integrates global feature learning and local muscle movement representation guided by AU semantic information. The comparative experiments were performed on benchmark datasets, with better performance than the state-of-art methods. Also, the ablation experiments demonstrate the necessity of our model to introduce the information of AU and its relationship to MER.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144300302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on the Application of Immersive Virtual Reality Intervention Model (IVRT) in the Context of Computer Anxiety Among the Older Adults","authors":"Bo Zhang, HuiPing Shi, MengJiao Gu","doi":"10.1002/cav.70059","DOIUrl":"https://doi.org/10.1002/cav.70059","url":null,"abstract":"<div>\u0000 \u0000 <p>Elderly Computer Anxiety (ECA) often arises when older adults use digital tools, as it is a psychological condition influenced by both cognitive and functional impairments. Despite the widespread application of immersive virtual reality (IVR) technologies in enhancing cognitive abilities, their potential for addressing ECA remains underexplored. This study proposes an Immersive Virtual Reality Training (IVRT) model grounded in user experience(UX) design to comprehensively alleviate computer anxiety among older adults. The model integrates four interconnected modules: experience, training, socialization, and interaction. Each module is tailored to older adults, leveraging UX principles to enhance usability, foster engagement, and facilitate practical application through immersive VR. This paper conducted a randomized controlled trial involving 80 elderly individuals aged 65 to 85 to validate its effectiveness. Results showed significant reductions in anxiety levels within the IVRT group: Technophobia Scale (TS) scores decreased by an average of 0.49 points, and Gerontological Computer Anxiety Scale (GCAS) scores decreased by an average of 52.68 points, representing a 12.7% reduction (<i>p</i> < 0.05). These findings demonstrate that the IVRT model is significantly effective in reducing computer anxiety. It provides a solid foundation for developing immersive intervention measures tailored to older adults, better supporting their adaptation to the digital age.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144264662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Extended Reality in Emergency Response: Guidelines and Challenges for First Responder Friendly Augmented Interfaces","authors":"Fatih Oztank, Selim Balcisoy","doi":"10.1002/cav.70056","DOIUrl":"https://doi.org/10.1002/cav.70056","url":null,"abstract":"<div>\u0000 \u0000 <p>As Extended Reality (XR) technologies continue gaining popularity, various domains seek to integrate them into their workflows to enhance performance and user satisfaction. However, integrating XR technologies into emergency response presents unique challenges. Unlike other fields, such as healthcare, entertainment, or education, emergency response involves physically demanding environments and information-intensive tasks that first responders (FRs) must perform. Augmented reality (AR) head-mounted displays (HMDs) present promising solutions for improving situational awareness and reducing the cognitive load of the FRs. However, limited research has focused on the specific needs of FRs. Moreover, existing studies investigating FR needs have primarily been conducted in controlled laboratory settings, revealing a significant gap in the literature concerning FR requirements in real-life scenarios. This work addresses this gap through a comprehensive user study with subject matter experts (SMEs) and FRs. User studies were conducted after two different real-life scenarios using AR HMDs. To further understand FR needs, we extensively reviewed the literature for similar studies that reported FR needs, explicitly focusing on studies including interviews with SMEs and FRs. Our findings identified key design guidelines for FR-friendly AR interfaces while also highlighting the direction for future research to improve the user experience of the FRs.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144256281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Social Force Model-Driven Multi-Agent Generative Adversarial Imitation Learning Framework for Pedestrian Trajectory Prediction","authors":"Wen Zhou, Wangyu Shen, Xinyi Meng","doi":"10.1002/cav.70058","DOIUrl":"https://doi.org/10.1002/cav.70058","url":null,"abstract":"<div>\u0000 \u0000 <p>Recently, crowd trajectory prediction has attracted increasing attention. In particular, the simulation of pedestrian movement in scenarios such as crowd evacuation has gained increasing focus. The social force model is a promising and effective method for predicting the stochastic movement of pedestrians. However, individual heterogeneity, group-driven cooperation, and poor self-adaptive environmental interactive capabilities have not been comprehensively considered. This often makes it difficult to reproduce real scenarios. Therefore, a group-enabled social force model-driven multi-agent generative adversarial imitation learning framework, namely, SFMAGAIL, is proposed. Specifically, (1) a group-enabled individual heterogeneity schema is utilized to obtain related expert trajectories, which are fully incorporated into the desire force and group-enabled paradigms; (2) A joint policy is used to exploit the connection between the agents and the environment; and (3) To explore the intrinsic features of expert trajectories, an actor–critic-based multi-agent adversarial imitation learning framework is presented to generate effective trajectories. Finally, extensive experiments based on 2D and 3D virtual scenarios are conducted to validate our method. The results show that our proposed method is superior to the compared methods.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144244319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized Multiuser Panoramic Video Transmission in VR: A Machine Learning-Driven Approach","authors":"Wei Xun, Songlin Zhang","doi":"10.1002/cav.70060","DOIUrl":"https://doi.org/10.1002/cav.70060","url":null,"abstract":"<div>\u0000 \u0000 <p>In this paper, we propose a machine learning-driven model to optimize panoramic video transmission for multiple users in virtual reality environments. The model predicts users' future field of view (FOV) using historical head orientation data and video saliency information, enabling targeted video delivery based on individual perspectives. By segmenting panoramic videos into tiles and applying a pyramid coding scheme, we adaptively transmit high-quality content within users' FOVs while utilizing lower-quality transmissions for peripheral regions. This approach effectively reduces bandwidth consumption while maintaining a high-quality viewing experience. Our experimental results demonstrate that combining user viewpoint data with video saliency features significantly improves long-term FOV prediction accuracy, leading to a more efficient and user-centric transmission model. The proposed method holds great potential for enhancing the immersive experience of panoramic video streaming in VR, particularly in bandwidth-constrained environments.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144232483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: CLPFusion: A Latent Diffusion Model Framework for Realistic Chinese Landscape Painting Style Transfer
Authors: Jiahui Pan, Frederick W. B. Li, Bailin Yang, Fangzhe Nan
Computer Animation and Virtual Worlds 36(3), published 2025-06-07. DOI: 10.1002/cav.70053

Abstract: This study focuses on transforming real-world scenery into Chinese landscape painting masterpieces through style transfer. Traditional methods using convolutional neural networks (CNNs) and generative adversarial networks (GANs) often yield inconsistent patterns and artifacts. The rise of diffusion models (DMs) presents new opportunities for realistic image generation, but their inherent noise characteristics make it challenging to synthesize pure white or black images. Consequently, existing DM-based methods struggle to capture the unique style and color information of Chinese landscape paintings. To overcome these limitations, we propose CLPFusion, a novel framework that leverages pre-trained diffusion models for artistic style transfer. A key innovation is the Bidirectional State Space Models-CrossAttention (BiSSM-CA) module, which efficiently learns and retains the distinct styles of Chinese landscape paintings. Additionally, we introduce two latent-space feature adjustment methods, Latent-AdaIN and Latent-WCT, to enhance style modulation during inference. Experiments demonstrate that CLPFusion produces more realistic and artistic Chinese landscape paintings than existing approaches, showcasing its effectiveness and uniqueness in the field.
{"title":"Folding by Skinning","authors":"Chunyang Ma, Lifeng Zhu","doi":"10.1002/cav.70055","DOIUrl":"https://doi.org/10.1002/cav.70055","url":null,"abstract":"<div>\u0000 \u0000 <p>We propose a novel method, entitled “Folding by Skinning”, which creatively integrates skinning techniques with folding simulations. This method allows users to specify a two-dimensional crease pattern along with the desired folding angles for each crease. Based on this input, the system computes the final three-dimensional shape of the fold. Rather than employing costly physics-based simulations, we explore the skinning method, noted for its effectiveness in handling the geometry of the folded shape. We recommend extracting the skinning weights directly from the user-defined crease patterns. By combining the obtained skinning weights with the user-input folding angles, the initial shape undergoes dual quaternion skinning to produce the folding result. Users can further optimize the shape using post-processing and targeted filtering of weights to generate more realistic results. Our experimental results demonstrate that “Folding by Skinning” yields high-quality outcomes and offers relatively fast computation, making it an effective tool for computer-aided design, animation, and fabrication applications.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144220085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BACH: Bi-Stage Data-Driven Piano Performance Animation for Controllable Hand Motion","authors":"Jihui Jiao, Rui Zeng, Ju Dai, Junjun Pan","doi":"10.1002/cav.70044","DOIUrl":"https://doi.org/10.1002/cav.70044","url":null,"abstract":"<div>\u0000 \u0000 <p>This paper presents a novel framework for generating piano performance animations using a two-stage deep learning model. By using discrete musical score data, the framework transforms sparse control signals into continuous, natural hand motions. Specifically, in the first stage, by incorporating musical temporal context, the keyframe predictor is leveraged to learn keyframe motion guidance. Meanwhile, the second stage synthesizes smooth transitions between these keyframes via an inter-frame sequence generator. Additionally, a Laplacian operator-based motion retargeting technique is introduced, ensuring that the generated animations can be adapted to different digital human models. We demonstrate the effectiveness of the system through an audiovisual multimedia application. Our approach provides an efficient, scalable method for generating realistic piano animations and holds promise for broader applications in animation tasks driven by sparse control signals.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144214136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}