{"title":"Development of a DT-Driven Virtual Reality for Human-Robot Collaborative Safety Education Using Formwork Design","authors":"Jeremy S. Liang, Cindy Lin","doi":"10.1002/cav.70098","DOIUrl":"https://doi.org/10.1002/cav.70098","url":null,"abstract":"<div>\u0000 \u0000 <p>The vision of Industry 5.0, which addresses humanistic, adaptable and sustainable method, has evolved rapidly, particularly in digital twin-driven (DT) utilization in industry, for example, BMW factory and Tata steel. Nevertheless, it is no secret that the development of DT-powered virtual reality (VR) environments is hard work, which denotes there is untapped potential. Thus, a reusable DT model is required to speed up the creation and scaling of DT-driven VR environments. The objective of this study is for seeking solutions to make DT-driven VR contexts faster in an industrial environment. In particular, this study presents a way to develop the settings more effectively. The highlight of this study, and related industrial instances, is the development of safety education for human-robot collaboration in manufacturing contexts utilizing DT-driven VR settings. A formwork mode is introduced, which involves specific formworks and the entire framework for disposing the formworks in order to more quickly amend DT-driven VR environments to satisfy the requirements of a particular instance. The formwork is established applying two different industrial instances from manufacturing scenario.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"37 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146140161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pedestrian Scene Coverage Control Using Perceptive Quality-Based Virtual Potential Field","authors":"Liangliang Cai, Zhuocheng Liu, Zhong Zhou","doi":"10.1002/cav.70095","DOIUrl":"https://doi.org/10.1002/cav.70095","url":null,"abstract":"<div>\u0000 \u0000 <p>Comprehensive observation of target area with pedestrians through pan-tilt-zoom (PTZ) camera networks is crucial in various surveillance applications. However, the dynamic configuration of PTZ cameras increases the difficulty of coordinating multiple cameras to monitor large-scale scenes. Since coverage control in PTZ camera networks has been proven to be an NP-hard problem, many studies have adopted virtual potential field (VPF) algorithms to efficiently obtain approximate solutions. The VPF methods treat camera viewpoints as charged particles. Through repulsive forces between these particles, PTZ camera networks expand scene coverage and reduce overlap between camera fields of view (FoVs). However, VPF-based methods cannot leverage the scene layout and target priority information, failing to cover pedestrians and other critical areas. In this work, we introduce a unified perception quality measure framework that quantifies surveillance importance for scenes, cameras, and pedestrians. Building on this framework, we design a coverage control scheme using a perceptive quality-based virtual potential field. This scheme models target regions and pedestrian priorities as virtual gravitational and attractive forces. It maximizes coverage of key regions, minimizes camera overlap, and supports high-resolution monitoring and tracking of pedestrians. Extensive experiments show that our approach outperforms state-of-the-art methods, achieving superior scene and pedestrian coverage performance.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"37 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146162733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Dance Generation With Multi-Granularity Style Control and Text Guidance","authors":"Mengmeng Wang, Lili Wan, Bo Peng, Wanru Xu, Shenghui Wang","doi":"10.1002/cav.70097","DOIUrl":"https://doi.org/10.1002/cav.70097","url":null,"abstract":"<div>\u0000 \u0000 <p>Dance generation is a significant research area in computer arts and artificial intelligence. This study proposes a novel framework to enhance dance controllability and personalization through multimodal and multi-granularity control. The framework establishes global choreographic control of long sequences via music and dance style factors, while accommodating local style variations. Simultaneously, it enables fine-grained local control using style, text, and temporal factors for motion refinement. We develop two cross-modal Transformers: the LS-M2D model merges music and dance style features for local style-controllable dance generation, and the LT-SM2D model integrates textual guidance with music and dance style features for time-constrained local control. Experimental results demonstrate enhanced motion quality, effective multi-granularity style control, and precise text-guided flexibility. This provides valuable technical support for personalized intelligent dance generation systems.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"37 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146136325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Knowledge Visualization Method Based on Knowledge Cube for Virtual Reality Learning","authors":"Yi Lin, Jingjing Chen, Feng Chen, Zijie Zheng, Jieming Ke","doi":"10.1002/cav.70096","DOIUrl":"https://doi.org/10.1002/cav.70096","url":null,"abstract":"<div>\u0000 \u0000 <p>Advances in microelectronic components and high-speed networks have enabled the widespread application of virtual reality (VR) technology in education. However, insufficient attention to knowledge visualization in VR learning has resulted in disorganized knowledge structures, comprehension difficulties, and mismatches between user experience and learning achievement. Therefore, we propose a Knowledge Cube (KC)-based visualization method to standardize knowledge encoding in VR learning. During courseware development, the instructor defines discrete knowledge as Events, organizes them into Event Groups, and populates data into a KC model to generate VR courseware. In subsequent VR learning, when learners search for task-relevant knowledge using the provided retrieval method, the KC model presents the corresponding events within interactive scenarios according to its predefined structure. Comparative experiments on different knowledge visualization methods revealed that, in VR learning, the KC method outperforms other VR approaches in both learning performance and efficiency. This method effectively guided learners to focus on the learning content and optimized the knowledge encoding in VR learning. This provides an operational framework for knowledge encoding in VR courseware design and emphasizes the importance of supporting effective learning behaviors over merely pursuing immersion, presenting a new perspective for refining the design approach of VR courseware.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"37 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146136324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advanced Sign Language Translation: A Holistic Network for Hand Gesture Recognition Using Deep Learning","authors":"S. L. Reeja, P. S. Deepthi, T. Soumya","doi":"10.1002/cav.70084","DOIUrl":"https://doi.org/10.1002/cav.70084","url":null,"abstract":"<div>\u0000 \u0000 <p>Sign language recognition (SLR) requires interpreting dynamic hand gestures with complex variations in shape, orientation, motion, and spatial configuration. Conventional models such as U-Net and ResNet offer strengths in segmentation and feature extraction, respectively, but face critical limitations. U-Net struggles with retaining fine spatial details in cluttered backgrounds and lacks temporal modeling, while ResNet can lose motion continuity and suffers from vanishing gradient issues in deeper architectures. To overcome these challenges, we propose the holistic sign language interpretation network (HSLIN), a novel deep learning framework tailored for Indian sign language (ISL) recognition. HSLIN incorporates three key innovations: Uniformed frame isolation and augmentation (UFIA) for standardized preprocessing and noise removal, synaptic gesture movement analysis (SGMA) for capturing detailed motion using keypoint detection and optical flow, and a hybrid architecture combining U-Net-based segmentation with an enhanced ResNet-TC50V2 backbone. The novelty lies in fusing spatial precision with deep temporal modeling through bottleneck layers and temporal convolutional layers (TCL), enabling the model to effectively learn gesture patterns over time. Experimental results on the ISL-CSLTR dataset demonstrate that the proposed method achieves an accuracy of 99.9%, a precision of 100%, recall of 99.9%, and an F1-score of 100% across 14 word-level sign classes. Furthermore, an ablation study confirms the critical role of each architectural component in achieving optimal performance. These outcomes clearly establish the robustness, efficiency, and uniqueness of the proposed HSLIN framework, positioning it as a powerful solution for real-world ISL recognition and communication accessibility for the deaf and hard-of-hearing community.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"37 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146091171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to “Artist-Directable Motion Generation Using Overlapping Minimum Jerk Trajectories for Interactive Embodied Social Agent Adapting to Dynamic Environments”","authors":"","doi":"10.1002/cav.70093","DOIUrl":"https://doi.org/10.1002/cav.70093","url":null,"abstract":"<p>\u0000 <span>H Sato</span>, <span>H Mitake</span>, and <span>S Hasegawa</span> “ <span>Artist-Directable Motion Generation Using Overlapping Minimum Jerk Trajectories for Interactive Embodied Social Agent Adapting to Dynamic Environments</span>,” <i>Computer Animation and Virtual Worlds</i> <span>36</span>, no. <span>6</span> (<span>2025</span>): e70075.</p><p>The name of one of the authors was incorrect as “Shouichi Hasegawa.” The correct spelling is “Shoichi Hasegawa.”</p><p>We apologize for this error.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"37 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cav.70093","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146002515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SinMDGan: A Hybrid Deep Learning Framework for Single Motion Synthesis Using Diffusion-GAN Models","authors":"Qiang Chen, Binsong Zuo, Tingsong Lu, Yuming Fang, Xiaolu Mu, Chao Cai, Xiaogang Jin","doi":"10.1002/cav.70091","DOIUrl":"https://doi.org/10.1002/cav.70091","url":null,"abstract":"<div>\u0000 \u0000 <p>Generating diverse and realistic movements has long been a central challenge in computer graphics. Generative Adversarial Networks (GANs) remain a compelling solution due to their ability to perform well even with limited training data. However, traditional GANs generate samples directly, which can lead to the omission of certain data patterns. To address this limitation, we introduce <i>SinMDGan</i>, a hybrid deep learning framework for single-motion synthesis that leverages a Diffusion-GAN model. Our approach integrates the strengths of GANs, which capture global motion characteristics, with diffusion techniques, which refine local details, ensuring both authenticity and diversity in generated movements. Unlike conventional cascaded GANs, our framework employs a single generator-discriminator pair, utilizing different diffusion time steps to synthesize novel and diverse motions from a single short sequence. Experimental evaluations demonstrate the effectiveness of our model in achieving stable data distribution coverage and enhancing output diversity. Additionally, we showcase various applications, including motion composition and long-sequence generation, highlighting the versatility of our approach.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"37 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146007850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Text-Driven High-Quality 3D Human Generation via Variational Gradient Estimation and Latent Reward Models","authors":"Pengfei Zhou, Xukun Shen, Yong Hu","doi":"10.1002/cav.70089","DOIUrl":"https://doi.org/10.1002/cav.70089","url":null,"abstract":"<div>\u0000 \u0000 <p>Recent advances in Score Distillation Sampling (SDS) have enabled text-driven 3D human generation, yet the standard classifier-free guidance (CFG) framework struggles with semantic misalignment and texture oversaturation due to limited model capacity. We propose a novel framework that decouples conditional and unconditional guidance via a dual-model strategy: A pretrained diffusion model ensures geometric stability, while a preference-tuned latent reward model enhances semantic fidelity. To further refine noise estimation, we introduce a lightweight U-shaped Swin Transformer (U-Swin) that regularizes predicted noise against the reward model, reducing gradient bias and local artifacts. Additionally, we design a time-varying noise weighting mechanism to dynamically balance the two guidance signals during denoising, improving stability and texture realism. Extensive experiments show that our method significantly improves alignment with textual descriptions, enhances texture details, and outperforms state-of-the-art baselines in both visual quality and semantic consistency.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"37 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145963945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Environmental Design Elements in Library Spaces: A Virtual Reality Study of Psychophysiological Responses to Color, Material, and Lighting in Built Environments","authors":"Mengyan Lin, Ning Li","doi":"10.1002/cav.70092","DOIUrl":"https://doi.org/10.1002/cav.70092","url":null,"abstract":"<div>\u0000 \u0000 <p>This study explores how environmental design elements in library spaces influence human psychophysiological responses using virtual reality (VR). Thirty participants experienced VR simulations of library reading areas, with variations in wall color, flooring material, and lighting intensity, whereas electroencephalography (EEG) and galvanic skin response (GSR) measured physiological reactions alongside subjective ratings. Moderate lighting (20,000–30,000 cd) minimized arousal and supported attention, while white walls enhanced relaxation via increased alpha brain activity. Green plant walls slightly boosted attention-related beta activity, and wood flooring was rated highest for comfort and naturalness. VR enabled precise control of design variables, advancing environmental psychology research. These findings offer evidence-based guidelines for designing public spaces like libraries to enhance user well-being and cognitive performance, with implications for educational and public buildings.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"37 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145930986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Innovating 3D Object Generation for the Metaverse Through Speech Input","authors":"An Chao Tsai, Pierre Dave Victor Katuhe","doi":"10.1002/cav.70088","DOIUrl":"https://doi.org/10.1002/cav.70088","url":null,"abstract":"<div>\u0000 \u0000 <p>This research introduces a novel approach to generate three-dimensional objects using human voice as input for a machine learning model. The spoken input is first converted into text using a Google API, which then guides the creation of the desired three-dimensional object. This approach is particularly beneficial for individuals without design expertise, paving the way for broader participation in the evolving Metaverse—a virtual reality that transcends our physical realm. The three-dimensional object generation model comprises three primary components: Neural Radiance Fields (NeRF), Low-Rank Adaptation (LoRA), and Stable Diffusion. When combined, these components facilitate the creation of a diverse range of three-dimensional objects. Our method presents an innovative approach to harnessing speech recognition for generating three-dimensional objects within the Metaverse, while demonstrating competitive quality and practical efficiency.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 6","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145824749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}