{"title":"A language-directed virtual human motion generation approach based on musculoskeletal models","authors":"Libo Sun, Yongxiang Wang, Wenhu Qin","doi":"10.1002/cav.2257","DOIUrl":"https://doi.org/10.1002/cav.2257","url":null,"abstract":"<p>The development of the systems capable of synthesizing natural and life-like motions for virtual characters has long been a central focus in computer animation. It needs to generate high-quality motions for characters and provide users with a convenient and flexible interface for guiding character motions. In this work, we propose a language-directed virtual human motion generation approach based on musculoskeletal models to achieve interactive and higher-fidelity virtual human motion, which lays the foundation for the development of language-directed controllers in physics-based character animation. First, we construct a simplified model of musculoskeletal dynamics for the virtual character. Subsequently, we propose a hierarchical control framework consisting of a trajectory tracking layer and a muscle control layer, obtaining the optimal control policy for imitating the reference motions through the training. We design a multi-policy aggregation controller based on large language models, which selects the motion policy with the highest similarity to user text commands from the action-caption data pool, facilitating natural language-based control of virtual character motions. Experimental results demonstrate that the proposed approach not only generates high-quality motions highly resembling reference motions but also enables users to effectively guide virtual characters to perform various motions via natural language instructions.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141251352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HIDE: Hierarchical iterative decoding enhancement for multi-view 3D human parameter regression","authors":"Weitao Lin, Jiguang Zhang, Weiliang Meng, Xianglong Liu, Xiaopeng Zhang","doi":"10.1002/cav.2266","DOIUrl":"https://doi.org/10.1002/cav.2266","url":null,"abstract":"<p>Parametric human modeling are limited to either single-view frameworks or simple multi-view frameworks, failing to fully leverage the advantages of easily trainable single-view networks and the occlusion-resistant capabilities of multi-view images. The prevalent presence of object occlusion and self-occlusion in real-world scenarios leads to issues of robustness and accuracy in predicting human body parameters. Additionally, many methods overlook the spatial connectivity of human joints in the global estimation of model pose parameters, resulting in cumulative errors in continuous joint parameters.To address these challenges, we propose a flexible and efficient iterative decoding strategy. By extending from single-view images to multi-view video inputs, we achieve local-to-global optimization. We utilize attention mechanisms to capture the rotational dependencies between any node in the human body and all its ancestor nodes, thereby enhancing pose decoding capability. We employ a parameter-level iterative fusion of multi-view image data to achieve flexible integration of global pose information, rapidly obtaining appropriate projection features from different viewpoints, ultimately resulting in precise parameter estimation. Through experiments, we validate the effectiveness of the HIDE method on the Human3.6M and 3DPW datasets, demonstrating significantly improved visualization results compared to previous methods.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141251353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Augmenting collaborative interaction with shared visualization of eye movement and gesture in VR","authors":"Yang Liu, Song Zhao, Shiwei Cheng","doi":"10.1002/cav.2264","DOIUrl":"https://doi.org/10.1002/cav.2264","url":null,"abstract":"<p>Virtual Reality (VR)-enabled multi-user collaboration has been gradually applied in academic research and industrial applications, but it still has key problems. First, it is often difficult for users to select or manipulate objects in complex three-dimesnional spaces, which greatly affects their operational efficiency. Second, supporting natural communication cues is crucial for cooperation in VR, especially in collaborative tasks, where ambiguous verbal communication cannot effectively assign partners the task of selecting or manipulating objects. To address the above issues, in this paper, we propose a new interaction method, Eye-Gesture Combination Interaction in VR, to enhance the execution of collaborative tasks by sharing the visualization of eye movement and gesture data among partners. We conducted user experiments and showed that using dots to represent eye gaze and virtual hands to represent gestures can help users complete tasks faster than other visualization methods. Finally, we developed a VR multi-user collaborative assembly system. The results of the user study show that sharing gaze points and gestures among users can significantly improve the productivity of collaborating users. Our work can effectively improve the efficiency of multi-user collaborative systems in VR and provide new design guidelines for collaborative systems in VR.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141245708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A double-layer crowd evacuation simulation method based on deep reinforcement learning","authors":"Yong Zhang, Bo Yang, Jianlin Zhu","doi":"10.1002/cav.2280","DOIUrl":"https://doi.org/10.1002/cav.2280","url":null,"abstract":"<p>Existing crowd evacuation simulation methods commonly face challenges of low efficiency in path planning and insufficient realism in pedestrian movement during the evacuation process. In this study, we propose a novel crowd evacuation path planning approach based on the learning curve–deep deterministic policy gradient (LC-DDPG) algorithm. The algorithm incorporates dynamic experience pool and a priority experience sampling strategy, enhancing convergence speed and achieving higher average rewards, thus efficiently enabling global path planning. Building upon this foundation, we introduce a double-layer method for crowd evacuation using deep reinforcement learning. Specifically, within each group, individuals are categorized into leaders and followers. At the top layer, we employ the LC-DDPG algorithm to perform global path planning for the leaders. Simultaneously, at the bottom layer, an enhanced social force model guides the followers to avoid obstacles and follow the leaders during evacuation. We implemented a crowd evacuation simulation platform. Experimental results show that our proposed method has high path planning efficiency and can generate more realistic pedestrian trajectories in different scenarios and crowd sizes.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141187614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"De-NeRF: Ultra-high-definition NeRF with deformable net alignment","authors":"Jianing Hou, Runjie Zhang, Zhongqi Wu, Weiliang Meng, Xiaopeng Zhang, Jianwei Guo","doi":"10.1002/cav.2240","DOIUrl":"https://doi.org/10.1002/cav.2240","url":null,"abstract":"<p>Neural Radiance Field (NeRF) can render complex 3D scenes with viewpoint-dependent effects. However, less work has been devoted to exploring its limitations in high-resolution environments, especially when upscaled to ultra-high resolution (e.g., 4k). Specifically, existing NeRF-based methods face severe limitations in reconstructing high-resolution real scenes, for example, a large number of parameters, misalignment of the input data, and over-smoothing of details. In this paper, we present a novel and effective framework, called <i>De-NeRF</i>, based on NeRF and deformable convolutional network, to achieve high-fidelity view synthesis in ultra-high resolution scenes: (1) marrying the deformable convolution unit which can solve the problem of misaligned input of the high-resolution data. (2) Presenting a density sparse voxel-based approach which can greatly reduce the training time while rendering results with higher accuracy. Compared to existing high-resolution NeRF methods, our approach improves the rendering quality of high-frequency details and achieves better visual effects in 4K high-resolution scenes.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141187630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Screen-space Streamline Seeding Method for Visualizing Unsteady Flow in Augmented Reality","authors":"Hyunmo Kang, JungHyun Han","doi":"10.1002/cav.2250","DOIUrl":"https://doi.org/10.1002/cav.2250","url":null,"abstract":"<p>Streamlines are a popular method of choice in many flow visualization techniques due to their simplicity and intuitiveness. This paper presents a novel streamline seeding method, which is tailored for visualizing unsteady flow in augmented reality (AR). Our method prioritizes visualizing the visible part of the flow field to enhance the flow representation's quality and reduce the computational cost. Being an image-based method, it evenly samples 2D seeds from the screen space. Then, a ray is fired toward each 2D seed, and the on-the-ray point, which has the largest entropy, is selected. It is taken as the 3D seed for a streamline. By advecting such 3D seeds in the velocity field, which is continuously updated in real time, the unsteady flow is visualized more naturally, and the temporal coherence is achieved with no extra efforts. Our method is tested using an AR application for visualizing airflow from a virtual air conditioner. Comparison with the baseline methods shows that our method is suitable for visualizing unsteady flow in AR.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141187629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PR3D: Precise and realistic 3D face reconstruction from a single image","authors":"Zhangjin Huang, Xing Wu","doi":"10.1002/cav.2254","DOIUrl":"https://doi.org/10.1002/cav.2254","url":null,"abstract":"<p>Reconstructing the three-dimensional (3D) shape and texture of the face from a single image is a significant and challenging task in computer vision and graphics. In recent years, learning-based reconstruction methods have exhibited outstanding performance, but their effectiveness is severely constrained by the scarcity of available training data with 3D annotations. To address this issue, we present the PR3D (Precise and Realistic 3D face reconstruction) method, which consists of high-precision shape reconstruction based on semi-supervised learning and high-fidelity texture reconstruction based on StyleGAN2. In shape reconstruction, we use in-the-wild face images and 3D annotated datasets to train the auxiliary encoder and the identity encoder, encoding the input image into parameters of FLAME (a parametric 3D face model). Simultaneously, a novel semi-supervised hybrid landmark loss is designed to more effectively learn from in-the-wild face images and 3D annotated datasets. Furthermore, to meet the real-time requirements in practical applications, a lightweight shape reconstruction model called fast-PR3D is distilled through teacher–student learning. In texture reconstruction, we propose a texture extraction method based on face reenactment in StyleGAN2 style space, extracting texture from the source and reenacted face images to constitute a facial texture map. Extensive experiments have demonstrated the state-of-the-art performance of our method.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141187613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a lightweight and easy-to-wear hand glove with multi-modal tactile perception for digital human","authors":"Zhigeng Pan, Hongyi Ren, Chang Liu, Ming Chen, Mithun Mukherjee, Wenzhen Yang","doi":"10.1002/cav.2258","DOIUrl":"https://doi.org/10.1002/cav.2258","url":null,"abstract":"<p>Within the field of human–computer interaction, data gloves play an essential role in establishing a connection between virtual and physical environments for the realization of digital human. To enhance the credibility of human-virtual hand interactions, we aim to develop a system incorporating a data glove-embedded technology. Our proposed system collects a wide range of information (temperature, bending, and pressure of fingers) that arise during natural interactions and afterwards reproduce them within the virtual environment. Furthermore, we implement a novel traversal polling technique to facilitate the streamlined aggregation of multi-channel sensors. This mitigates the hardware complexity of the embedded system. The experimental results indicate that the data glove demonstrates a high degree of precision in acquiring real-time hand interaction information, as well as effectively displaying hand posture in real-time using Unity3D. The data glove's lightweight and compact design facilitates its versatile utilization in virtual reality interactions.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141187615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soccer match broadcast video analysis method based on detection and tracking","authors":"Hongyu Li, Meng Yang, Chao Yang, Jianglang Kang, Xiang Suo, Weiliang Meng, Zhen Li, Lijuan Mao, Bin Sheng, Jun Qi","doi":"10.1002/cav.2259","DOIUrl":"https://doi.org/10.1002/cav.2259","url":null,"abstract":"<p>We propose a comprehensive soccer match video analysis pipeline tailored for broadcast footage, which encompasses three pivotal stages: soccer field localization, player tracking, and soccer ball detection. Firstly, we introduce sports camera calibration to seamlessly map soccer field images from match videos onto a standardized two-dimensional soccer field template. This addresses the challenge of consistent analysis across video frames amid continuous camera angle changes. Secondly, given challenges such as occlusions, high-speed movements, and dynamic camera perspectives, obtaining accurate position data for players and the soccer ball is non-trivial. To mitigate this, we curate a large-scale, high-precision soccer ball detection dataset and devise a robust detection model, which achieved the <span></span><math>\u0000 <semantics>\u0000 <mrow>\u0000 <mi>m</mi>\u0000 <mi>A</mi>\u0000 <msub>\u0000 <mrow>\u0000 <mi>P</mi>\u0000 </mrow>\u0000 <mrow>\u0000 <mn>50</mn>\u0000 <mo>−</mo>\u0000 <mn>95</mn>\u0000 </mrow>\u0000 </msub>\u0000 </mrow>\u0000 <annotation>$$ mA{P}_{50-95} $$</annotation>\u0000 </semantics></math> of 80.9%. Additionally, we develop a high-speed, efficient, and lightweight tracking model to ensure precise player tracking. Through the integration of these modules, our pipeline focuses on real-time analysis of the current camera lens content during matches, facilitating rapid and accurate computation and analysis while offering intuitive visualizations.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141165050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-based control framework for motion propagation and pattern preservation in swarm flight simulations","authors":"Feixiang Qi, Bojian Wang, Meili Wang","doi":"10.1002/cav.2276","DOIUrl":"https://doi.org/10.1002/cav.2276","url":null,"abstract":"<p>Simulation of swarm motion is a crucial research area in computer graphics and animation, and is widely used in a variety of applications such as biological behavior research, robotic swarm control, and the entertainment industry. In this paper, we address the challenges of preserving structural relations between the individuals in swarm flight simulations by proposing an innovative motion control framework that utilizes a graph-based hierarchy to illustrate patterns within a swarm and allows the swarm to perform flight motions along externally specified paths. In addition, this study designs motion propagation strategies with different focuses for varied application scenarios, analyzes the effects of information transfer latencies on pattern preservation under these strategies, and optimizes the control algorithms at the mathematical level. This study not only establishes a complete set of control methods for group flight simulations, but also has excellent scalability, which can be combined with other techniques in this field to provide new solutions for group behavior simulations.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141165051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}