DSANet: A lightweight hybrid network for human action recognition in virtual sports
Zhiyong Xiao, Feng Yu, Li Liu, Tao Peng, Xinrong Hu, Minghua Jiang
Computer Animation and Virtual Worlds, 35(3). Published 2024-05-24. DOI: https://doi.org/10.1002/cav.2274

Abstract: Human activity recognition (HAR) has significant potential in virtual sports applications. However, current HAR networks often prioritize high accuracy at the expense of practical requirements, resulting in large parameter counts and high computational complexity that hinder real-time, efficient recognition. This paper proposes DSANet, a hybrid lightweight network designed to address these challenges of real-time performance and algorithmic complexity. The network uses a multi-scale depthwise separable convolution (Multi-scale DWCNN) module to extract spatial information and a multi-layer gated recurrent unit (Multi-layer GRU) module to extract temporal features. It also incorporates RCSFA, an improved channel-spatial attention module, to strengthen feature extraction. By leveraging channel, spatial, and temporal information, the network achieves high accuracy with a low parameter count. Experiments on the UCI-HAR, WISDM, and PAMAP2 datasets show that, relative to state-of-the-art networks, DSANet reduces parameters while reaching accuracies of 97.55%, 98.99%, and 98.67%, respectively. This work offers insights for the virtual sports field and a practical network for real-time activity recognition on embedded devices.
{"title":"FrseGAN: Free-style editable facial makeup transfer based on GAN combined with transformer","authors":"Weifeng Xu, Pengjie Wang, Xiaosong Yang","doi":"10.1002/cav.2235","DOIUrl":"https://doi.org/10.1002/cav.2235","url":null,"abstract":"<p>Makeup in real life varies widely and is personalized, presenting a key challenge in makeup transfer. Most previous makeup transfer techniques divide the face into distinct regions for color transfer, frequently neglecting details like eyeshadow and facial contours. Given the successful advancements of Transformers in various visual tasks, we believe that this technology holds large potential in addressing pose, expression, and occlusion differences. To explore this, we propose novel pipeline which combines well-designed Convolutional Neural Network with Transformer to leverage the advantages of both networks for high-quality facial makeup transfer. This enables hierarchical extraction of both local and global facial features, facilitating the encoding of facial attributes into pyramid feature maps. Furthermore, a Low-Frequency Information Fusion Module is proposed to address the problem of large pose and expression variations which exist between the source and reference faces by extracting makeup features from the reference and adapting them to the source. Experiments demonstrate that our method produces makeup faces that are visually more detailed and realistic, yielding superior results.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cav.2235","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141091663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GAN-Based Multi-Decomposition Photo Cartoonization
Wenqing Zhao, Jianlin Zhu, Jin Huang, Ping Li, Bin Sheng
Computer Animation and Virtual Worlds, 35(3). Published 2024-05-23. DOI: https://doi.org/10.1002/cav.2248

Background: Cartoon images play a vital role in film production, scientific and educational animation, video games, and other fields, and are a key visual expression of artistic creation. Because hand-crafted cartoon images demand a great deal of time and effort from professional artists, automatically transforming real-world photos into different cartoon styles is highly desirable. Although cartoon styles vary from artist to artist, cartoon images generally share distinctive characteristics: they are highly simplified and abstract, with clear edges, smooth color shading, and relatively simple textures. Existing image cartoonization methods, however, tend to exhibit two main problems during style transfer: (1) the generated images lack obvious cartoon-style textures; and (2) they are prone to structural confusion, color artifacts, and loss of the original image content. Striking a good balance between style transfer and content preservation therefore remains a major challenge in image cartoonization.

Methods: We propose a GAN-based multi-attention mechanism for image cartoonization to address these issues. The method combines the residual blocks that extract deep features in the generator with an attention mechanism, using the attention module's adaptive feature recalibration to strengthen the generative model's perception of cartoon images and enhance the cartoon characteristics of the generated output. We also introduce an attention mechanism into the discriminator's convolution blocks to further reduce the visual-quality degradation caused by the style-transfer process. By introducing attention into both the generator and the discriminator of the generative adversarial network, our method produces images with distinct cartoon-style features while effectively improving visual quality.

Results: Extensive quantitative, qualitative, and ablation experiments demonstrate the advantages of our method in the field of image cartoonization and the contribution of each module.
Momentum-preserving inversion alleviation for elastic material simulation
Heejo Jeong, Seung-wook Kim, JaeHyun Lee, Kiwon Um, Min Hyung Kee, JungHyun Han
Computer Animation and Virtual Worlds, 35(3). Published 2024-05-17. DOI: https://doi.org/10.1002/cav.2249

Abstract: This paper proposes a novel method that enhances the optimization-based elastic body solver. The proposed method tackles the element inversion problem, which is prevalent in the prediction-projection approach to numerical simulation of elastic bodies. At the prediction stage, our method alleviates inversions so that the subsequent projection solver benefits in stability and efficiency. To prevent excessive suppression of predicted inertial motion during alleviation, we introduce a velocity decomposition method and adapt only the non-rigid motion while preserving the rigid motion, that is, the linear and angular momenta. Thanks to the respected inertial motion in the prediction stage, our method produces lively motions while keeping the entire simulation more stable. Experiments demonstrate that our alleviation method successfully stabilizes the simulation and improves efficiency, particularly when large deformations hamper the solver.
{"title":"Uniform gradient magnetic field and spatial localization method based on Maxwell coils for virtual surgery simulation","authors":"Yi Huang, Xutian Deng, Xujie Zhao, Wenxuan Xie, Zhiyong Yuan, Jianhui Zhao","doi":"10.1002/cav.2247","DOIUrl":"https://doi.org/10.1002/cav.2247","url":null,"abstract":"<p>With the development of virtual reality technology, simulation surgery has become a low-risk surgical training method and high-precision positioning of surgical instruments is required in virtual simulation surgery. In this paper we design and validate a novel electromagnetic positioning method based on a uniform gradient magnetic field. We employ Maxwell coils to generate the uniform gradient magnetic field and propose two positioning algorithms based on magnetic field, namely the linear equation positioning algorithm and the magnetic field fingerprint positioning algorithm. After validating the feasibility of proposed positioning system through simulation, we construct a prototype system and conduct practical experiments. The experimental results demonstrate that the positioning system exhibits excellent accuracy and speed in both simulation and real-world applications. The positioning accuracy remains consistent and high, showing no significant variation with changes in the positions of surgical instruments.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Facial action units detection using temporal context and feature reassignment
Sipeng Yang, Hongyu Huang, Ying Sophie Huang, Xiaogang Jin
Computer Animation and Virtual Worlds, 35(3). Published 2024-05-17. DOI: https://doi.org/10.1002/cav.2246

Abstract: Facial action units (AUs) encode the activations of facial muscle groups, playing a crucial role in expression analysis and facial animation. However, current deep learning AU detection methods primarily focus on single-image analysis, which limits the exploitation of rich temporal context for robust outcomes. Moreover, the scale of available datasets remains limited, so models trained on them tend to suffer from overfitting. This paper proposes a novel AU detection method integrating spatial and temporal data with inter-subject feature reassignment for accurate and robust AU predictions. Our method first extracts regional features from facial images. Then, to effectively capture both the temporal context and identity-independent features, we introduce a temporal feature combination and feature reassignment (TC&FR) module, which transforms single-image features into a cohesive temporal sequence and fuses features across multiple subjects. This transformation encourages the model to utilize identity-independent features and temporal context, ensuring robust prediction outcomes. Experimental results demonstrate the enhancements brought by the proposed modules and the state-of-the-art (SOTA) results achieved by our method.
{"title":"AG-SDM: Aquascape generation based on stable diffusion model with low-rank adaptation","authors":"Muyang Zhang, Jinming Yang, Yuewei Xian, Wei Li, Jiaming Gu, Weiliang Meng, Jiguang Zhang, Xiaopeng Zhang","doi":"10.1002/cav.2252","DOIUrl":"https://doi.org/10.1002/cav.2252","url":null,"abstract":"<p>As an amalgamation of landscape design and ichthyology, aquascape endeavors to create visually captivating aquatic environments imbued with artistic allure. Traditional methodologies in aquascape, governed by rigid principles such as composition and color coordination, may inadvertently curtail the aesthetic potential of the landscapes. In this paper, we propose Aquascape Generation based on Stable Diffusion Models (AG-SDM), prioritizing aesthetic principles and color coordination to offer guiding principles for real artists in Aquascape creation. We meticulously curated and annotated three aquascape datasets with varying aspect ratios to accommodate diverse landscape design requirements regarding dimensions and proportions. Leveraging the Fréchet Inception Distance (FID) metric, we trained AGFID for quality assessment. Extensive experiments validate that our AG-SDM excels in generating hyper-realistic underwater landscape images, closely resembling real flora, and achieves state-of-the-art performance in aquascape image generation.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
POST: Prototype-oriented similarity transfer framework for cross-domain facial expression recognition
Zhe Guo, Bingxin Wei, Qinglin Cai, Jiayi Liu, Yi Wang
Computer Animation and Virtual Worlds, 35(3). Published 2024-05-17. DOI: https://doi.org/10.1002/cav.2260

Abstract: Facial expression recognition (FER) is a popular research topic in computer vision. Most deep learning expression recognition methods perform well on a single dataset but may struggle in cross-domain FER applications when applied to different datasets. Cross-dataset FER also suffers from difficulties such as feature-distribution deviation and discriminator degradation. To address these issues, we propose a prototype-oriented similarity transfer framework (POST) for cross-domain FER. The bidirectional cross-attention Swin Transformer (BCS Transformer) module is designed to aggregate local facial-feature similarities across different domains, enabling the extraction of relevant cross-domain features. Dual learnable category prototypes are designed to represent latent-space samples for both the source and target domains, ensuring enhanced domain alignment by leveraging both cross-domain and domain-specific features. We further introduce a self-training resampling (STR) strategy to enhance similarity transfer. Experimental results with the RAF-DB dataset as the source domain and the CK+, FER2013, JAFFE, and SFEW 2.0 datasets as target domains show that our approach achieves much higher performance than state-of-the-art cross-domain FER methods.
Multi-style cartoonization: Leveraging multiple datasets with generative adversarial networks
Jianlu Cai, Frederick W. B. Li, Fangzhe Nan, Bailin Yang
Computer Animation and Virtual Worlds, 35(3). Published 2024-05-17. DOI: https://doi.org/10.1002/cav.2269

Abstract: Scene cartoonization aims to convert photos into stylized cartoons. While generative adversarial networks (GANs) can generate high-quality images, previous methods focus on individual images or single styles, ignoring relationships between datasets. We propose a novel multi-style scene cartoonization GAN that leverages multiple cartoon datasets jointly. Our main technical contribution is a multi-branch style encoder that disentangles representations to model styles as distributions over entire datasets rather than individual images. Combined with a multi-task discriminator and perceptual losses optimized across collections, our model achieves state-of-the-art diverse stylization while preserving semantics. Experiments demonstrate that by learning from inter-dataset relationships, our method translates photos into cartoon images with improved realism and abstraction fidelity compared to prior art, without iterative re-training for new styles.
{"title":"Multi-scale edge aggregation mesh-graph-network for character secondary motion","authors":"Tianyi Wang, Shiguang Liu","doi":"10.1002/cav.2241","DOIUrl":"https://doi.org/10.1002/cav.2241","url":null,"abstract":"<p>As an enhancement to skinning-based animations, light-weight secondary motion method for 3D characters are widely demanded in many application scenarios. To address the dependence of data-driven methods on ground truth data, we propose a self-supervised training strategy that is free of ground truth data for the first time in this domain. Specifically, we construct a self-supervised training framework by modeling the implicit integration problem with steps as an optimization problem based on physical energy terms. Furthermore, we introduce a multi-scale edge aggregation mesh-graph block (MSEA-MG Block), which significantly enhances the network performance. This enables our model to make vivid predictions of secondary motion for 3D characters with arbitrary structures. Empirical experiments indicate that our method, without requiring ground truth data for model training, achieves comparable or even superior performance quantitatively and qualitatively compared to state-of-the-art data-driven approaches in the field.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140953062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}