{"title":"SCNet: A Dual-Branch Network for Strong Noisy Image Denoising Based on Swin Transformer and ConvNeXt","authors":"Chuchao Lin, Changjun Zou, Hangbin Xu","doi":"10.1002/cav.70030","DOIUrl":"https://doi.org/10.1002/cav.70030","url":null,"abstract":"<div>\u0000 \u0000 <p>Image denoising plays a vital role in restoring high-quality images from noisy inputs and directly impacts downstream vision tasks. Traditional methods often fail under strong noise, causing detail loss or excessive smoothing. While recent Convolutional Neural Networks-based and Transformer-based models have shown progress, they struggle to jointly capture global structure and preserve local details. To address this, we propose SCNet, a dual-branch fusion network tailored for strong-noise denoising. It combines a Swin Transformer branch for global context modeling and a ConvNeXt branch for fine-grained local feature extraction. Their outputs are adaptively merged via a Feature Fusion Block using joint spatial and channel attention, ensuring semantic consistency and texture fidelity. A multi-scale upsampling module and the Charbonnier loss further improve structural accuracy and visual quality. Extensive experiments on four benchmark datasets show that SCNet outperforms state-of-the-art methods, especially under severe noise, and proves effective in real-world tasks such as mural image restoration.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144196987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AIKII: An AI-Enhanced Knowledge Interactive Interface for Knowledge Representation in Educational Games","authors":"Dake Liu, Huiwen Zhao, Wen Tang, Wenwen Yang","doi":"10.1002/cav.70052","DOIUrl":"https://doi.org/10.1002/cav.70052","url":null,"abstract":"<div>\u0000 \u0000 <p>The use of generative AI to create responsive and adaptive game content has attracted considerable interest within the educational game design community, highlighting its potential as a tool for enhancing players' understanding of in-game knowledge. However, designing effective player-AI interaction to support knowledge representation remains unexplored. This paper presents AIKII, an AI-enhanced Knowledge Interaction Interface designed to facilitate knowledge representation in educational games. AIKII employs various interaction channels to represent in-game knowledge and support player engagement. To investigate its effectiveness and user learning experience, we implemented AIKII into The Journey of Poetry, an educational game centered on learning Chinese poetry, and conducted interviews with university students. The results demonstrated that our method fosters contextual and reflective connections between players and in-game knowledge, enhancing player autonomy and immersion.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144197102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DTGS: Defocus-Tolerant View Synthesis Using Gaussian Splatting","authors":"Xinying Dai, Li Yao","doi":"10.1002/cav.70045","DOIUrl":"https://doi.org/10.1002/cav.70045","url":null,"abstract":"<div>\u0000 \u0000 <p>Defocus blur poses a significant challenge for 3D reconstruction, as traditional methods often struggle to maintain detail and accuracy in blurred regions. Building upon the recent advancements in the 3DGS technique, we propose an architecture for 3D scene reconstruction from defocused blurry images. Due to the sparsity of point clouds initialized by SfM, we improve the scene representation by reasonably filling in new Gaussians where the Gaussian field is insufficient. During the optimization phase, we adjust the gradient field based on the depth values of the Gaussians and introduce perceptual loss in the objective function to reduce reconstruction bias caused by blurriness and enhance the realism of the rendered results. Experimental results on both synthetic and real datasets show that our method outperforms existing approaches in terms of reconstruction quality and robustness, even under challenging defocus blur conditions.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144197101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint-Learning: A Robust Segmentation Method for 3D Point Clouds Under Label Noise","authors":"Mengyao Zhang, Jie Zhou, Tingyun Miao, Yong Zhao, Xin Si, Jingliang Zhang","doi":"10.1002/cav.70038","DOIUrl":"https://doi.org/10.1002/cav.70038","url":null,"abstract":"<div>\u0000 \u0000 <p>Most of point cloud segmentation methods are based on clean datasets and are easily affected by label noise. We present a novel method called Joint-learning, which is the first attempt to apply a dual-network framework to point cloud segmentation with noisy labels. Two networks are trained simultaneously, and each network selects clean samples to update its peer network. The communication between two networks is able to exchange the knowledge they learned, possessing good robustness and generalization ability. Subsequently, adaptive sample selection is proposed to maximize the learning capacity. When the accuracies of both networks are no longer improving, the selection rate is reduced, which results in cleaner selected samples. To further reduce the impact of noisy labels, for unselected samples, we provide a joint label correction algorithm to rectify their labels via two networks' predictions. We conduct various experiments on S3DIS and ScanNet-v2 datasets under different types and rates of noises. Both quantitative and qualitative results verify the reasonableness and effectiveness of the proposed method. By comparison, our method is substantially superior to the state-of-the-art methods and achieves the best results in all noise settings. The average performance improvement is more than 7.43%, with a maximum of 11.42%.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144190905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Talking Face Generation With Lip and Identity Priors","authors":"Jiajie Wu, Frederick W. B. Li, Gary K. L. Tam, Bailin Yang, Fangzhe Nan, Jiahao Pan","doi":"10.1002/cav.70026","DOIUrl":"https://doi.org/10.1002/cav.70026","url":null,"abstract":"<div>\u0000 \u0000 <p>Speech-driven talking face video generation has attracted growing interest in recent research. While person-specific approaches yield high-fidelity results, they require extensive training data from each individual speaker. In contrast, general-purpose methods often struggle with accurate lip synchronization, identity preservation, and natural facial movements. To address these limitations, we propose a novel architecture that combines an alignment model with a rendering model. The rendering model synthesizes identity-consistent lip movements by leveraging facial landmarks derived from speech, a partially occluded target face, multi-reference lip features, and the input audio. Concurrently, the alignment model estimates optical flow using the occluded face and a static reference image, enabling precise alignment of facial poses and lip shapes. This collaborative design enhances the rendering process, resulting in more realistic and identity-preserving outputs. Extensive experiments demonstrate that our method significantly improves lip synchronization and identity retention, establishing a new benchmark in talking face video generation.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144148317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Precise Motion Inbetweening via Bidirectional Autoregressive Diffusion Models","authors":"Jiawen Peng, Zhuoran Liu, Jingzhong Lin, Gaoqi He","doi":"10.1002/cav.70040","DOIUrl":"https://doi.org/10.1002/cav.70040","url":null,"abstract":"<div>\u0000 \u0000 <p>Conditional motion diffusion models have demonstrated significant potential in generating natural and reasonable motions response to constraints such as keyframes, that can be used for motion inbetweening task. However, most methods struggle to match the keyframe constraints accurately, which resulting in unsmooth transitions between keyframes and generated motion. In this article, we propose Bidirectional Autoregressive Motion Diffusion Inbetweening (BAMDI) to generate seamless motion between start and target frames. The main idea is to transfer the motion diffusion model to autoregressive paradigm, which predicts subsequence of motion adjacent to both start and target keyframes to infill the missing frames through several iterations. This can help to improve the local consistency of generated motion. Additionally, bidirectional generation make sure the smoothness on both start frame target keyframes. Experiments show our method achieves state-of-the-art performance compared with other diffusion-based motion inbetweening methods.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144171472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PG-VTON: Front-And-Back Garment Guided Panoramic Gaussian Virtual Try-On With Diffusion Modeling","authors":"Jian Zheng, Shengwei Sang, Yifei Lu, Guojun Dai, Xiaoyang Mao, Wenhui Zhou","doi":"10.1002/cav.70054","DOIUrl":"https://doi.org/10.1002/cav.70054","url":null,"abstract":"<div>\u0000 \u0000 <p>Virtual try-on (VTON) technology enables the rapid creation of realistic try-on experiences, which makes it highly valuable for the metaverse and e-commerce. However, 2D VTON methods struggle to convey depth and immersion, while existing 3D methods require multi-view garment images and face challenges in generating high-fidelity garment textures. To address the aforementioned limitations, this paper proposes a panoramic Gaussian VTON framework guided solely by front-and-back garment information, named PG-VTON, which uses an adapted local controllable diffusion model for generating virtual dressing effects in specific regions. Specifically, PG-VTON adopts a coarse-to-fine architecture consisting of two stages. The coarse editing stage employs a local controllable diffusion model with a score distillation sampling (SDS) loss to generate coarse garment geometries with high-level semantics. Meanwhile, the refinement stage applies the same diffusion model with a photometric loss not only to enhance garment details and reduce artifacts but also to correct unwanted noise and distortions introduced during the coarse stage, thereby effectively enhancing realism. To improve training efficiency, we further introduce a dynamic noise scheduling (DNS) strategy, which ensures stable training and high-fidelity results. Experimental results demonstrate the superiority of our method, which achieves geometrically consistent and highly realistic 3D virtual try-on generation.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144148302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust 3D Mesh Segmentation Algorithm With Anisotropic Sparse Embedding","authors":"Mengyao Zhang, Wenting Li, Yong Zhao, Xin Si, Jingliang Zhang","doi":"10.1002/cav.70042","DOIUrl":"https://doi.org/10.1002/cav.70042","url":null,"abstract":"<div>\u0000 \u0000 <p>3D mesh segmentation, as a very challenging problem in computer graphics, has attracted considerable interest. The most popular methods in recent years are data-driven methods. However, such methods require a large amount of accurately labeled data, which is difficult to obtain. In this article, we propose a novel mesh segmentation algorithm based on anisotropic sparse embedding. We first over-segment the input mesh and get a collection of patches. Then these patches are embedded into a latent space via an anisotropic <span></span><math>\u0000 <semantics>\u0000 <mrow>\u0000 <msub>\u0000 <mrow>\u0000 <mi>L</mi>\u0000 </mrow>\u0000 <mrow>\u0000 <mn>1</mn>\u0000 </mrow>\u0000 </msub>\u0000 </mrow>\u0000 <annotation>$$ {L}_1 $$</annotation>\u0000 </semantics></math>-regularized optimization problem. In the new space, the patches that belong to the same part of the mesh will be closer, while those belonging to different parts will be farther. Finally, we can easily generate the segmentation result by clustering. Various experimental results on the PSB and COSEG datasets show that our algorithm is able to get perception-aware results and is superior to the state-of-the-art algorithms. In addition, the proposed algorithm can robustly deal with meshes with different poses, different triangulations, noises, missing regions, or missing parts.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144148303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UTMCR: 3U-Net Transformer With Multi-Contrastive Regularization for Single Image Dehazing","authors":"HangBin Xu, ChangJun Zou, ChuChao Lin","doi":"10.1002/cav.70029","DOIUrl":"https://doi.org/10.1002/cav.70029","url":null,"abstract":"<div>\u0000 \u0000 <p>Convolutional neural networks have a long history of development in single-width dehazing tasks, but have gradually been dominated by the Transformer framework due to their insufficient global modeling capability and large number of parameters. However, the existing Transformer network structure adopts a single U-Net structure, which is insufficient in multi-level and multi-scale feature fusion and modeling capability. Therefore, we propose an end-to-end dehazing network (UTMCR-Net). The network consists of two parts: (1) UT module, which connects three U-Net networks in series, where the backbone is replaced by the Dehazeformer block. By connecting three U-Net networks in series, we can improve the image global modeling capability and capture multi-scale information at different levels to achieve multi-level and multi-scale feature fusion. (2) MCR module, which improves the original contrastive regularization method by splitting the results of the UT module into four equal blocks, which are then compared and learned by using the contrast regularization method, respectively. Specifically, we use three U-Net networks to enhance the global modeling capability of UTMCR as well as the multi-scale feature fusion capability. The image dehazing ability is further enhanced using the MCR module. Experimental results show that our method achieves better results on most datasets.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144135834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoupling Density Dynamics: A Neural Operator Framework for Adaptive Multi-Fluid Interactions","authors":"Yalan Zhang, Yuhang Xu, Xiaokun Wang, Angelos Chatzimparmpas, Xiaojuan Ban","doi":"10.1002/cav.70027","DOIUrl":"https://doi.org/10.1002/cav.70027","url":null,"abstract":"<div>\u0000 \u0000 <p>The dynamic interface prediction of multi-density fluids presents a fundamental challenge across computational fluid dynamics and graphics, rooted in nonlinear momentum transfer. We present Density-Conditioned Dynamic Convolution, a novel neural operator framework that establishes differentiable density-dynamics mapping through decoupled operator response. The core theoretical advancement lies in continuously adaptive neighborhood kernels that transform local density distributions into tunable filters, enabling unified representation from homogeneous media to multi-phase fluid. Experiments demonstrate autonomous evolution of physically consistent interface separation patterns in density contrast scenarios, including cocktail and bidirectional hourglass flow. Quantitative evaluation shows improved computational efficiency compared to a SPH method and qualitatively plausible interface dynamics, with a larger time step size.</p>\u0000 </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9,"publicationDate":"2025-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144140557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}