{"title":"GPSwap: High-resolution face swapping based on StyleGAN prior","authors":"Dongjin Huang, Chuanman Liu, Jinhua Liu","doi":"10.1002/cav.2238","DOIUrl":"https://doi.org/10.1002/cav.2238","url":null,"abstract":"<p>Existing high-resolution face-swapping methods still struggle to preserve identity consistency while maintaining high visual quality. We present GPSwap, a novel high-resolution face-swapping method based on a StyleGAN prior. To better preserve identity consistency, the proposed facial feature recombination network fully leverages the properties of both <i>w</i> space and encoders to decouple identities. Furthermore, we present an image reconstruction module that aligns and blends images in <i>FS</i> space, further supplementing facial details and achieving natural blending. It not only improves image resolution but also optimizes visual quality. Extensive experiments and user studies demonstrate that GPSwap is superior to state-of-the-art high-resolution face-swapping methods in terms of image quality and identity consistency. In addition, GPSwap saves nearly 80% of training costs compared to other high-resolution face-swapping works.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 4","pages":""},"PeriodicalIF":0.9,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141608039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural foveated super-resolution for real-time VR rendering","authors":"Jiannan Ye, Xiaoxu Meng, Daiyun Guo, Cheng Shang, Haotian Mao, Xubo Yang","doi":"10.1002/cav.2287","DOIUrl":"https://doi.org/10.1002/cav.2287","url":null,"abstract":"<p>As virtual reality display technologies advance, resolutions and refresh rates continue to approach human perceptual limits, presenting a challenge for real-time rendering algorithms. Neural super-resolution is promising in reducing the computation cost and boosting the visual experience by scaling up low-resolution renderings. However, the added workload of running neural networks cannot be neglected. In this article, we alleviate this burden by exploiting the foveated nature of the human visual system: since visual acuity decreases rapidly from the focal point to the periphery, we upscale the coarse input heterogeneously rather than applying uniform super-resolution. With the help of dynamic and geometric information (i.e., pixel-wise motion vectors, depth, and camera transformation) available inherently in real-time rendering content, we propose a neural accumulator to recurrently aggregate the amortized low-resolution renderings from frame to frame. By leveraging a partition-assemble scheme, we use a neural super-resolution module to upsample the low-resolution image tiles to different qualities according to their perceptual importance and reconstruct the final output adaptively.
Perceptually high-fidelity foveated high-resolution frames are generated in real-time, surpassing the quality of other foveated super-resolution methods.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 4","pages":""},"PeriodicalIF":0.9,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141608012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
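The heterogeneous, acuity-driven upscaling in the abstract above can be illustrated with a simple tile-quality assignment under a partition-assemble scheme. This is a toy sketch, not the paper's implementation; the function names and the tier boundaries (`fovea_deg`, `mid_deg`) are assumptions.

```python
import math

def tile_quality(tile_center, gaze, fovea_deg=5.0, mid_deg=15.0):
    """Assign an upscaling quality tier to a tile by its angular
    eccentricity from the gaze point (coordinates in degrees).
    Tiers: 2 = full quality (fovea), 1 = medium, 0 = coarse periphery."""
    ecc = math.hypot(tile_center[0] - gaze[0], tile_center[1] - gaze[1])
    if ecc <= fovea_deg:
        return 2
    if ecc <= mid_deg:
        return 1
    return 0

def partition_qualities(tile_centers, gaze):
    # One quality tier per tile; the assembled frame mixes the results.
    return [tile_quality(c, gaze) for c in tile_centers]
```

Because acuity falls off rapidly with eccentricity, most tiles land in the cheap peripheral tier, which is where the computational savings over uniform super-resolution would come from.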
{"title":"Design and development of a mixed reality teaching systems for IV cannulation and clinical instruction","authors":"Wei Xiong, Yingda Peng","doi":"10.1002/cav.2288","DOIUrl":"https://doi.org/10.1002/cav.2288","url":null,"abstract":"<p>Intravenous (IV) cannulation is a common technique used in clinical infusion. This study developed a mixed reality IV cannulation teaching system based on the HoloLens 2 platform. The paper integrates the cognitive-affective theory of learning with media (CATLM) and investigates cognitive engagement and willingness to use the system from the learners' perspective. Through an experimental study of 125 subjects, the variables affecting learners' cognitive engagement and intention to use were determined. On the basis of CATLM, three new mixed reality attributes, immersion, system verisimilitude, and response time, were introduced, and their relationships with cognitive engagement and willingness to use were determined. The results show that the high immersion of mixed reality technology promotes students' higher cognitive engagement; however, this high immersion does not significantly affect learners' intention to use mixed reality technology for learning. Overall, cognitive-affective learning theory remains effective in mixed reality environments, and the model shows good adaptability.
This study provides a reference for the application of mixed reality technology in medical education.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141329416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mastering broom-like tools for object transportation animation using deep reinforcement learning","authors":"Guan-Ting Liu, Sai-Keung Wong","doi":"10.1002/cav.2255","DOIUrl":"https://doi.org/10.1002/cav.2255","url":null,"abstract":"<div><p>In this paper, we propose a deep reinforcement learning-based approach to generate an animation of an agent using a broom-like tool to transport a target object. The tool is attached to the agent, so when the agent moves, the tool moves as well. The challenge is to control the agent to move and use the tool to push the target while avoiding obstacles. We propose a direction sensor to guide the agent's movement direction in environments with static obstacles. Furthermore, different rewards and curriculum learning are implemented to make the agent efficiently learn skills for manipulating the tool. Experimental results show that the agent can naturally control tools of different shapes to transport target objects. Ablation tests revealed the impacts of the rewards and some state components.</p></div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141326754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
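As a rough illustration of the kind of reward shaping this abstract describes (progress of the target toward its goal, guidance from the direction sensor, obstacle avoidance), here is a toy shaped-reward function. The terms, weights, and sensor interface are all assumptions for illustration, not the authors' actual design.

```python
import math

def step_reward(target_pos, goal_pos, prev_dist, agent_dir, sensor_dir,
                obstacle_dist, w_prog=1.0, w_align=0.1, w_obs=0.5, safe=1.0):
    """Toy shaped reward for tool-based object transport.
    - progress: reduction in target-to-goal distance this step
    - alignment: dot product of the agent's heading with the (unit)
      direction suggested by the direction sensor
    - obstacle: penalty growing linearly inside the safety radius."""
    dist = math.hypot(goal_pos[0] - target_pos[0], goal_pos[1] - target_pos[1])
    progress = prev_dist - dist
    align = agent_dir[0] * sensor_dir[0] + agent_dir[1] * sensor_dir[1]
    obs_pen = max(0.0, (safe - obstacle_dist) / safe)
    return w_prog * progress + w_align * align - w_obs * obs_pen, dist
```

In a curriculum setting, early stages might use obstacle-free scenes (where `obs_pen` is always zero) before enabling the penalty term.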
{"title":"Fast constrained optimization for cloth simulation parameters from static drapes","authors":"Eunjung Ju, Eungjune Shim, Kwang-yun Kim, Sungjin Yoon, Myung Geol Choi","doi":"10.1002/cav.2265","DOIUrl":"https://doi.org/10.1002/cav.2265","url":null,"abstract":"<p>We present a cloth simulation parameter estimation method that integrates the flexibility of global optimization with the speed of neural networks. While global optimization allows varied objective-function designs and explicit ranges for the optimization variables, it requires thousands of objective function evaluations. Each evaluation involves a full cloth simulation, making it computationally demanding and impractically slow. On the other hand, neural network learning methods offer quick estimates but face challenges such as the need for data collection, re-training when input data formats change, and difficulty in constraining variable ranges. Our proposed method addresses these issues by replacing the simulation process, typically necessary for objective function evaluations in global optimization, with neural network inference. We demonstrate that, once an estimation model is trained, optimization for various objective functions becomes straightforward.
Moreover, we illustrate that it is possible to achieve optimization results that reflect the intentions of expert users through visualization of a wide optimization space and the use of range constraints.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cav.2265","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141326755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
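The core idea above, replacing the expensive simulator with a fast surrogate inside a bounded global search, can be sketched as follows. Everything here is a stand-in: the "surrogate" is a toy linear map rather than a trained network, and seeded random search stands in for whichever global optimizer the authors use.

```python
import random

def optimize(surrogate, target_drape, bounds, iters=2000, seed=0):
    """Bounded random search over simulation parameters, scoring each
    candidate with a fast surrogate instead of a full cloth simulation.
    `surrogate(params)` returns predicted drape features; the objective
    is squared error against the measured static drape."""
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(iters):
        params = [rng.uniform(lo, hi) for lo, hi in bounds]
        pred = surrogate(params)
        err = sum((p - t) ** 2 for p, t in zip(pred, target_drape))
        if err < best_err:
            best, best_err = params, err
    return best, best_err

# Toy surrogate: pretends drape features are a linear map of the params.
toy = lambda p: [2.0 * p[0], p[1] + 1.0]
params, err = optimize(toy, target_drape=[1.0, 2.0], bounds=[(0, 1), (0, 2)])
```

Because each "evaluation" is now just an inference call, thousands of evaluations per objective are cheap, and the `bounds` list expresses the range constraints directly.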
{"title":"Toward comprehensive Chiroptera modeling: A parametric multiagent model for bat behavior","authors":"Brendan Marney, Brandon Haworth","doi":"10.1002/cav.2251","DOIUrl":"https://doi.org/10.1002/cav.2251","url":null,"abstract":"<p>Chiroptera behavior is complex and often unseen, as bats are nocturnal, small, and elusive animals. Chiroptology has led to significant insights into the behavior and environmental interactions of bats. Biology, ecology, and even digital media often benefit from mathematical models of animals, including humans. However, Chiroptera modeling has historically been limited to specific behaviors, species, or biological functions and relies heavily on classical modeling methodologies that may not represent individuals or colonies well. This work proposes a continuous, parametric, multiagent Chiroptera behavior model that captures the latest research in echolocation, hunting, and energetics of bats. This includes echolocation-based perception (or lack thereof), hunting patterns, roosting behavior, and energy consumption rates. We propose integrating these mathematical models in a framework that affords the individual simulation of bats within large-scale colonies. Practitioners can adjust the model to account for different perceptual affordances or patterns among species of bats, or even individuals (such as sickness or injury).
We show that our model closely matches results from the literature, affords an animated graphical simulation, and has utility in simulation-based studies.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cav.2251","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141326753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
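A toy illustration of the parametric, per-agent energetics this abstract describes: flight drains energy at a rate scaled by a per-individual parameter (so sickness or injury can be modeled), a successful hunt restores reserves, and low reserves trigger roosting. All names and constants are hypothetical, not taken from the paper.

```python
def update_energy(energy, dt, flying, fly_cost=1.0, rest_gain=0.2,
                  prey_gain=5.0, caught_prey=False, condition=1.0):
    """One time step of a toy bat energy budget. `condition` scales the
    flight cost per agent, so an injured individual (condition > 1)
    tires faster than a healthy one."""
    if flying:
        energy -= fly_cost * condition * dt
    else:
        energy += rest_gain * dt  # resting at the roost recovers energy
    if caught_prey:
        energy += prey_gain
    return max(energy, 0.0)

def should_roost(energy, threshold=10.0):
    # Below the reserve threshold the agent switches to roosting behavior.
    return energy < threshold
```

Running this per agent each tick is what makes individual simulation within large colonies tractable: the state is a handful of scalars per bat.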
{"title":"Extracting roads from satellite images via enhancing road feature investigation in learning","authors":"Shiming Feng, Fei Hou, Jialu Chen, Wencheng Wang","doi":"10.1002/cav.2275","DOIUrl":"https://doi.org/10.1002/cav.2275","url":null,"abstract":"<p>Extracting road maps from satellite images is a topic of great interest. However, existing methods still struggle to achieve high-quality results, because satellite images cover very large regions while roads are slender, complex, and occupy only a small part of an image, making them difficult to distinguish from the background. In this article, we address this challenge by presenting two modules that learn road features more effectively and so improve road extraction. The first module exploits the differences between patches containing roads and patches containing no roads to exclude as many background regions as possible, so that the small part containing roads can be investigated more specifically for improvement. The second module enhances feature alignment in decoding feature maps by using strip convolution in combination with an attention mechanism. Both modules can be easily integrated into the networks of existing learning methods for improvement.
Experimental results show that our modules can help existing methods to achieve high-quality results, superior to the state-of-the-art methods.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141315447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
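Strip convolution, used by the second module above, slides long, thin kernels (k×1 and 1×k) over the feature map, a shape that matches elongated roads better than square kernels. A minimal pure-Python sketch of the two passes with zero padding; the real module operates on learned multi-channel feature maps inside a network.

```python
def strip_conv(feat, k_vert, k_horiz):
    """Apply a vertical (kx1) then a horizontal (1xk) strip convolution
    to a 2D feature map (list of rows), with zero padding at borders."""
    h, w = len(feat), len(feat[0])
    kv, kh = len(k_vert), len(k_horiz)
    pv, ph = kv // 2, kh // 2

    # Vertical pass: the kx1 kernel slides down each column.
    mid = [[sum(k_vert[i] * feat[y + i - pv][x]
                for i in range(kv) if 0 <= y + i - pv < h)
            for x in range(w)] for y in range(h)]
    # Horizontal pass: the 1xk kernel slides along each row.
    return [[sum(k_horiz[j] * mid[y][x + j - ph]
                 for j in range(kh) if 0 <= x + j - ph < w)
             for x in range(w)] for y in range(h)]
```

With identity kernels (`[0, 1, 0]`) the map passes through unchanged; with all-ones kernels each output aggregates a cross-shaped neighborhood, which is how a long strip kernel pools context along a road's direction.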
{"title":"Identity-consistent transfer learning of portraits for digital apparel sample display","authors":"Luyuan Wang, Yiqian Wu, Yong-Liang Yang, Chen Liu, Xiaogang Jin","doi":"10.1002/cav.2278","DOIUrl":"https://doi.org/10.1002/cav.2278","url":null,"abstract":"<p>The rapid development of the online apparel shopping industry demands innovative solutions for high-quality digital apparel sample displays with virtual avatars. However, developing such displays is prohibitively expensive and prone to the well-known “uncanny valley” effect, where a nearly human-looking artifact arouses eeriness and repulsiveness, thus affecting the user experience. To effectively mitigate the “uncanny valley” effect and improve the overall authenticity of digital apparel sample displays, we present a novel photo-realistic portrait generation framework. Our key idea is to employ transfer learning to learn an identity-consistent mapping from the latent space of rendered portraits to that of real portraits. During the inference stage, the input portrait of an avatar can be directly transferred to a realistic portrait by changing its appearance style while maintaining the facial identity. To this end, we collect a new dataset, <b>D</b>az-<b>R</b>endered-<b>F</b>aces-<b>HQ</b> (<i>DRFHQ</i>), specifically designed for rendering-style portraits. We leverage this dataset to fine-tune the StyleGAN2-<i>FFHQ</i> generator, using our carefully crafted framework, which helps to preserve the geometric and color features relevant to facial identity. We evaluate our framework using portraits with diverse gender, age, and race variations. 
Qualitative and quantitative evaluations, along with ablation studies, highlight our method's advantages over state-of-the-art approaches.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141298633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive information fusion network for multi-modal personality recognition","authors":"Yongtang Bao, Xiang Liu, Yue Qi, Ruijun Liu, Haojie Li","doi":"10.1002/cav.2268","DOIUrl":"https://doi.org/10.1002/cav.2268","url":null,"abstract":"<p>Personality recognition is of great significance in deepening the understanding of social relations. While personality recognition methods have made significant strides in recent years, the challenge of heterogeneity between modalities during feature fusion remains unsolved. This paper introduces an adaptive multi-modal information fusion network (AMIF-Net) capable of concurrently processing video, audio, and text data. First, utilizing the AMIF-Net encoder, we process the extracted audio and video features separately, effectively capturing long-term data relationships. Then, adaptive elements added to the fusion network alleviate the heterogeneity between modalities. Lastly, we concatenate the audio-video and text features into a regression network to obtain Big Five personality trait scores. Furthermore, we introduce a novel loss function to address training inaccuracies, taking advantage of its property of exhibiting a peak at the critical mean. Our tests on the ChaLearn First Impressions V2 multi-modal dataset show performance partially surpassing state-of-the-art networks.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141298535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing doctor-patient communication in surgical explanations: Designing effective facial expressions and gestures for animated physician characters","authors":"Hwang Youn Kim, Ghazanfar Ali, Jae-In Hwang","doi":"10.1002/cav.2236","DOIUrl":"https://doi.org/10.1002/cav.2236","url":null,"abstract":"<p>Close attention to facial expressions, gestures, and communication techniques is essential when creating realistic and captivating animated physician characters that describe surgical procedures. This paper emphasizes integrating appropriate emotions and the co-speech gestures medical experts use when explaining a procedure into the design of animated characters. By depicting these components truthfully, we can foster healthy doctor-patient relationships and improve patients' understanding. We suggest two critical approaches to developing virtual medical experts that incorporate these elements. First, doctors can generate the content of the surgical explanation with a virtual doctor. Second, patients can listen to the surgical procedure described by the virtual doctor and ask questions if they have any. Our system helps patients by considering their psychology and incorporating medical professionals' opinions. These improvements ensure the animated virtual agent is comforting, reassuring, and emotionally supportive.
Through a user study, we evaluated our hypothesis and gained insight into improvements.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cav.2236","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141264573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}