Symmetrization of quasi-regular patterns with periodic tiling of regular polygons
Zhengzheng Yin, Yao Jin, Zhijian Fang, Yun Zhang, Huaxiong Zhang, Jiu Zhou, Lili He
Computational Visual Media, DOI: 10.1007/s41095-023-0359-z, published 2024-04-27

Abstract: Computer-generated aesthetic patterns are widely used as design materials in various fields. The most common methods use fractals or dynamical systems as basic tools to create various patterns. To enhance aesthetics and controllability, some researchers have introduced symmetric layouts along with these tools. One popular strategy employs dynamical systems compatible with symmetries to construct functions with the desired symmetries; however, these are typically confined to simple planar symmetries. The other strategy generates symmetrical patterns under the constraints of tilings; although slightly more flexible, it is restricted to small ranges of tilings and lacks textural variation. We therefore propose a new approach for generating aesthetic patterns by symmetrizing quasi-regular patterns using general k-uniform tilings. We adopt a unified strategy to construct invariant mappings for k-uniform tilings that eliminate texture seams across tiling edges. Furthermore, we construct three types of symmetries associated with the patterns: dihedral, rotational, and reflection symmetries. The proposed method can be easily implemented using GPU shaders, is highly efficient, and is suitable for complicated tilings of regular polygons. Experiments demonstrate the advantages of our method over state-of-the-art methods in terms of flexibility in controlling pattern generation with various parameters, as well as the diversity of textures and styles.
{"title":"Joint training with local soft attention and dual cross-neighbor label smoothing for unsupervised person re-identification","authors":"Qing Han, Longfei Li, Weidong Min, Qi Wang, Qingpeng Zeng, Shimiao Cui, Jiongjin Chen","doi":"10.1007/s41095-023-0354-4","DOIUrl":"https://doi.org/10.1007/s41095-023-0354-4","url":null,"abstract":"<p>Existing unsupervised person re-identification approaches fail to fully capture the fine-grained features of local regions, which can result in people with similar appearances and different identities being assigned the same label after clustering. The identity-independent information contained in different local regions leads to different levels of local noise. To address these challenges, joint training with local soft attention and dual cross-neighbor label smoothing (DCLS) is proposed in this study. First, the joint training is divided into global and local parts, whereby a soft attention mechanism is proposed for the local branch to accurately capture the subtle differences in local regions, which improves the ability of the re-identification model in identifying a person’s local significant features. Second, DCLS is designed to progressively mitigate label noise in different local regions. The DCLS uses global and local similarity metrics to semantically align the global and local regions of the person and further determines the proximity association between local regions through the cross information of neighboring regions, thereby achieving label smoothing of the global and local regions throughout the training process. In extensive experiments, the proposed method outperformed existing methods under unsupervised settings on several standard person re-identification datasets.\u0000</p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"36 1","pages":""},"PeriodicalIF":6.9,"publicationDate":"2024-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140809186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DepthGAN: GAN-based depth generation from semantic layouts","authors":"Yidi Li, Jun Xiao, Yiqun Wang, Zhengda Lu","doi":"10.1007/s41095-023-0350-8","DOIUrl":"https://doi.org/10.1007/s41095-023-0350-8","url":null,"abstract":"<p>Existing GAN-based generative methods are typically used for semantic image synthesis. We pose the question of whether GAN-based architectures can generate plausible depth maps and find that existing methods have difficulty in generating depth maps which reasonably represent 3D scene structure due to the lack of global geometric correlations. Thus, we propose DepthGAN, a novel method of generating a depth map using a semantic layout as input to aid construction, and manipulation of well-structured 3D scene point clouds. Specifically, we first build a feature generation model with a cascade of semantically-aware transformer blocks to obtain depth features with global structural information. For our semantically aware transformer block, we propose a mixed attention module and a semantically aware layer normalization module to better exploit semantic consistency for depth features generation. Moreover, we present a novel semantically weighted depth synthesis module, which generates adaptive depth intervals for the current scene. We generate the final depth map by using a weighted combination of semantically aware depth weights for different depth ranges. In this manner, we obtain a more accurate depth map. Extensive experiments on indoor and outdoor datasets demonstrate that DepthGAN achieves superior results both quantitatively and visually for the depth generation task.\u0000</p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"72 1","pages":""},"PeriodicalIF":6.9,"publicationDate":"2024-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140809310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Physics-based fluid simulation in computer graphics: Survey, research trends, and challenges
Xiaokun Wang, Yanrui Xu, Sinuo Liu, Bo Ren, Jiří Kosinka, Alexandru C. Telea, Jiamin Wang, Chongming Song, Jian Chang, Chenfeng Li, Jian Jun Zhang, Xiaojuan Ban
Computational Visual Media, DOI: 10.1007/s41095-023-0368-y, published 2024-04-27

Abstract: Physics-based fluid simulation has played an increasingly important role in the computer graphics community. Recent methods in this area have greatly improved the generation of complex visual effects and their computational efficiency. Novel techniques have emerged to deal with complex boundaries, multiphase fluids, gas–liquid interfaces, and fine details. The parallel use of machine learning, image processing, and fluid control technologies has brought many interesting and novel research perspectives. In this survey, we provide an introduction to the theoretical concepts underpinning physics-based fluid simulation and their practical implementation, with the aim of serving as a guide for both newcomers and seasoned researchers exploring the field, focusing on developments of the last decade. Driven by the distribution of recent publications in the field, we structure the survey to cover the physical background; discretization approaches; computational methods that address scalability; fluid interactions with other materials and interfaces; and methods for expressive aspects of surface detail and control. From a practical perspective, we give an overview of existing implementations available for the above methods.
Learning to compose diversified prompts for image emotion classification
Sinuo Deng, Lifang Wu, Ge Shi, Lehao Xing, Meng Jian, Ye Xiang, Ruihai Dong
Computational Visual Media, DOI: 10.1007/s41095-023-0389-6, published 2024-04-26

Abstract: Image emotion classification (IEC) aims to extract the abstract emotions evoked by images. Recently, language-supervised methods such as contrastive language-image pretraining (CLIP) have demonstrated superior performance in image understanding. However, applying them to the underexplored IEC task presents three major challenges: a large gap between the pretraining and IEC training objectives, shared suboptimal prompts, and prompts that are invariant across all instances. In this study, we propose a general framework that effectively exploits the language-supervised CLIP method for the IEC task. First, a prompt-tuning method that mimics the pretraining objective of CLIP is introduced to exploit the rich image and text semantics associated with CLIP. Subsequently, instance-specific prompts are automatically composed, conditioned on the categories and image content of each instance; this diversifies the prompts and avoids the suboptimal-prompt problem. Evaluations on six widely used affective datasets show that the proposed method significantly outperforms state-of-the-art methods on IEC tasks (up to a 9.29% accuracy gain on the EmotionROI dataset) with only a few trained parameters. The code is publicly available for research purposes at https://github.com/dsn0w/PT-DPC/.
CTSN: Predicting cloth deformation for skeleton-based characters with a two-stream skinning network
Yudi Li, Min Tang, Yun Yang, Ruofeng Tong, Shuangcai Yang, Yao Li, Bailin An, Qilong Kou
Computational Visual Media, DOI: 10.1007/s41095-023-0344-6, published 2024-04-19

Abstract: We present a novel learning method that uses a two-stream network to predict cloth deformation for skeleton-based characters. The characters processed by our approach are not limited to humans and can be other targets with skeleton-based representations, such as fish or pets. We use a novel network architecture consisting of skeleton-based and mesh-based residual networks to learn the coarse and wrinkle features that form the overall residual from the template cloth mesh. Our network can be used to predict the deformation of loose or tight-fitting clothing. The memory footprint of our network is low, resulting in reduced computational requirements. In practice, a prediction for a single cloth mesh of a skeleton-based character takes about 7 ms on an NVIDIA GeForce RTX 3090 GPU. Compared with prior methods, our network generates finer deformation results with details and wrinkles.
{"title":"Multi3D: 3D-aware multimodal image synthesis","authors":"","doi":"10.1007/s41095-024-0422-4","DOIUrl":"https://doi.org/10.1007/s41095-024-0422-4","url":null,"abstract":"<h3>Abstract</h3> <p>3D-aware image synthesis has attained high quality and robust 3D consistency. Existing 3D controllable generative models are designed to synthesize 3D-aware images through a single modality, such as 2D segmentation or sketches, but lack the ability to finely control generated content, such as texture and age. In pursuit of enhancing user-guided controllability, we propose Multi3D, a 3D-aware controllable image synthesis model that supports multi-modal input. Our model can govern the geometry of the generated image using a 2D label map, such as a segmentation or sketch map, while concurrently regulating the appearance of the generated image through a textual description. To demonstrate the effectiveness of our method, we have conducted experiments on multiple datasets, including CelebAMask-HQ, AFHQ-cat, and shapenet-car. Qualitative and quantitative evaluations show that our method outperforms existing state-of-the-art methods. <span> <span> <img alt=\"\" src=\"https://static-content.springer.com/image/MediaObjects/41095_2024_422_Fig1_HTML.jpg\"/> </span> </span></p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"15 1","pages":""},"PeriodicalIF":6.9,"publicationDate":"2024-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140593367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active self-training for weakly supervised 3D scene semantic segmentation
Gengxin Liu, Oliver van Kaick, Hui Huang, Ruizhen Hu
Computational Visual Media, DOI: 10.1007/s41095-022-0311-7, published 2024-03-22

Abstract: Since preparing labeled data for training point-cloud semantic segmentation networks is a time-consuming process, weakly supervised approaches have been introduced to learn from only a small fraction of the data. These methods are typically based on learning with contrastive losses while automatically deriving per-point pseudo-labels from a sparse set of user-annotated labels. In this paper, our key observation is that the selection of which samples to annotate is as important as how these samples are used for training. Thus, we introduce a method for weakly supervised segmentation of 3D scenes that combines self-training with active learning. Active learning selects points for annotation that are likely to improve the trained model, while self-training makes efficient use of the user-provided labels to learn the model. We demonstrate that our approach leads to an effective method that improves scene segmentation over previous work and baselines, while requiring only a few user annotations.
Class-conditional domain adaptation for semantic segmentation
Yue Wang, Yuke Li, James H. Elder, Runmin Wu, Huchuan Lu
Computational Visual Media, DOI: 10.1007/s41095-023-0362-4, published 2024-03-22

Abstract: Semantic segmentation is an important sub-task for many applications. However, pixel-level ground-truth labeling is costly, and models tend to overfit to the training data, limiting their generalization ability. Unsupervised domain adaptation can potentially address these problems by allowing systems trained on labeled datasets from a source domain (including less expensive synthetic data) to be adapted to a novel target domain. The conventional approach extracts and globally aligns the representations of the source and target domains. One limitation of this approach is that it tends to neglect differences between classes: representations of some classes can be extracted and aligned between the source and target domains more easily than others, limiting adaptation over all classes. Here, we address this problem by introducing a class-conditional domain adaptation (CCDA) method. It incorporates a class-conditional multi-scale discriminator and class-conditional losses for both segmentation and adaptation. Together, these measure the segmentation, shift the domain in a class-conditional manner, and equalize the loss over classes. Experimental results demonstrate that the performance of our CCDA method matches, and in some cases surpasses, that of state-of-the-art methods.
Geometry-aware 3D pose transfer using transformer autoencoder
Shanghuan Liu, Shaoyan Gai, Feipeng Da, Fazal Waris
Computational Visual Media, DOI: 10.1007/s41095-023-0379-8, published 2024-03-22

Abstract: 3D pose transfer over unorganized point clouds is a challenging generation task that transfers a source's pose to a target shape while preserving the target's identity. Recent deep models have learned deformations and used the target's identity as a style to modulate either the combined features of the two shapes or the aligned vertices of the source shape. However, all operations in these models are point-wise and independent, and they ignore the geometric information of the surface and structure of the input shapes, which severely limits their generation and generalization capabilities. In this study, we propose a geometry-aware method based on a novel transformer autoencoder to solve this problem. An efficient self-attention mechanism, cross-covariance attention, is used throughout our framework to perceive the correlations between points at different distances. Specifically, the transformer encoder extracts the target shape's local geometric details for identity attributes and the source shape's global geometric structure for pose information. Our transformer decoder efficiently learns deformations and recovers identity properties by fusing and decoding the extracted features in a geometry-aware attentional manner, without requiring correspondence information or modulation steps. Experiments demonstrate that the geometry-aware method achieves state-of-the-art performance on the 3D pose transfer task. The implementation code and data are available at https://github.com/SEULSH/Geometry-Aware-3D-Pose-Transfer-Using-Transformer-Autoencoder.