Computational Visual Media: latest articles, page 2

Dance2MIDI: Dance-driven multi-instrument music generation

IF 6.9, Zone 3 (Computer Science)

Computational Visual Media Pub Date : 2024-07-24 DOI: 10.1007/s41095-024-0417-1

Bo Han, Yuheng Li, Yixuan Shen, Yi Ren, Feilin Han

Abstract: Dance-driven music generation aims to generate musical pieces conditioned on dance videos. Previous works focus on monophonic or raw audio generation, while the multi-instrument scenario is under-explored. The challenges associated with dance-driven multi-instrument music (MIDI) generation are twofold: (i) lack of a publicly available multi-instrument MIDI and video paired dataset and (ii) the weak correlation between music and video. To tackle these challenges, we have built the first multi-instrument MIDI and dance paired dataset (D2MIDI). Based on this dataset, we introduce a multi-instrument MIDI generation framework (Dance2MIDI) conditioned on dance video. Firstly, to capture the relationship between dance and music, we employ a graph convolutional network to encode the dance motion. This allows us to extract features related to dance movement and dance style. Secondly, to generate a harmonious rhythm, we utilize a transformer model to decode the drum track sequence, leveraging a cross-attention mechanism. Thirdly, we model the task of generating the remaining tracks based on the drum track as a sequence understanding and completion task. A BERT-like model is employed to comprehend the context of the entire music piece through self-supervised learning. We evaluate the music generated by our framework trained on the D2MIDI dataset and demonstrate that our method achieves state-of-the-art performance.

Citations: 0

Continual few-shot patch-based learning for anime-style colorization

IF 6.9, Zone 3 (Computer Science)

Computational Visual Media Pub Date : 2024-07-09 DOI: 10.1007/s41095-024-0414-4

Akinobu Maejima, Seitaro Shinagawa, Hiroyuki Kubo, Takuya Funatomi, Tatsuo Yotsukura, Satoshi Nakamura, Yasuhiro Mukaigawa

Abstract: The automatic colorization of anime line drawings is a challenging problem in production pipelines. Recent advances in deep neural networks have addressed this problem; however, collecting many images of colorization targets in a novel anime work before the colorization process starts leads to a chicken-and-egg problem and has become an obstacle to using these networks in production pipelines. To overcome this obstacle, we propose a new patch-based learning method for few-shot anime-style colorization. The learning method adopts an efficient patch sampling technique with position embedding according to the characteristics of anime line drawings. We also present a continuous learning strategy that continuously updates our colorization model using new samples colorized by human artists. The advantage of our method is that it can learn our colorization model from scratch or from pre-trained weights using only a few pre- and post-colorized line drawings that are created by artists in their usual colorization work. Therefore, our method can be easily incorporated within existing production pipelines. We quantitatively demonstrate that our colorization method outperforms state-of-the-art methods.

Citations: 0

Recent advances in 3D Gaussian splatting

IF 6.9, Zone 3 (Computer Science)

Computational Visual Media Pub Date : 2024-07-08 DOI: 10.1007/s41095-024-0436-y

Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan-Pei Cao, Ling-Qi Yan, Lin Gao

Abstract: The emergence of 3D Gaussian splatting (3DGS) has greatly accelerated rendering in novel view synthesis. Unlike neural implicit representations like neural radiance fields (NeRFs) that represent a 3D scene with position and viewpoint-conditioned neural networks, 3D Gaussian splatting utilizes a set of Gaussian ellipsoids to model the scene so that efficient rendering can be accomplished by rasterizing Gaussian ellipsoids into images. Apart from fast rendering, the explicit representation of 3D Gaussian splatting also facilitates downstream tasks like dynamic reconstruction, geometry editing, and physical simulation. Considering the rapid changes and growing number of works in this field, we present a literature review of recent 3D Gaussian splatting methods, which can be roughly classified by functionality into 3D reconstruction, 3D editing, and other downstream applications. Traditional point-based rendering methods and the rendering formulation of 3D Gaussian splatting are also covered to aid understanding of this technique. This survey aims to help beginners to quickly get started in this field and to provide experienced researchers with a comprehensive overview, aiming to stimulate future development of the 3D Gaussian splatting representation.

Citations: 0

Illuminator: Image-based illumination editing for indoor scene harmonization

IF 6.9, Zone 3 (Computer Science)

Computational Visual Media Pub Date : 2024-07-05 DOI: 10.1007/s41095-023-0397-6

Zhongyun Bao, Gang Fu, Zipei Chen, Chunxia Xiao

Abstract: Illumination harmonization is an important but challenging task that aims to achieve illumination compatibility between the foreground and background under different illumination conditions. Most current studies focus either on seamlessly integrating the appearance (illumination or visual style) of the foreground object itself with the background scene or on producing the foreground shadow. They rarely consider global illumination consistency (i.e., the illumination and shadow of the foreground object). In our work, we introduce “Illuminator”, an image-based illumination editing technique. This method aims to achieve more realistic global illumination harmonization, ensuring consistent illumination and plausible shadows in complex indoor environments. The Illuminator contains a shadow residual generation branch and an object illumination transfer branch. The shadow residual generation branch introduces a novel attention-aware graph convolutional mechanism to achieve reasonable foreground shadow generation. The object illumination transfer branch primarily transfers background illumination to the foreground region. In addition, we construct a real-world indoor illumination harmonization dataset called RIH, which consists of various foreground objects and background scenes captured under diverse illumination conditions for training and evaluating our Illuminator. Our comprehensive experiments, conducted on the RIH dataset and a collection of real-world everyday life photos, validate the effectiveness of our method.

Citations: 0

Shell stand: Stable thin shell models for 3D fabrication

IF 6.9, Zone 3 (Computer Science)

Computational Visual Media Pub Date : 2024-06-24 DOI: 10.1007/s41095-024-0402-8

Yu Xing, Xiaoxuan Wang, Lin Lu, Andrei Sharf, Daniel Cohen-Or, Changhe Tu

Abstract: A thin shell model refers to a surface or structure where the object’s thickness is considered negligible. In the context of 3D printing, thin shell models are characterized by lightweight, hollow structures and reduced material usage. Their versatility and visual appeal make them popular in various fields, such as cloth simulation and character skinning, and for thin-walled structures like leaves, paper, or metal sheets. Nevertheless, optimization of thin shell models without external support remains a challenge due to their minimal interior operational space. For the same reason, hollowing methods are also unsuitable for this task. In fact, thin shell modulation methods are required to preserve the visual appearance of a two-sided surface, which further constrains the problem space. In this paper, we introduce a new visual disparity metric tailored for shell models, integrating local details and global shape attributes in terms of visual perception. Our method modulates thin shell models using global deformations and local thickening while accounting for visual saliency, stability, and structural integrity. Thereby, thin shell models such as bas-reliefs, hollow shapes, and cloth can be stabilized to stand in arbitrary orientations, making them ideal for 3D printing.

Citations: 0

Temporal vectorized visibility for direct illumination of animated models

IF 6.9, Zone 3 (Computer Science)

Computational Visual Media Pub Date : 2024-05-29 DOI: 10.1007/s41095-023-0339-3

Zhenni Wang, Tze Yui Ho, Yi Xiao, Chi Sing Leung

Abstract: Direct illumination rendering is an important technique in computer graphics. Precomputed radiance transfer algorithms can provide high quality rendering results in real time, but they can only support rigid models. On the other hand, ray tracing algorithms are flexible and can gracefully handle animated models. With NVIDIA RTX and the AI denoiser, we can use ray tracing algorithms to render visually appealing results in real time. Although visually appealing, these results can deviate considerably from the actual one. We propose a visibility-boundary edge oriented infinite triangle bounding volume hierarchy (BVH) traversal algorithm to dynamically generate visibility in vector form. Our algorithm utilizes the properties of visibility-boundary edges and infinite triangle BVH traversal to maximize the efficiency of the vector form visibility generation. A novel data structure, temporal vectorized visibility, is proposed, which allows visibility in vector form to be shared across time and further increases the generation efficiency. Our algorithm can efficiently render close-to-reference direct illumination results. With similar processing time, it provides a visual quality improvement of around 10 dB in terms of peak signal-to-noise ratio (PSNR) w.r.t. the ray tracing algorithm reservoir-based spatiotemporal importance resampling (ReSTIR).
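
To put the reported gain of roughly 10 dB PSNR in context, the sketch below computes PSNR with the standard definition, 10·log10(peak² / MSE), and shows that a 10 dB gain corresponds to a tenfold reduction in mean squared error. This is an illustration only and not code from the paper; the function names `psnr` and `mse_ratio_for_gain`, the flat-list image representation, and the default `peak=1.0` are assumptions made for the example.

```python
import math


def psnr(reference, test, peak=1.0):
    """Return PSNR in dB between two equal-length sequences of pixel values.

    `peak` is the maximum possible pixel value (assumed 1.0 for normalized
    images; use 255 for 8-bit images).
    """
    if len(reference) != len(test):
        raise ValueError("inputs must have the same length")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Return the factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    # Hypothetical two-pixel "images": a black reference and a test image
    # with a uniform error of 0.1 per pixel.
    print(psnr([0.0, 0.0], [0.1, 0.1]))
    print(mse_ratio_for_gain(10.0))
```

Under this definition, a method that is 10 dB better than a baseline at the same processing time has one tenth of the baseline's mean squared error against the reference image.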

Citations: 0

Super-resolution reconstruction of single image for latent features

IF 6.9, Zone 3 (Computer Science)

Computational Visual Media Pub Date : 2024-05-24 DOI: 10.1007/s41095-023-0387-8

Xin Wang, Jing-Ke Yan, Jing-Ye Cai, Jian-Hua Deng, Qin Qin, Yao Cheng

Abstract: Single-image super-resolution (SISR) typically focuses on restoring various degraded low-resolution (LR) images to a single high-resolution (HR) image. However, during SISR tasks, it is often challenging for models to simultaneously maintain high quality and rapid sampling while preserving diversity in details and texture features. This challenge can lead to issues such as model collapse, lack of rich details and texture features in the reconstructed HR images, and excessive time consumption for model sampling. To address these problems, this paper proposes a Latent Feature-oriented Diffusion Probability Model (LDDPM). First, we designed a conditional encoder capable of effectively encoding LR images, reducing the solution space for model image reconstruction and thereby improving the quality of the reconstructed images. We then employed a normalized flow and multimodal adversarial training, learning from complex multimodal distributions, to model the denoising distribution. Doing so boosts the generative modeling capabilities within a minimal number of sampling steps. Experimental comparisons of our proposed model with existing SISR methods on mainstream datasets demonstrate that our model reconstructs more realistic HR images and achieves better performance on multiple evaluation metrics, providing a fresh perspective for tackling SISR tasks.

Citations: 0

Foundation models meet visualizations: Challenges and opportunities

IF 6.9, Zone 3 (Computer Science)

Computational Visual Media Pub Date : 2024-05-02 DOI: 10.1007/s41095-023-0393-x

Weikai Yang, Mengchen Liu, Zheng Wang, Shixia Liu

Citations: 0

Learning layout generation for virtual worlds

IF 6.9, Zone 3 (Computer Science)

Computational Visual Media Pub Date : 2024-05-02 DOI: 10.1007/s41095-023-0365-1

Weihao Cheng, Ying Shan

Abstract: The emergence of the metaverse has led to the rapidly increasing demand for the generation of extensive 3D worlds. We consider that an engaging world is built upon a rational layout of multiple land-use areas (e.g., forest, meadow, and farmland). To this end, we propose a generative model of land-use distribution that learns from geographic data. The model is based on a transformer architecture that generates a 2D map of the land-use layout, which can be conditioned on spatial and semantic controls, depending on whether either one or both are provided. This model enables diverse layout generation with user control and layout expansion by extending borders with partial inputs. To generate high-quality and satisfactory layouts, we devise a geometric objective function that supervises the model to perceive layout shapes and regularize generations using geometric priors. Additionally, we devise a planning objective function that supervises the model to perceive progressive composition demands and suppress generations deviating from controls. To evaluate the spatial distribution of the generations, we train an autoencoder to embed land-use layouts into vectors to enable comparison between the real and generated data using the Wasserstein metric, which is inspired by the Fréchet inception distance.

Citations: 0

AdaPIP: Adaptive picture-in-picture guidance for 360° film watching

IF 6.9, Zone 3 (Computer Science)

Computational Visual Media Pub Date : 2024-05-02 DOI: 10.1007/s41095-023-0347-3

Yi-Xiao Li, Guan Luo, Yi-Ke Xu, Yu He, Fang-Lue Zhang, Song-Hai Zhang

Citations: 0