Computational Visual Media: Latest Publications

Learning physically based material and lighting decompositions for face editing
IF 6.9, CAS Tier 3, Computer Science
Computational Visual Media, Pub Date: 2024-01-03, DOI: 10.1007/s41095-022-0309-1
Qian Zhang, Vikas Thamizharasan, James Tompkin
Abstract: Lighting is crucial for portrait photography, yet the complex interactions between skin and incident light are expensive to model computationally in graphics and difficult to reconstruct analytically via computer vision. To allow fast and controllable reflectance and lighting editing, we instead developed a physically based decomposition using deep learned priors from path-traced portrait images. Previous approaches that used simplified material models or low-frequency or low-dynamic-range lighting struggled to model specular reflections or to relight directly without an intermediate decomposition. In contrast, we estimate surface normals, skin albedo and roughness, and high-frequency HDRI maps, and propose an architecture that estimates both diffuse and specular reflectance components. In our experiments, we show that this approach represents the true appearance function more effectively than simpler baseline methods, leading to better generalization and higher-quality editing.
Citations: 0
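As an illustration of the recombination step such a decomposition enables, the sketch below composites per-pixel material buffers (albedo, normals, roughness) under a single directional light standing in for an HDRI map. The Lambertian-plus-Blinn-Phong shading and all names here are our own stand-ins, not the paper's path-traced appearance model.

```python
import numpy as np

def shade(albedo, normals, roughness, light_dir, view_dir, light_rgb):
    """Recombine per-pixel material buffers into diffuse + specular shading.

    albedo:    (H, W, 3) base color in [0, 1]
    normals:   (H, W, 3) unit surface normals
    roughness: (H, W)    scalar roughness in (0, 1]
    light_dir, view_dir: (3,) unit vectors; one directional light
                         stands in for an HDRI environment map.
    """
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    h = (l + v) / np.linalg.norm(l + v)                 # half vector

    n_dot_l = np.clip(np.einsum("hwc,c->hw", normals, l), 0.0, 1.0)
    n_dot_h = np.clip(np.einsum("hwc,c->hw", normals, h), 0.0, 1.0)

    diffuse = albedo * n_dot_l[..., None]               # Lambertian term
    shininess = 2.0 / (roughness ** 2 + 1e-6)           # rough -> broad highlight
    specular = (n_dot_h ** shininess)[..., None]        # Blinn-Phong stand-in
    specular = specular * (n_dot_l > 0)[..., None]      # no back-facing highlights

    return (diffuse + specular) * light_rgb
```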
APF-GAN: Exploring asymmetric pre-training and fine-tuning strategy for conditional generative adversarial network
IF 6.9, CAS Tier 3, Computer Science
Computational Visual Media, Pub Date: 2023-11-30, DOI: 10.1007/s41095-023-0357-1
Yuxuan Li, Lingfeng Yang, Xiang Li
Citations: 0
A unified multi-view multi-person tracking framework
IF 6.9, CAS Tier 3, Computer Science
Computational Visual Media, Pub Date: 2023-11-30, DOI: 10.1007/s41095-023-0334-8
Fan Yang, Shigeyuki Odashima, Sosuke Yamao, Hiroaki Fujimoto, Shoichi Masui, Shan Jiang
Abstract: Despite significant developments in 3D multi-view multi-person (3D MM) tracking, current frameworks separately target footprint tracking or pose tracking. Frameworks designed for the former cannot be used for the latter, because they directly obtain 3D positions on the ground plane via a homography projection, which is inapplicable to 3D poses above the ground. In contrast, frameworks designed for pose tracking generally isolate multi-view and multi-frame associations and may not be sufficiently robust for footprint tracking, which uses fewer key points than pose tracking, weakening multi-view association cues in a single frame. This study presents a unified multi-view multi-person tracking framework to bridge the gap between footprint tracking and pose tracking. Without additional modifications, the framework can take monocular 2D bounding boxes and 2D poses as input and produce robust 3D trajectories for multiple persons. Importantly, multi-frame and multi-view information are jointly employed to improve association and triangulation. Our framework achieves state-of-the-art performance on the Campus and Shelf datasets for 3D pose tracking, with comparable results on the WILDTRACK and MMPTRACK datasets for 3D footprint tracking.
Citations: 0
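The framework jointly uses multi-frame and multi-view cues to improve triangulation; the sketch below shows only the textbook linear (DLT) triangulation of a single point from calibrated views, with a function name of our choosing, not the paper's joint formulation.

```python
import numpy as np

def triangulate(points_2d, projections):
    """Linear (DLT) triangulation of one 3D point from >= 2 views.

    points_2d:   list of (x, y) observations, one per camera
    projections: list of 3x4 camera projection matrices
    Returns the 3D point in world coordinates.
    """
    rows = []
    for (x, y), P in zip(points_2d, projections):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)      # least-squares solution: last right singular vector
    X = vt[-1]
    return X[:3] / X[3]              # dehomogenize
```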
Hierarchical vectorization for facial images
IF 6.9, CAS Tier 3, Computer Science
Computational Visual Media, Pub Date: 2023-11-30, DOI: 10.1007/s41095-022-0314-4
Qian Fu, Linlin Liu, Fei Hou, Ying He
Abstract: The explosive growth of social media means portrait editing and retouching are in high demand. While portraits are commonly captured and stored as raster images, editing raster images is non-trivial and requires the user to be highly skilled. Aiming at developing intuitive and easy-to-use portrait editing tools, we propose a novel vectorization method that automatically converts raster images into a three-tier hierarchical representation. The base layer consists of a set of sparse diffusion curves (DCs) which characterize salient geometric features and low-frequency colors, providing a means for semantic color transfer and facial expression editing. The middle layer encodes specular highlights and shadows as large, editable Poisson regions (PRs) and allows the user to directly adjust illumination by tuning the strength and changing the shapes of PRs. The top layer contains two types of pixel-sized PRs for high-frequency residuals and fine details such as pimples and pigmentation. We train a deep generative model that produces high-frequency residuals automatically. Thanks to the inherent meaning in vector primitives, editing portraits becomes easy and intuitive. In particular, our method supports color transfer, facial expression editing, highlight and shadow editing, and automatic retouching. To quantitatively evaluate the results, we extend the commonly used FLIP metric (which measures color and feature differences between two images) to consider illumination. The new metric, illumination-sensitive FLIP, effectively captures salient changes in color transfer results and is more consistent with human perception than FLIP and other quality measures for portrait images. We evaluate our method on the FFHQR dataset and show it to be effective for common portrait editing tasks, such as retouching, light editing, color transfer, and expression editing.
Citations: 0
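The base layer above rests on diffusion curves, whose core mechanism is diffusing colors pinned along curves across the image, i.e., solving a Laplace equation with Dirichlet boundary conditions. The Jacobi-iteration sketch below (names and defaults ours) illustrates that mechanism only and omits the blur channel and careful discretization of production diffusion-curve renderers.

```python
import numpy as np

def diffuse_colors(color, mask, iters=500):
    """Fill an image by diffusing fixed boundary colors (Laplace equation).

    color: (H, W, 3) image; meaningful only where mask is True
    mask:  (H, W) bool; True where a curve pins the color (Dirichlet data)
    """
    img = np.where(mask[..., None], color, 0.0).astype(np.float64)
    for _ in range(iters):
        # Jacobi step: each pixel moves toward the mean of its 4 neighbors.
        avg = 0.25 * (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
                      np.roll(img, 1, 1) + np.roll(img, -1, 1))
        img = np.where(mask[..., None], color, avg)  # keep constraints fixed
    return img
```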
MusicFace: Music-driven expressive singing face synthesis
IF 6.9, CAS Tier 3, Computer Science
Computational Visual Media, Pub Date: 2023-11-30, DOI: 10.1007/s41095-023-0343-7
Pengfei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, Ming Zeng
Abstract: It remains an interesting and challenging problem to synthesize a vivid and realistic singing face driven by music. In this paper, we present a method for this task with natural motions for the lips, facial expression, head pose, and eyes. Because the human voice and backing music are coupled in common music audio signals, we design a decouple-and-fuse strategy to tackle the challenge. We first decompose the input music audio into a human voice stream and a backing music stream. Because of the implicit and complicated correlation between the two input streams and the dynamics of the facial expressions, head motions, and eye states, we model their relationship with an attention scheme, in which the effects of the two streams are fused seamlessly. Furthermore, to improve the expressiveness of the generated results, we decompose head movement generation into speed and direction, and decompose eye state generation into short-term blinking and long-term eye closing, modeling them separately. We have also built a novel dataset, SingingFace, to support training and evaluation of models for this task, including future work on this topic. Extensive experiments and a user study show that our proposed method is capable of synthesizing vivid singing faces, qualitatively and quantitatively better than the prior state of the art.
Citations: 0
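As a hedged sketch of the fusion idea, generic scaled dot-product attention over the concatenation of two feature streams is shown below; the array shapes, names, and the use of plain single-head attention are our assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_two_streams(q, voice, music):
    """Single-head attention over two concatenated feature streams.

    q:     (Tq, D) queries (e.g., a face-motion decoder state; our assumption)
    voice: (Tv, D) human-voice features; music: (Tm, D) backing-music features
    The attention weights decide, per output step, how much each stream
    contributes to the fused result.
    """
    kv = np.concatenate([voice, music], axis=0)        # (Tv + Tm, D)
    att = softmax(q @ kv.T / np.sqrt(q.shape[1]))      # (Tq, Tv + Tm)
    return att @ kv                                    # fused features
```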
3D hand pose and shape estimation from monocular RGB via efficient 2D cues
IF 6.9, CAS Tier 3, Computer Science
Computational Visual Media, Pub Date: 2023-11-30, DOI: 10.1007/s41095-023-0346-4
Fenghao Zhang, Lin Zhao, Shengling Li, Wanjuan Su, Liman Liu, Wenbing Tao
Abstract: Estimating 3D hand shape from a single-view RGB image is important for many applications. However, the diversity of hand shapes and postures, depth ambiguity, and occlusion may result in pose errors and noisy hand meshes. Making full use of 2D cues such as 2D pose can effectively improve the quality of 3D hand shape estimation. In this paper, we use 2D joint heatmaps to obtain spatial details for robust pose estimation. We also introduce a depth-independent 2D mesh to avoid depth ambiguity in mesh regression for efficient hand-image alignment. Our method has four cascaded stages: 2D cue extraction, pose feature encoding, initial reconstruction, and reconstruction refinement. Specifically, we first encode the image to determine semantic features during 2D cue extraction; these features are also used to predict hand joints and for segmentation. Then, during the pose feature encoding stage, we use a hand joints encoder to learn spatial information from the joint heatmaps. Next, a coarse 3D hand mesh and a 2D mesh are obtained in the initial reconstruction step; a mesh squeeze-and-excitation block fuses different hand features to enhance perception of 3D hand structures. Finally, a global mesh refinement stage learns non-local relations between vertices of the hand mesh from the predicted 2D mesh, to predict an offset hand mesh that fine-tunes the reconstruction results. Quantitative and qualitative results on the FreiHAND benchmark dataset demonstrate that our approach achieves state-of-the-art performance.
Citations: 0
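A standard way to read a joint location out of a 2D heatmap, and a plausible stand-in for the heatmap-based spatial cues described above, is a differentiable soft-argmax; the minimal version below uses our own naming and is not necessarily how the paper's joints encoder consumes heatmaps.

```python
import numpy as np

def soft_argmax_2d(heatmap):
    """Differentiable 2D joint location from a single joint heatmap.

    heatmap: (H, W) unnormalized scores for one joint.
    Returns (x, y) as the probability-weighted mean of pixel coordinates.
    """
    h, w = heatmap.shape
    p = np.exp(heatmap - heatmap.max())   # stable softmax over all pixels
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return float((p * xs).sum()), float((p * ys).sum())
```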
A causal convolutional neural network for multi-subject motion modeling and generation
IF 6.9, CAS Tier 3, Computer Science
Computational Visual Media, Pub Date: 2023-11-30, DOI: 10.1007/s41095-022-0307-3
Shuaiying Hou, Congyi Wang, Wenlin Zhuang, Yu Chen, Yangang Wang, Hujun Bao, Jinxiang Chai, Weiwei Xu
Abstract: Inspired by the success of WaveNet in multi-subject speech synthesis, we propose a novel neural network based on causal convolutions for multi-subject motion modeling and generation. The network can capture the intrinsic characteristics of the motion of different subjects, such as the influence of skeleton scale variation on motion style. Moreover, after fine-tuning the network on a small motion dataset for a novel skeleton not included in the training data, it is able to synthesize high-quality motions with a personalized style for that skeleton. The experimental results demonstrate that our network models the intrinsic characteristics of motions well and can be applied to various motion modeling and synthesis tasks.
Citations: 0
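The causal convolution named in the title is the WaveNet-style primitive in which the output at time t depends only on inputs at times up to t, which is what makes autoregressive generation possible. A minimal NumPy rendition follows, with a signature and padding convention of our choosing.

```python
import numpy as np

def causal_conv1d(x, w, dilation=1):
    """Causal dilated 1D convolution: the output at t sees only inputs <= t.

    x: (T,) input sequence; w: (K,) filter taps, w[0] applied at the
    current time step; dilation >= 1. Left-padding by (K - 1) * dilation
    zeros keeps the convolution causal, as in WaveNet-style networks.
    """
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(w[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])
```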
A visual modeling method for spatiotemporal and multidimensional features in epidemiological analysis: Applied COVID-19 aggregated datasets
IF 6.9, CAS Tier 3, Computer Science
Computational Visual Media, Pub Date: 2023-11-30, DOI: 10.1007/s41095-023-0353-5
Yu Dong, Christy Jie Liang, Yi Chen, Jie Hua
Abstract: Visual modeling enables flexible interactions with rich graphical depictions of data and supports exploring the complexities of epidemiological analysis. However, most epidemiology visualizations do not support combined analysis of the objective factors that might influence the transmission situation, resulting in a lack of quantitative and qualitative evidence. To address this issue, we developed a portrait-based visual modeling method called +msRNAer. This method considers the spatiotemporal features of virus transmission patterns and the multidimensional features of objective risk factors in communities, enabling portrait-based exploration and comparison in epidemiological analysis. We applied +msRNAer to aggregated COVID-19 datasets in New South Wales, Australia, combining COVID-19 case-number trends, geo-information, intervention events, and expert-supervised risk factors extracted from local-government-area-based censuses. We refined the +msRNAer workflow with collaborative views and evaluated its feasibility, effectiveness, and usefulness through one user study and three subject-driven case studies. Positive feedback from experts indicates that +msRNAer supports a general understanding in analysis: portraits not only relate time-varying case numbers to risk factors but also support navigation across geographical, timeline, and other factor comparisons. Through these interactions, experts discovered practical implications of long-standing community factors for vulnerability to the pandemic. Experts confirmed that +msRNAer is expected to deliver the same visual modeling benefits for spatiotemporal and multidimensional features in other epidemiological analysis scenarios.
Citations: 0
6DOF pose estimation of a 3D rigid object based on edge-enhanced point pair features
IF 6.9, CAS Tier 3, Computer Science
Computational Visual Media, Pub Date: 2023-11-30, DOI: 10.1007/s41095-022-0308-2
Chenyi Liu, Fei Chen, Lu Deng, Renjiao Yi, Lintao Zheng, Chenyang Zhu, Jia Wang, Kai Xu
Abstract: The point pair feature (PPF) is widely used for 6D pose estimation. In this paper, we propose an efficient 6D pose estimation method based on the PPF framework. We introduce a well-targeted down-sampling strategy that focuses on edge areas for efficient feature extraction from complex geometry. A pose hypothesis validation approach is proposed to resolve ambiguity due to symmetry by calculating the edge matching degree. Evaluations on two challenging datasets and one real-world collected dataset demonstrate the superiority of our method for pose estimation of geometrically complex, occluded, symmetrical objects. We further validate our method by applying it to simulated punctures.
Citations: 0
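For context, the classic point pair feature of Drost et al., on which the PPF framework builds, describes two oriented surface points by one distance and three angles. The sketch below implements that textbook descriptor (naming ours); it does not include the paper's edge-enhanced sampling or validation.

```python
import numpy as np

def point_pair_feature(p1, n1, p2, n2):
    """Classic PPF descriptor F(m1, m2) for two oriented points.

    F = (||d||, angle(n1, d), angle(n2, d), angle(n1, n2)), where
    d = p2 - p1. Points p1, p2 and unit normals n1, n2 are (3,) arrays.
    """
    d = p2 - p1
    dist = np.linalg.norm(d)
    du = d / (dist + 1e-12)   # guard against coincident points
    ang = lambda a, b: np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return dist, ang(n1, du), ang(n2, du), ang(n1, n2)
```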
A survey on facial image deblurring
IF 6.9, CAS Tier 3, Computer Science
Computational Visual Media, Pub Date: 2023-11-30, DOI: 10.1007/s41095-023-0336-6
Bingnan Wang, Fanjiang Xu, Quan Zheng
Abstract: When a facial image is blurred, it significantly affects high-level vision tasks such as face recognition. The purpose of facial image deblurring is to recover a clear image from a blurry input, which in turn improves recognition accuracy and other downstream tasks. However, general deblurring methods do not perform well on facial images. Therefore, face-specific deblurring methods have been proposed that improve performance by adding semantic or structural information as priors suited to the characteristics of facial images. In this paper, we survey and summarize recently published methods for facial image deblurring, most of which are based on deep learning. First, we provide a brief introduction to the modeling of image blurring. Next, we divide face deblurring methods into two categories: model-based methods and deep learning-based methods. Furthermore, we summarize the datasets, loss functions, and performance evaluation metrics commonly used in training neural networks. We show the performance of classical methods on these datasets and metrics and briefly discuss the differences between model-based and learning-based methods. Finally, we discuss current challenges and possible future research directions.
Citations: 0
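The survey's starting point is the standard blur formation model B = K ⊛ I + N: a blur kernel convolved with the sharp image, plus noise. A minimal grayscale rendition, with names and defaults of our own, follows.

```python
import numpy as np
from scipy.signal import convolve2d

def blur(sharp, kernel, noise_sigma=0.01):
    """Synthesize a blurry image under the standard model B = K * I + N.

    sharp:  (H, W) grayscale image in [0, 1]
    kernel: (k, k) blur kernel, assumed to sum to 1
    """
    blurred = convolve2d(sharp, kernel, mode="same", boundary="symm")
    return blurred + np.random.normal(0.0, noise_sigma, sharp.shape)
```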