{"title":"Multi-scale hash encoding based neural geometry representation","authors":"","doi":"10.1007/s41095-023-0340-x","DOIUrl":"https://doi.org/10.1007/s41095-023-0340-x","url":null,"abstract":"<h3>Abstract</h3> <p>Recently, neural implicit function-based representation has attracted more and more attention, and has been widely used to represent surfaces using differentiable neural networks. However, surface reconstruction from point clouds or multi-view images using existing neural geometry representations still suffer from slow computation and poor accuracy. To alleviate these issues, we propose a multi-scale hash encoding-based neural geometry representation which effectively and efficiently represents the surface as a signed distance field. Our novel neural network structure carefully combines low-frequency Fourier position encoding with multi-scale hash encoding. The initialization of the geometry network and geometry features of the rendering module are accordingly redesigned. Our experiments demonstrate that the proposed representation is at least 10 times faster for reconstructing point clouds with millions of points. It also significantly improves speed and accuracy of multi-view reconstruction. Our code and models are available at https://github.com/Dengzhi-USTC/Neural-Geometry-Reconstruction. <span> <span> <img alt=\"\" src=\"https://static-content.springer.com/image/MediaObjects/41095_2023_340_Fig1_HTML.jpg\"/> </span> </span></p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"8 1","pages":""},"PeriodicalIF":6.9,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140204009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xueguang Xie, Yang Gao, Fei Hou, Aimin Hao, Hong Qin
{"title":"Erratum to: Dynamic ocean inverse modeling based on differentiable rendering","authors":"Xueguang Xie, Yang Gao, Fei Hou, Aimin Hao, Hong Qin","doi":"10.1007/s41095-024-0398-z","DOIUrl":"https://doi.org/10.1007/s41095-024-0398-z","url":null,"abstract":"<p>The authors apologize for a hidden error in the article. It is that the images in Figs. 14(a) and 14(d) were mistakenly presented as left–right mirror images. The authors have flipped them to ensure that the figures now correspond correctly with others in the subfigures (b, c, e, f). The accurate version of Fig. 14 is provided as below.</p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"20 1","pages":""},"PeriodicalIF":6.9,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140204015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chuhua Xian, Jiaxin Li, Hao Wu, Zisen Lin, Guiqing Li
{"title":"Delving into high-quality SVBRDF acquisition: A new setup and method","authors":"Chuhua Xian, Jiaxin Li, Hao Wu, Zisen Lin, Guiqing Li","doi":"10.1007/s41095-023-0352-6","DOIUrl":"https://doi.org/10.1007/s41095-023-0352-6","url":null,"abstract":"<p>In this study, we present a new and innovative framework for acquiring high-quality SVBRDF maps. Our approach addresses the limitations of the current methods and proposes a new solution. The core of our method is a simple hardware setup consisting of a consumer-level camera, LED lights, and a carefully designed network that can accurately obtain the high-quality SVBRDF properties of a nearly planar object. By capturing a flexible number of images of an object, our network uses different subnetworks to train different property maps and employs appropriate loss functions for each of them. To further enhance the quality of the maps, we improved the network structure by adding a novel skip connection that connects the encoder and decoder with global features. Through extensive experimentation using both synthetic and real-world materials, our results demonstrate that our method outperforms previous methods and produces superior results. Furthermore, our proposed setup can also be used to acquire physically based rendering maps of special materials.</p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"4 1","pages":""},"PeriodicalIF":6.9,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139765949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CF-DAN: Facial-expression recognition based on cross-fusion dual-attention network","authors":"","doi":"10.1007/s41095-023-0369-x","DOIUrl":"https://doi.org/10.1007/s41095-023-0369-x","url":null,"abstract":"<h3>Abstract</h3> <p>Recently, facial-expression recognition (FER) has primarily focused on images in the wild, including factors such as face occlusion and image blurring, rather than laboratory images. Complex field environments have introduced new challenges to FER. To address these challenges, this study proposes a cross-fusion dual-attention network. The network comprises three parts: (1) a cross-fusion grouped dual-attention mechanism to refine local features and obtain global information; (2) a proposed <em>C</em><sup>2</sup> activation function construction method, which is a piecewise cubic polynomial with three degrees of freedom, requiring less computation with improved flexibility and recognition abilities, which can better address slow running speeds and neuron inactivation problems; and (3) a closed-loop operation between the self-attention distillation process and residual connections to suppress redundant information and improve the generalization ability of the model. The recognition accuracies on the RAF-DB, FERPlus, and AffectNet datasets were 92.78%, 92.02%, and 63.58%, respectively. Experiments show that this model can provide more effective solutions for FER tasks. <span> <span> <img alt=\"\" src=\"https://static-content.springer.com/image/MediaObjects/41095_2023_369_Fig1_HTML.jpg\"/> </span> </span></p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"17 1","pages":""},"PeriodicalIF":6.9,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139765951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-task learning and joint refinement between camera localization and object detection","authors":"Junyi Wang, Yue Qi","doi":"10.1007/s41095-022-0319-z","DOIUrl":"https://doi.org/10.1007/s41095-022-0319-z","url":null,"abstract":"<p>Visual localization and object detection both play important roles in various tasks. In many indoor application scenarios where some detected objects have fixed positions, the two techniques work closely together. However, few researchers consider these two tasks simultaneously, because of a lack of datasets and the little attention paid to such environments. In this paper, we explore multi-task network design and joint refinement of detection and localization. To address the dataset problem, we construct a medium indoor scene of an aviation exhibition hall through a semi-automatic process. The dataset provides localization and detection information, and is publicly available at https://drive.google.com/drive/folders/1U28zkuN4_I0dbzkqyIAKlAl5k9oUK0jI?usp=sharing for benchmarking localization and object detection tasks. Targeting this dataset, we have designed a multi-task network, JLDNet, based on YOLO v3, that outputs a target point cloud and object bounding boxes. For dynamic environments, the detection branch also promotes the perception of dynamics. JLDNet includes image feature learning, point feature learning, feature fusion, detection construction, and point cloud regression. Moreover, object-level bundle adjustment is used to further improve localization and detection accuracy. To test JLDNet and compare it to other methods, we have conducted experiments on 7 static scenes, our constructed dataset, and the dynamic TUM RGB-D and Bonn datasets. Our results show state-of-the-art accuracy for both tasks, and the benefit of jointly working on both tasks is demonstrated.\u0000</p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"4 1","pages":""},"PeriodicalIF":6.9,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139766066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DualSmoke: Sketch-based smoke illustration design with two-stage generative model","authors":"Haoran Xie, Keisuke Arihara, Syuhei Sato, Kazunori Miyata","doi":"10.1007/s41095-022-0318-0","DOIUrl":"https://doi.org/10.1007/s41095-022-0318-0","url":null,"abstract":"<p>The dynamic effects of smoke are impressive in illustration design, but it is a troublesome and challenging issue for inexpert users to design smoke effects without domain knowledge of fluid simulations. In this work, we propose DualSmoke, a two-stage global-to-local generation framework for interactive smoke illustration design. In the global stage, the proposed approach utilizes fluid patterns to generate Lagrangian coherent structures from the user’s hand-drawn sketches. In the local stage, detailed flow patterns are obtained from the generated coherent structure. Finally, we apply a guiding force field to the smoke simulator to produce the desired smoke illustration. To construct the training dataset, DualSmoke generates flow patterns using finite-time Lyapunov exponents of the velocity fields. The synthetic sketch data are generated from the flow patterns by skeleton extraction. Our user study verifies that the proposed design interface can provide various smoke illustration designs with good user usability. Our code is available at https://githubcom/shasph/DualSmoke.\u0000</p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"10 1","pages":""},"PeriodicalIF":6.9,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139766088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giovanni Pintore, Eva Almansa, Armando Sanchez, Giorgio Vassena, Enrico Gobbetti
{"title":"Deep panoramic depth prediction and completion for indoor scenes","authors":"Giovanni Pintore, Eva Almansa, Armando Sanchez, Giorgio Vassena, Enrico Gobbetti","doi":"10.1007/s41095-023-0358-0","DOIUrl":"https://doi.org/10.1007/s41095-023-0358-0","url":null,"abstract":"<p>We introduce a novel end-to-end deep-learning solution for rapidly estimating a dense spherical depth map of an indoor environment. Our input is a single equirectangular image registered with a sparse depth map, as provided by a variety of common capture setups. Depth is inferred by an efficient and lightweight single-branch network, which employs a dynamic gating system to process together dense visual data and sparse geometric data. We exploit the characteristics of typical man-made environments to efficiently compress multi-resolution features and find short- and long-range relations among scene parts. Furthermore, we introduce a new augmentation strategy to make the model robust to different types of sparsity, including those generated by various structured light sensors and LiDAR setups. The experimental results demonstrate that our method provides interactive performance and outperforms state-of-the-art solutions in computational efficiency, adaptivity to variable depth sparsity patterns, and prediction accuracy for challenging indoor data, even when trained solely on synthetic data without any fine tuning.\u0000</p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"3 1","pages":""},"PeriodicalIF":6.9,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139766149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Baiqiang Leng, Jingwei Huang, Guanlin Shen, Bin Wang
{"title":"Shape embedding and retrieval in multi-flow deformation","authors":"Baiqiang Leng, Jingwei Huang, Guanlin Shen, Bin Wang","doi":"10.1007/s41095-022-0315-3","DOIUrl":"https://doi.org/10.1007/s41095-022-0315-3","url":null,"abstract":"<p>We propose a unified 3D flow framework for joint learning of shape embedding and deformation for different categories. Our goal is to recover shapes from imperfect point clouds by fitting the best shape template in a shape repository after deformation. Accordingly, we learn a shape embedding for template retrieval and a flow-based network for robust deformation. We note that the deformation flow can be quite different for different shape categories. Therefore, we introduce a novel multi-hub module to learn multiple modes of deformation to incorporate such variation, providing a network which can handle a wide range of objects from different categories. The shape embedding is designed to retrieve the best-fit template as the nearest neighbor in a latent space. We replace the standard fully connected layer with a tiny structure in the embedding that significantly reduces network complexity and further improves deformation quality. Experiments show the superiority of our method to existing state-of-the-art methods via qualitative and quantitative comparisons. Finally, our method provides efficient and flexible deformation that can further be used for novel shape design.\u0000</p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"45 1","pages":""},"PeriodicalIF":6.9,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139766148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xueguang Xie, Yang Gao, Fei Hou, Aimin Hao, Hong Qin
{"title":"Dynamic ocean inverse modeling based on differentiable rendering","authors":"Xueguang Xie, Yang Gao, Fei Hou, Aimin Hao, Hong Qin","doi":"10.1007/s41095-023-0338-4","DOIUrl":"https://doi.org/10.1007/s41095-023-0338-4","url":null,"abstract":"<p>Learning and inferring underlying motion patterns of captured 2D scenes and then re-creating dynamic evolution consistent with the real-world natural phenomena have high appeal for graphics and animation. To bridge the technical gap between virtual and real environments, we focus on the inverse modeling and reconstruction of visually consistent and property-verifiable oceans, taking advantage of deep learning and differentiable physics to learn geometry and constitute waves in a self-supervised manner. First, we infer hierarchical geometry using two networks, which are optimized via the differentiable renderer. We extract wave components from the sequence of inferred geometry through a network equipped with a differentiable ocean model. Then, ocean dynamics can be evolved using the reconstructed wave components. Through extensive experiments, we verify that our new method yields satisfactory results for both geometry reconstruction and wave estimation. Moreover, the new framework has the inverse modeling potential to facilitate a host of graphics applications, such as the rapid production of physically accurate scene animation and editing guided by real ocean scenes.\u0000</p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"72 1","pages":""},"PeriodicalIF":6.9,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139084317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Benchmarking visual SLAM methods in mirror environments","authors":"Peter Herbert, Jing Wu, Ze Ji, Yu-Kun Lai","doi":"10.1007/s41095-022-0329-x","DOIUrl":"https://doi.org/10.1007/s41095-022-0329-x","url":null,"abstract":"<p>Visual simultaneous localisation and mapping (vSLAM) finds applications for indoor and outdoor navigation that routinely subjects it to visual complexities, particularly mirror reflections. The effect of mirror presence (time visible and its average size in the frame) was hypothesised to impact localisation and mapping performance, with systems using direct techniques expected to perform worse. Thus, a dataset, MirrEnv, of image sequences recorded in mirror environments, was collected, and used to evaluate the performance of existing representative methods. RGBD ORB-SLAM3 and BundleFusion appear to show moderate degradation of absolute trajectory error with increasing mirror duration, whilst the remaining results did not show significantly degraded localisation performance. The mesh maps generated proved to be very inaccurate, with real and virtual reflections colliding in the reconstructions. A discussion is given of the likely sources of error and robustness in mirror environments, outlining future directions for validating and improving vSLAM performance in the presence of planar mirrors. The MirrEnv dataset is available at https://doi.org/10.17035/d.2023.0292477898.\u0000</p>","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":"39 1","pages":""},"PeriodicalIF":6.9,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139084444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}