Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video
Boxiang Rong, Artur Grigorev, Wenbo Wang, Michael J. Black, Bernhard Thomaszewski, Christina Tsalicoglou, Otmar Hilliges
arXiv:2409.08189 (2024-09-12)
Abstract: We introduce Gaussian Garments, a novel approach for reconstructing realistic, simulation-ready garment assets from multi-view videos. Our method represents garments with a combination of a 3D mesh and a Gaussian texture that encodes both the color and high-frequency surface details. This representation enables accurate registration of garment geometries to multi-view videos and helps disentangle albedo textures from lighting effects. Furthermore, we demonstrate how a pre-trained graph neural network (GNN) can be fine-tuned to replicate the real behavior of each garment. The reconstructed Gaussian Garments can be automatically combined into multi-garment outfits and animated with the fine-tuned GNN.

{"title":"StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos","authors":"Sijie Zhao, Wenbo Hu, Xiaodong Cun, Yong Zhang, Xiaoyu Li, Zhe Kong, Xiangjun Gao, Muyao Niu, Ying Shan","doi":"arxiv-2409.07447","DOIUrl":"https://doi.org/arxiv-2409.07447","url":null,"abstract":"This paper presents a novel framework for converting 2D videos to immersive\u0000stereoscopic 3D, addressing the growing demand for 3D content in immersive\u0000experience. Leveraging foundation models as priors, our approach overcomes the\u0000limitations of traditional methods and boosts the performance to ensure the\u0000high-fidelity generation required by the display devices. The proposed system\u0000consists of two main steps: depth-based video splatting for warping and\u0000extracting occlusion mask, and stereo video inpainting. We utilize pre-trained\u0000stable video diffusion as the backbone and introduce a fine-tuning protocol for\u0000the stereo video inpainting task. To handle input video with varying lengths\u0000and resolutions, we explore auto-regressive strategies and tiled processing.\u0000Finally, a sophisticated data processing pipeline has been developed to\u0000reconstruct a large-scale and high-quality dataset to support our training. Our\u0000framework demonstrates significant improvements in 2D-to-3D video conversion,\u0000offering a practical solution for creating immersive content for 3D devices\u0000like Apple Vision Pro and 3D displays. In summary, this work contributes to the\u0000field by presenting an effective method for generating high-quality\u0000stereoscopic videos from monocular input, potentially transforming how we\u0000experience digital media.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering
Dafei Qin, Hongyang Lin, Qixuan Zhang, Kaichun Qiao, Longwen Zhang, Zijun Zhao, Jun Saito, Jingyi Yu, Lan Xu, Taku Komura
arXiv:2409.07441 (2024-09-11)
Abstract: We propose GauFace, a novel Gaussian Splatting representation tailored for efficient animation and rendering of physically-based facial assets. Leveraging strong geometric priors and constrained optimization, GauFace ensures a neat and structured Gaussian representation, delivering high fidelity and real-time facial interaction at 30 fps and 1440p on a Snapdragon 8 Gen 2 mobile platform. We then introduce TransGS, a diffusion transformer that instantly translates physically-based facial assets into the corresponding GauFace representations. Specifically, we adopt a patch-based pipeline to handle the vast number of Gaussians effectively. We also introduce a novel pixel-aligned sampling scheme with UV positional encoding to ensure the throughput and rendering quality of GauFace assets generated by our TransGS. Once trained, TransGS can instantly translate facial assets with lighting conditions to the GauFace representation. With its rich conditioning modalities, it also enables editing and animation capabilities reminiscent of traditional CG pipelines. We conduct extensive evaluations and user studies comparing against traditional offline and online renderers, as well as recent neural rendering methods, which demonstrate the superior performance of our approach for facial asset rendering. We also showcase diverse immersive applications of facial assets using our TransGS approach and GauFace representation across various platforms such as PCs, phones, and even VR headsets.

MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification
Phu Pham, Aradhya N. Mathur, Ojaswa Sharma, Aniket Bera
arXiv:2409.06620 (2024-09-10)
Abstract: The field of text-to-3D content generation has made significant progress in generating realistic 3D objects, with existing methodologies like Score Distillation Sampling (SDS) offering promising guidance. However, these methods often encounter the "Janus" problem: multi-face ambiguities due to imprecise guidance. Additionally, while recent advancements in 3D Gaussian splatting have shown its efficacy in representing 3D volumes, optimization of this representation remains largely unexplored. This paper introduces a unified framework for text-to-3D content generation that addresses these critical gaps. Our approach utilizes multi-view guidance to iteratively form the structure of the 3D model, progressively enhancing detail and accuracy. We also introduce a novel densification algorithm that aligns Gaussians close to the surface, optimizing the structural integrity and fidelity of the generated models. Extensive experiments validate our approach, demonstrating that it produces high-quality visual outputs with minimal time cost. Notably, our method achieves high-quality results within half an hour of training, offering a substantial efficiency gain over most existing methods, which require hours of training time to achieve comparable results.

Fiber-level Woven Fabric Capture from a Single Photo
Zixuan Li, Pengfei Shen, Hanxiao Sun, Zibo Zhang, Yu Guo, Ligang Liu, Ling-Qi Yan, Steve Marschner, Milos Hasan, Beibei Wang
arXiv:2409.06368 (2024-09-10)
Abstract: Accurately rendering the appearance of fabrics is challenging due to their complex 3D microstructures and specialized optical properties. If we model the geometry and optics of fabrics down to the fiber level, we can achieve unprecedented rendering realism, but this raises the difficulty of authoring or capturing the fiber-level assets. Existing approaches can obtain fiber-level geometry with special devices (e.g., CT) or complex hand-designed procedural pipelines (manually tweaking a set of parameters). In this paper, we propose a unified framework to capture fiber-level geometry and appearance of woven fabrics using a single low-cost microscope image. We first use a simple neural network to predict initial parameters of our geometric and appearance models. From this starting point, we further optimize the parameters of procedural fiber geometry and an approximated shading model via differentiable rasterization to match the microscope photo more accurately. Finally, we refine the fiber appearance parameters via differentiable path tracing, converging to accurate fiber optical parameters, which are suitable for physically-based light simulations to produce high-quality rendered results. We believe that our method is the first to utilize differentiable rendering at the microscopic level, supporting physically-based scattering from explicit fiber assemblies. Our fabric parameter estimation achieves high-quality re-rendering of measured woven fabric samples in both distant and close-up views. These results can further be used for efficient rendering or converted to downstream representations. We also propose a patch-space procedural fiber geometry generation and a two-scale path tracing framework for efficient rendering of fabric scenes.

{"title":"Image Vectorization with Depth: convexified shape layers with depth ordering","authors":"Ho Law, Sung Ha Kang","doi":"arxiv-2409.06648","DOIUrl":"https://doi.org/arxiv-2409.06648","url":null,"abstract":"Image vectorization is a process to convert a raster image into a scalable\u0000vector graphic format. Objective is to effectively remove the pixelization\u0000effect while representing boundaries of image by scaleable parameterized\u0000curves. We propose new image vectorization with depth which considers depth\u0000ordering among shapes and use curvature-based inpainting for convexifying\u0000shapes in vectorization process.From a given color quantized raster image, we\u0000first define each connected component of the same color as a shape layer, and\u0000construct depth ordering among them using a newly proposed depth ordering\u0000energy. Global depth ordering among all shapes is described by a directed\u0000graph, and we propose an energy to remove cycle within the graph. After\u0000constructing depth ordering of shapes, we convexify occluded regions by Euler's\u0000elastica curvature-based variational inpainting, and leverage on the stability\u0000of Modica-Mortola double-well potential energy to inpaint large regions. This\u0000is following human vision perception that boundaries of shapes extend smoothly,\u0000and we assume shapes are likely to be convex. Finally, we fit B'{e}zier curves\u0000to the boundaries and save vectorization as a SVG file which allows\u0000superposition of curvature-based inpainted shapes following the depth ordering.\u0000This is a new way to vectorize images, by decomposing an image into scalable\u0000shape layers with computed depth ordering. This approach makes editing shapes\u0000and images more natural and intuitive. We also consider grouping shape layers\u0000for semantic vectorization. We present various numerical results and\u0000comparisons against recent layer-based vectorization methods to validate the\u0000proposed model.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural Laplacian Operator for 3D Point Clouds
Bo Pang, Zhongtian Zheng, Yilong Li, Guoping Wang, Peng-Shuai Wang
arXiv:2409.06506 (2024-09-10)
Abstract: The discrete Laplacian operator holds a crucial role in 3D geometry processing, yet it is still challenging to define it on point clouds. Previous works mainly focused on constructing a local triangulation around each point to approximate the underlying manifold for defining the Laplacian operator, which may not be robust or accurate. In contrast, we simply use the K-nearest neighbors (KNN) graph constructed from the input point cloud and learn the Laplacian operator on the KNN graph with graph neural networks (GNNs). However, the ground-truth Laplacian operator is defined on a manifold mesh with a different connectivity from the KNN graph and thus cannot be directly used for training. To train the GNN, we propose a novel training scheme by imitating the behavior of the ground-truth Laplacian operator on a set of probe functions so that the learned Laplacian operator behaves similarly to the ground-truth Laplacian operator. We train our network on a subset of ShapeNet and evaluate it across a variety of point clouds. Compared with previous methods, our method reduces the error by an order of magnitude and excels in handling sparse point clouds with thin structures or sharp features. Our method also demonstrates a strong generalization ability to unseen shapes. With our learned Laplacian operator, we further apply a series of Laplacian-based geometry processing algorithms directly to point clouds and achieve accurate results, enabling many exciting possibilities for geometry processing on point clouds. The code and trained models are available at https://github.com/IntelligentGeometry/NeLo.

DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement
Qimin Chen, Zhiqin Chen, Vladimir G. Kim, Noam Aigerman, Hao Zhang, Siddhartha Chaudhuri
arXiv:2409.06129 (2024-09-10)
Abstract: We present a 3D modeling method which enables end-users to refine or detailize 3D shapes using machine learning, expanding the capabilities of AI-assisted 3D content creation. Given a coarse voxel shape (e.g., one produced with a simple box extrusion tool or via generative modeling), a user can directly "paint" desired target styles, representing compelling geometric details from input exemplar shapes, over different regions of the coarse shape. These regions are then up-sampled into high-resolution geometries which adhere to the painted styles. To achieve such controllable and localized 3D detailization, we build on top of a Pyramid GAN by making it masking-aware. We devise novel structural losses and priors to ensure that our method preserves both desired coarse structures and fine-grained features even if the painted styles are borrowed from diverse sources, e.g., different semantic parts and even different shape categories. Through extensive experiments, we show that our ability to localize details enables novel interactive creative workflows and applications. Our experiments further demonstrate that, in comparison to prior techniques built on global detailization, our method generates structure-preserving, high-resolution stylized geometries with more coherent shape details and style transitions.

Multi-scale Cycle Tracking in Dynamic Planar Graphs
Farhan Rasheed, Abrar Naseer, Emma Nilsson, Talha Bin Masood, Ingrid Hotz
arXiv:2409.06476 (2024-09-10)
Abstract: This paper presents a nested tracking framework for analyzing cycles in 2D force networks within granular materials. These materials are composed of interacting particles, whose interactions are described by a force network. Understanding the cycles within these networks at various scales, and their evolution under external loads, is crucial, as they significantly contribute to the mechanical and kinematic properties of the system. Our approach involves computing a cycle hierarchy by partitioning the 2D domain into segments bounded by cycles in the force network. We can adapt concepts from nested tracking graphs, originally developed for merge trees, by leveraging the duality between this partitioning and the cycles. We demonstrate the effectiveness of our method on two force networks derived from experiments with photoelastic disks.

PersonaTalk: Bring Attention to Your Persona in Visual Dubbing
Longhao Zhang, Shuang Liang, Zhipeng Ge, Tianshu Hu
arXiv:2409.05379 (2024-09-09)
Abstract: For audio-driven visual dubbing, it remains a considerable challenge to uphold and highlight the speaker's persona while synthesizing accurate lip synchronization. Existing methods fall short of capturing the speaker's unique speaking style or preserving facial details. In this paper, we present PersonaTalk, an attention-based two-stage framework, comprising geometry construction and face rendering, for high-fidelity and personalized visual dubbing. In the first stage, we propose a style-aware audio encoding module that injects speaking style into audio features through a cross-attention layer. The stylized audio features are then used to drive the speaker's template geometry to obtain lip-synced geometries. In the second stage, a dual-attention face renderer is introduced to render textures for the target geometries. It consists of two parallel cross-attention layers, namely Lip-Attention and Face-Attention, which sample textures from different reference frames to render the entire face. With our innovative design, intricate facial details can be well preserved. Comprehensive experiments and user studies demonstrate our advantages over other state-of-the-art methods in terms of visual quality, lip-sync accuracy, and persona preservation. Furthermore, as a person-generic framework, PersonaTalk can achieve performance competitive with state-of-the-art person-specific methods. Project page: https://grisoon.github.io/PersonaTalk/.
