{"title":"Sampling for View Synthesis: From Local Light Field Fusion to Neural Radiance Fields and Beyond","authors":"Ravi Ramamoorthi","doi":"arxiv-2408.04586","DOIUrl":"https://doi.org/arxiv-2408.04586","url":null,"abstract":"Capturing and rendering novel views of complex real-world scenes is a\u0000long-standing problem in computer graphics and vision, with applications in\u0000augmented and virtual reality, immersive experiences and 3D photography. The\u0000advent of deep learning has enabled revolutionary advances in this area,\u0000classically known as image-based rendering. However, previous approaches\u0000require intractably dense view sampling or provide little or no guidance for\u0000how users should sample views of a scene to reliably render high-quality novel\u0000views. Local light field fusion proposes an algorithm for practical view\u0000synthesis from an irregular grid of sampled views that first expands each\u0000sampled view into a local light field via a multiplane image scene\u0000representation, then renders novel views by blending adjacent local light\u0000fields. Crucially, we extend traditional plenoptic sampling theory to derive a\u0000bound that specifies precisely how densely users should sample views of a given\u0000scene when using our algorithm. We achieve the perceptual quality of Nyquist\u0000rate view sampling while using up to 4000x fewer views. Subsequent developments\u0000have led to new scene representations for deep learning with view synthesis,\u0000notably neural radiance fields, but the problem of sparse view synthesis from a\u0000small number of images has only grown in importance. We reprise some of the\u0000recent results on sparse and even single image view synthesis, while posing the\u0000question of whether prescriptive sampling guidelines are feasible for the new\u0000generation of image-based rendering algorithms.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches","authors":"Yongzhi Xu, Yonhon Ng, Yifu Wang, Inkyu Sa, Yunfei Duan, Yang Li, Pan Ji, Hongdong Li","doi":"arxiv-2408.04567","DOIUrl":"https://doi.org/arxiv-2408.04567","url":null,"abstract":"3D Content Generation is at the heart of many computer graphics applications,\u0000including video gaming, film-making, virtual and augmented reality, etc. This\u0000paper proposes a novel deep-learning based approach for automatically\u0000generating interactive and playable 3D game scenes, all from the user's casual\u0000prompts such as a hand-drawn sketch. Sketch-based input offers a natural, and\u0000convenient way to convey the user's design intention in the content creation\u0000process. To circumvent the data-deficient challenge in learning (i.e. the lack\u0000of large training data of 3D scenes), our method leverages a pre-trained 2D\u0000denoising diffusion model to generate a 2D image of the scene as the conceptual\u0000guidance. In this process, we adopt the isometric projection mode to factor out\u0000unknown camera poses while obtaining the scene layout. From the generated\u0000isometric image, we use a pre-trained image understanding method to segment the\u0000image into meaningful parts, such as off-ground objects, trees, and buildings,\u0000and extract the 2D scene layout. These segments and layouts are subsequently\u0000fed into a procedural content generation (PCG) engine, such as a 3D video game\u0000engine like Unity or Unreal, to create the 3D scene. The resulting 3D scene can\u0000be seamlessly integrated into a game development environment and is readily\u0000playable. Extensive tests demonstrate that our method can efficiently generate\u0000high-quality and interactive 3D game scenes with layouts that closely follow\u0000the user's intention.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"One-Shot Method for Computing Generalized Winding Numbers","authors":"Cedric Martens, Mikhail Bessmeltsev","doi":"arxiv-2408.04466","DOIUrl":"https://doi.org/arxiv-2408.04466","url":null,"abstract":"The generalized winding number is an essential part of the geometry\u0000processing toolkit, allowing to quantify how much a given point is inside a\u0000surface, often represented by a mesh or a point cloud, even when the surface is\u0000open, noisy, or non-manifold. Parameterized surfaces, which often contain\u0000intentional and unintentional gaps and imprecisions, would also benefit from a\u0000generalized winding number. Standard methods to compute it, however, rely on a\u0000surface integral, challenging to compute without surface discretization,\u0000leading to loss of precision characteristic of parametric surfaces. We propose an alternative method to compute a generalized winding number,\u0000based only on the surface boundary and the intersections of a single ray with\u0000the surface. For parametric surfaces, we show that all the necessary operations\u0000can be done via a Sum-of-Squares (SOS) formulation, thus computing generalized\u0000winding numbers without surface discretization with machine precision. We show\u0000that by discretizing only the boundary of the surface, this becomes an\u0000efficient method. We demonstrate an application of our method to the problem of computing a\u0000generalized winding number of a surface represented by a curve network, where\u0000each curve loop is surfaced via Laplace equation. We use the Boundary Element\u0000Method to express the solution as a parametric surface, allowing us to apply\u0000our method without meshing the surfaces. As a bonus, we also demonstrate that\u0000for meshes with many triangles and a simple boundary, our method is faster than\u0000the hierarchical evaluation of the generalized winding number while still being\u0000precise. We validate our algorithms theoretically, numerically, and by demonstrating a\u0000gallery of results new{on a variety of parametric surfaces and meshes}, as\u0000well uses in a variety of applications, including voxelizations and boolean\u0000operations.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Skinning using the Mixed Finite Element Method","authors":"Hongcheng Song, Dmitry Kachkovski, Shaimaa Monem, Abraham Kassauhun Negash, David I. W. Levin","doi":"arxiv-2408.04066","DOIUrl":"https://doi.org/arxiv-2408.04066","url":null,"abstract":"In this work, we show that exploiting additional variables in a mixed finite\u0000element formulation of deformation leads to an efficient physics-based\u0000character skinning algorithm. Taking as input, a user-defined rig, we show how\u0000to efficiently compute deformations of the character mesh which respect\u0000artist-supplied handle positions and orientations, but without requiring\u0000complicated constraints on the physics solver, which can cause poor\u0000performance. Rather we demonstrate an efficient, user controllable skinning\u0000pipeline that can generate compelling character deformations, using a variety\u0000of physics material models.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Sprite Decomposition from Animated Graphics","authors":"Tomoyuki Suzuki, Kotaro Kikuchi, Kota Yamaguchi","doi":"arxiv-2408.03923","DOIUrl":"https://doi.org/arxiv-2408.03923","url":null,"abstract":"This paper presents an approach to decomposing animated graphics into\u0000sprites, a set of basic elements or layers. Our approach builds on the\u0000optimization of sprite parameters to fit the raster video. For efficiency, we\u0000assume static textures for sprites to reduce the search space while preventing\u0000artifacts using a texture prior model. To further speed up the optimization, we\u0000introduce the initialization of the sprite parameters utilizing a pre-trained\u0000video object segmentation model and user input of single frame annotations. For\u0000our study, we construct the Crello Animation dataset from an online design\u0000service and define quantitative metrics to measure the quality of the extracted\u0000sprites. Experiments show that our method significantly outperforms baselines\u0000for similar decomposition tasks in terms of the quality/efficiency tradeoff.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RayGauss: Volumetric Gaussian-Based Ray Casting for Photorealistic Novel View Synthesis","authors":"Hugo Blanc, Jean-Emmanuel Deschaud, Alexis Paljic","doi":"arxiv-2408.03356","DOIUrl":"https://doi.org/arxiv-2408.03356","url":null,"abstract":"Differentiable volumetric rendering-based methods made significant progress\u0000in novel view synthesis. On one hand, innovative methods have replaced the\u0000Neural Radiance Fields (NeRF) network with locally parameterized structures,\u0000enabling high-quality renderings in a reasonable time. On the other hand,\u0000approaches have used differentiable splatting instead of NeRF's ray casting to\u0000optimize radiance fields rapidly using Gaussian kernels, allowing for fine\u0000adaptation to the scene. However, differentiable ray casting of irregularly\u0000spaced kernels has been scarcely explored, while splatting, despite enabling\u0000fast rendering times, is susceptible to clearly visible artifacts. Our work closes this gap by providing a physically consistent formulation of\u0000the emitted radiance c and density {sigma}, decomposed with Gaussian functions\u0000associated with Spherical Gaussians/Harmonics for all-frequency colorimetric\u0000representation. We also introduce a method enabling differentiable ray casting\u0000of irregularly distributed Gaussians using an algorithm that integrates\u0000radiance fields slab by slab and leverages a BVH structure. This allows our\u0000approach to finely adapt to the scene while avoiding splatting artifacts. As a\u0000result, we achieve superior rendering quality compared to the state-of-the-art\u0000while maintaining reasonable training times and achieving inference speeds of\u000025 FPS on the Blender dataset. Project page with videos and code:\u0000https://raygauss.github.io/","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion","authors":"Xingguang Yan, Han-Hung Lee, Ziyu Wan, Angel X. Chang","doi":"arxiv-2408.03178","DOIUrl":"https://doi.org/arxiv-2408.03178","url":null,"abstract":"We introduce a new approach for generating realistic 3D models with UV maps\u0000through a representation termed \"Object Images.\" This approach encapsulates\u0000surface geometry, appearance, and patch structures within a 64x64 pixel image,\u0000effectively converting complex 3D shapes into a more manageable 2D format. By\u0000doing so, we address the challenges of both geometric and semantic irregularity\u0000inherent in polygonal meshes. This method allows us to use image generation\u0000models, such as Diffusion Transformers, directly for 3D shape generation.\u0000Evaluated on the ABO dataset, our generated shapes with patch structures\u0000achieve point cloud FID comparable to recent 3D generative models, while\u0000naturally supporting PBR material generation.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MGFs: Masked Gaussian Fields for Meshing Building based on Multi-View Images","authors":"Tengfei Wang, Zongqian Zhan, Rui Xia, Linxia Ji, Xin Wang","doi":"arxiv-2408.03060","DOIUrl":"https://doi.org/arxiv-2408.03060","url":null,"abstract":"Over the last few decades, image-based building surface reconstruction has\u0000garnered substantial research interest and has been applied across various\u0000fields, such as heritage preservation, architectural planning, etc. Compared to\u0000the traditional photogrammetric and NeRF-based solutions, recently, Gaussian\u0000fields-based methods have exhibited significant potential in generating surface\u0000meshes due to their time-efficient training and detailed 3D information\u0000preservation. However, most gaussian fields-based methods are trained with all\u0000image pixels, encompassing building and nonbuilding areas, which results in a\u0000significant noise for building meshes and degeneration in time efficiency. This\u0000paper proposes a novel framework, Masked Gaussian Fields (MGFs), designed to\u0000generate accurate surface reconstruction for building in a time-efficient way.\u0000The framework first applies EfficientSAM and COLMAP to generate multi-level\u0000masks of building and the corresponding masked point clouds. Subsequently, the\u0000masked gaussian fields are trained by integrating two innovative losses: a\u0000multi-level perceptual masked loss focused on constructing building regions and\u0000a boundary loss aimed at enhancing the details of the boundaries between\u0000different masks. Finally, we improve the tetrahedral surface mesh extraction\u0000method based on the masked gaussian spheres. Comprehensive experiments on UAV\u0000images demonstrate that, compared to the traditional method and several\u0000NeRF-based and Gaussian-based SOTA solutions, our approach significantly\u0000improves both the accuracy and efficiency of building surface reconstruction.\u0000Notably, as a byproduct, there is an additional gain in the novel view\u0000synthesis of building.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"85 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geometric Algebra Meets Large Language Models: Instruction-Based Transformations of Separate Meshes in 3D, Interactive and Controllable Scenes","authors":"Dimitris Angelis, Prodromos Kolyvakis, Manos Kamarianakis, George Papagiannakis","doi":"arxiv-2408.02275","DOIUrl":"https://doi.org/arxiv-2408.02275","url":null,"abstract":"This paper introduces a novel integration of Large Language Models (LLMs)\u0000with Conformal Geometric Algebra (CGA) to revolutionize controllable 3D scene\u0000editing, particularly for object repositioning tasks, which traditionally\u0000requires intricate manual processes and specialized expertise. These\u0000conventional methods typically suffer from reliance on large training datasets\u0000or lack a formalized language for precise edits. Utilizing CGA as a robust\u0000formal language, our system, shenlong, precisely models spatial transformations\u0000necessary for accurate object repositioning. Leveraging the zero-shot learning\u0000capabilities of pre-trained LLMs, shenlong translates natural language\u0000instructions into CGA operations which are then applied to the scene,\u0000facilitating exact spatial transformations within 3D scenes without the need\u0000for specialized pre-training. Implemented in a realistic simulation\u0000environment, shenlong ensures compatibility with existing graphics pipelines.\u0000To accurately assess the impact of CGA, we benchmark against robust Euclidean\u0000Space baselines, evaluating both latency and accuracy. Comparative performance\u0000evaluations indicate that shenlong significantly reduces LLM response times by\u000016% and boosts success rates by 9.6% on average compared to the traditional\u0000methods. Notably, shenlong achieves a 100% perfect success rate in common\u0000practical queries, a benchmark where other systems fall short. These\u0000advancements underscore shenlong's potential to democratize 3D scene editing,\u0000enhancing accessibility and fostering innovation across sectors such as\u0000education, digital entertainment, and virtual reality.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"100 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SceneMotifCoder: Example-driven Visual Program Learning for Generating 3D Object Arrangements","authors":"Hou In Ivan Tam, Hou In Derek Pun, Austin T. Wang, Angel X. Chang, Manolis Savva","doi":"arxiv-2408.02211","DOIUrl":"https://doi.org/arxiv-2408.02211","url":null,"abstract":"Despite advances in text-to-3D generation methods, generation of multi-object\u0000arrangements remains challenging. Current methods exhibit failures in\u0000generating physically plausible arrangements that respect the provided text\u0000description. We present SceneMotifCoder (SMC), an example-driven framework for\u0000generating 3D object arrangements through visual program learning. SMC\u0000leverages large language models (LLMs) and program synthesis to overcome these\u0000challenges by learning visual programs from example arrangements. These\u0000programs are generalized into compact, editable meta-programs. When combined\u0000with 3D object retrieval and geometry-aware optimization, they can be used to\u0000create object arrangements varying in arrangement structure and contained\u0000objects. Our experiments show that SMC generates high-quality arrangements\u0000using meta-programs learned from few examples. Evaluation results demonstrates\u0000that object arrangements generated by SMC better conform to user-specified text\u0000descriptions and are more physically plausible when compared with\u0000state-of-the-art text-to-3D generation and layout methods.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}