GECO: Fast Generative Image-to-3D within one SECOnd

Chen Wang, Jiatao Gu, Xiaoxiao Long, Yuan Liu, Lingjie Liu

IEEE Transactions on Visualization and Computer Graphics, published 2025-08-25
DOI: 10.1109/TVCG.2025.3602405
Citations: 0
Abstract
Recent advances in single-image 3D generation fall into two main categories: reconstruction-based and generative methods. Reconstruction-based methods are efficient but do not model uncertainty, leading to blurry artifacts in unseen regions. Generative approaches based on score distillation [47], [71] are slow due to scene-specific optimization. Other methods, such as InstantMesh [76], use a two-stage process, first generating multi-view images with a diffusion model and then reconstructing 3D, which is inefficient because the diffusion model requires multiple denoising steps. To overcome these limitations, we introduce GECO, a feed-forward method for fast, high-quality single-image-to-3D generation that runs within one second on a single GPU. Our approach resolves both the uncertainty and efficiency issues through a two-stage distillation process. In the first stage, we distill a multi-step diffusion model [56] into a one-step model for single-image-to-multi-view synthesis using score distillation. To mitigate the quality degradation introduced by the one-step model, we add a second distillation stage that learns to predict high-quality 3D from imperfect generated multi-view images by performing distillation directly on 3D representations. Experiments demonstrate that GECO achieves significant speedups with reconstruction quality comparable to prior two-stage methods. Code: https://cwchenwang.github.io/geco.
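The first distillation stage described above uses a score-distillation-style objective to train a one-step student against a multi-step diffusion teacher. The following is a minimal toy sketch of that idea, not the GECO implementation: the teacher, noise schedule, weighting, and linear "student" are all illustrative stand-ins chosen so the example is self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_score(x_t, t):
    # Stand-in for a pretrained diffusion teacher's noise prediction
    # eps_phi(x_t, t). Here a toy score that pulls samples toward zero.
    return x_t / (1.0 + t)

def sds_gradient(x0, t, noise):
    # Forward-diffuse the student's one-step output x0 to timestep t,
    # then take the classic score-distillation gradient w(t)*(eps_phi - eps).
    alpha = np.exp(-t)                      # toy noise schedule
    x_t = np.sqrt(alpha) * x0 + np.sqrt(1.0 - alpha) * noise
    w = 1.0                                 # uniform weighting for the sketch
    return w * (teacher_score(x_t, t) - noise)

# One-step "student": a linear map from a latent code, standing in for
# the distilled one-step multi-view generator.
theta = rng.normal(size=(4, 4))
z = rng.normal(size=4)
initial_norm = np.linalg.norm(theta @ z)

for step in range(200):
    x0 = theta @ z                          # one-step generation
    t = rng.uniform(0.1, 1.0)               # random diffusion timestep
    noise = rng.normal(size=4)
    grad_x0 = sds_gradient(x0, t, noise)
    # Chain rule: the gradient w.r.t. theta is outer(dL/dx0, z).
    theta -= 0.05 * np.outer(grad_x0, z)

final_norm = np.linalg.norm(theta @ z)
```

With this toy teacher, the student's output is driven toward the teacher's mode (zero), mirroring how score distillation transfers the teacher's learned distribution to a one-step generator without iterative denoising at inference time.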