{"title":"HomuGAN: A 3D-Aware GAN With the Method of Cylindrical Spatial-Constrained Sampling","authors":"Haochen Yu;Weixi Gong;Jiansheng Chen;Huimin Ma","doi":"10.1109/TIP.2024.3520423","DOIUrl":null,"url":null,"abstract":"Controllable 3D-aware scene synthesis seeks to disentangle the various latent codes in the implicit space enabling the generation network to create highly realistic images with 3D consistency. Recent approaches often integrate Neural Radiance Fields with the upsampling method of StyleGAN2, employing Convolutions with style modulation to transform spatial coordinates into frequency domain representations. Our analysis indicates that this approach can give rise to a bubble phenomenon in StyleNeRF. We argue that the style modulation introduces extraneous information into the implicit space, disrupting 3D implicit modeling and degrading image quality. We introduce HomuGAN, incorporating two key improvements. First, we disentangle the style modulation applied to implicit modeling from that utilized for super-resolution, thus alleviating the bubble phenomenon. Second, we introduce Cylindrical Spatial-Constrained Sampling and Parabolic Sampling. The latter sampling method, as an alternative method to the former, specifically contributes to the performance of foreground modeling of vehicles. We evaluate HomuGAN on publicly available datasets, comparing its performance to existing methods. Empirical results demonstrate that our model achieves the best performance, exhibiting relatively outstanding disentanglement capability. 
Moreover, HomuGAN addresses the training instability problem observed in StyleNeRF and reduces the bubble phenomenon.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"320-334"},"PeriodicalIF":0.0000,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10816358/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Controllable 3D-aware scene synthesis seeks to disentangle the various latent codes in the implicit space, enabling the generation network to create highly realistic images with 3D consistency. Recent approaches often integrate Neural Radiance Fields with the upsampling method of StyleGAN2, employing convolutions with style modulation to transform spatial coordinates into frequency-domain representations. Our analysis indicates that this approach can give rise to a bubble phenomenon in StyleNeRF. We argue that the style modulation introduces extraneous information into the implicit space, disrupting 3D implicit modeling and degrading image quality. We introduce HomuGAN, which incorporates two key improvements. First, we disentangle the style modulation applied to implicit modeling from that used for super-resolution, alleviating the bubble phenomenon. Second, we introduce Cylindrical Spatial-Constrained Sampling and Parabolic Sampling; the latter, as an alternative to the former, specifically improves foreground modeling of vehicles. We evaluate HomuGAN on publicly available datasets, comparing its performance to existing methods. Empirical results demonstrate that our model achieves the best performance, with particularly strong disentanglement capability. Moreover, HomuGAN addresses the training instability observed in StyleNeRF and reduces the bubble phenomenon.