Multi‐tiling neural radiance field (NeRF)—geometric assessment on large‐scale aerial datasets
Authors: Ningli Xu, Rongjun Qin, Debao Huang, Fabio Remondino
Journal: The Photogrammetric Record
Published: 2024-05-01
DOI: 10.1111/phor.12498 (https://doi.org/10.1111/phor.12498)
Abstract
Neural radiance fields (NeRF) have the potential to benefit 3D reconstruction tasks, including aerial photogrammetry. However, the scalability and accuracy of the inferred geometry are not well documented for large-scale aerial assets. We aim to provide a thorough assessment of NeRF in 3D reconstruction from aerial images and compare it with three traditional multi-view stereo (MVS) pipelines. Typical NeRF approaches are not designed for large-format aerial images, which results in very high memory consumption (often cost-prohibitive) and slow convergence when they are applied directly to aerial assets. Although a few NeRF variants adopt a representation tiling scheme to increase scalability, the random ray-sampling strategy used during training still hinders their general applicability to aerial assets. To perform an effective evaluation, we propose a new scheme to scale NeRF. In addition to representation tiling, we introduce a location-specific sampling technique and a multi-camera tiling (MCT) strategy, which together reduce RAM consumption during image loading, reduce GPU memory consumption during representation training, and increase the convergence rate within tiles. The MCT method decomposes a large-frame image into multiple tiled images with different camera models, allowing these small-frame images to be fed into the training process as needed for specific locations without a loss of accuracy. This enables NeRF approaches to be applied to aerial datasets on affordable computing devices, such as regular workstations. The proposed adaptation can be used to scale any existing NeRF method. Therefore, in this paper, instead of comparing accuracy across different NeRF variants, we implement our method on top of a representative approach, Mip-NeRF, and compare it with three traditional photogrammetric MVS pipelines on a typical aerial dataset, using lidar reference data, to assess NeRF's performance. Both qualitative and quantitative results suggest that the proposed NeRF approach produces better completeness and object details than traditional approaches, although, as of now, it still falls short in terms of accuracy. The code and datasets are publicly available at https://github.com/GDAOSU/MCT_NERF.
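The abstract states that MCT splits one large-frame image into tiled images "with different camera models". One way to picture this, as a minimal sketch and not the authors' implementation, is to assume a distortion-free pinhole camera: cropping a tile whose top-left corner is at pixel (x0, y0) keeps the focal length and shifts the principal point by (-x0, -y0), so each tile projects 3D points exactly as the full frame does. The function names `tile_camera` and `project`, the intrinsics, and the tile size below are all hypothetical values chosen for illustration.

```python
import numpy as np

def tile_camera(K, width, height, tile_size):
    """Split a pinhole camera (3x3 intrinsics K) into per-tile cameras.

    Assumption: no lens distortion. Each tile keeps the focal length and
    gets a principal point shifted by the tile's top-left offset.
    Returns a list of (x0, y0, tile_width, tile_height, K_tile).
    """
    tiles = []
    for y0 in range(0, height, tile_size):
        for x0 in range(0, width, tile_size):
            w = min(tile_size, width - x0)   # border tiles may be smaller
            h = min(tile_size, height - y0)
            K_tile = K.astype(float)         # astype returns a copy
            K_tile[0, 2] -= x0               # shift principal point cx
            K_tile[1, 2] -= y0               # shift principal point cy
            tiles.append((x0, y0, w, h, K_tile))
    return tiles

def project(K, point_cam):
    """Project a 3D point given in camera coordinates to pixel coordinates."""
    uvw = K @ point_cam
    return uvw[:2] / uvw[2]

# Hypothetical 10000 x 7000 aerial frame with made-up intrinsics.
K = np.array([[4000.0, 0.0, 5000.0],
              [0.0, 4000.0, 3500.0],
              [0.0, 0.0, 1.0]])
tiles = tile_camera(K, width=10000, height=7000, tile_size=2048)

# A point in camera coordinates lands on the same full-frame pixel whether
# it is projected with the full camera or with any tile camera plus offset.
point = np.array([0.3, -0.2, 10.0])
full_px = project(K, point)
for x0, y0, w, h, K_tile in tiles:
    assert np.allclose(project(K_tile, point) + np.array([x0, y0]), full_px)
```

Because every tile carries its own intrinsics, a training loop could in principle load only the tiles that observe a given location instead of the whole frame, which is the memory saving the abstract describes. Real aerial cameras with lens distortion would need the distortion model handled as well, which this sketch does not attempt.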