对大规模航空数据集进行多层次神经辐射场（NeRF）几何评估

The Photogrammetric Record Pub Date : 2024-05-01 DOI:10.1111/phor.12498

Ningli Xu, Rongjun Qin, Debao Huang, Fabio Remondino

{"title":"对大规模航空数据集进行多层次神经辐射场（NeRF）几何评估","authors":"Ningli Xu, Rongjun Qin, Debao Huang, Fabio Remondino","doi":"10.1111/phor.12498","DOIUrl":null,"url":null,"abstract":"Neural radiance fields (NeRF) offer the potential to benefit 3D reconstruction tasks, including aerial photogrammetry. However, the scalability and accuracy of the inferred geometry are not well‐documented for large‐scale aerial assets. We aim to provide a thorough assessment of NeRF in 3D reconstruction from aerial images and compare it with three traditional multi‐view stereo (MVS) pipelines. However, typical NeRF approaches are not designed for large‐format aerial images, which result in very high memory consumption (often cost‐prohibitive) and slow convergence when directly applied to aerial assets. Despite a few NeRF variants adopting a representation tiling scheme to increase scalability, the random ray‐sampling strategy during training still hinders its general applicability for aerial assets. To perform an effective evaluation, we propose a new scheme to scale NeRF. In addition to representation tiling, we introduce a location‐specific sampling technique as well as a multi‐camera tiling (MCT) strategy to reduce memory consumption during image loading for RAM, representation training for GPU memory and increase the convergence rate within tiles. The MCT method decomposes a large‐frame image into multiple tiled images with different camera models, allowing these small‐frame images to be fed into the training process as needed for specific locations without a loss of accuracy. This enables NeRF approaches to be applied to aerial datasets on affordable computing devices, such as regular workstations. The proposed adaptation can be implemented to adapt for scaling any existing NeRF methods. Therefore, in this paper, instead of comparing accuracy performance against different NeRF variants, we implement our method based on a representative approach, Mip‐NeRF, and compare it against three traditional photogrammetric MVS pipelines on a typical aerial dataset against lidar reference data to assess NeRF's performance. Both qualitative and quantitative results suggest that the proposed NeRF approach produces better completeness and object details than traditional approaches, although as of now, it still falls short in terms of accuracy. The codes and datasets are made publicly available at https://github.com/GDAOSU/MCT_NERF.","PeriodicalId":22881,"journal":{"name":"The Photogrammetric Record","volume":"51 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi‐tiling neural radiance field (NeRF)—geometric assessment on large‐scale aerial datasets\",\"authors\":\"Ningli Xu, Rongjun Qin, Debao Huang, Fabio Remondino\",\"doi\":\"10.1111/phor.12498\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Neural radiance fields (NeRF) offer the potential to benefit 3D reconstruction tasks, including aerial photogrammetry. However, the scalability and accuracy of the inferred geometry are not well‐documented for large‐scale aerial assets. We aim to provide a thorough assessment of NeRF in 3D reconstruction from aerial images and compare it with three traditional multi‐view stereo (MVS) pipelines. However, typical NeRF approaches are not designed for large‐format aerial images, which result in very high memory consumption (often cost‐prohibitive) and slow convergence when directly applied to aerial assets. Despite a few NeRF variants adopting a representation tiling scheme to increase scalability, the random ray‐sampling strategy during training still hinders its general applicability for aerial assets. To perform an effective evaluation, we propose a new scheme to scale NeRF. In addition to representation tiling, we introduce a location‐specific sampling technique as well as a multi‐camera tiling (MCT) strategy to reduce memory consumption during image loading for RAM, representation training for GPU memory and increase the convergence rate within tiles. The MCT method decomposes a large‐frame image into multiple tiled images with different camera models, allowing these small‐frame images to be fed into the training process as needed for specific locations without a loss of accuracy. This enables NeRF approaches to be applied to aerial datasets on affordable computing devices, such as regular workstations. The proposed adaptation can be implemented to adapt for scaling any existing NeRF methods. Therefore, in this paper, instead of comparing accuracy performance against different NeRF variants, we implement our method based on a representative approach, Mip‐NeRF, and compare it against three traditional photogrammetric MVS pipelines on a typical aerial dataset against lidar reference data to assess NeRF's performance. Both qualitative and quantitative results suggest that the proposed NeRF approach produces better completeness and object details than traditional approaches, although as of now, it still falls short in terms of accuracy. The codes and datasets are made publicly available at https://github.com/GDAOSU/MCT_NERF.\",\"PeriodicalId\":22881,\"journal\":{\"name\":\"The Photogrammetric Record\",\"volume\":\"51 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The Photogrammetric Record\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1111/phor.12498\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Photogrammetric Record","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1111/phor.12498","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

神经辐射场（NeRF）为三维重建任务（包括航空摄影测量）提供了潜在的益处。然而，对于大规模航空资产而言，推断几何图形的可扩展性和准确性并没有得到充分的证明。我们旨在全面评估 NeRF 在航空图像三维重建中的应用，并将其与三种传统的多视角立体（MVS）管道进行比较。然而，典型的 NeRF 方法并非专为大尺寸航空图像而设计，因此直接应用于航空资产时内存消耗非常高（通常成本高昂），收敛速度也很慢。尽管有一些 NeRF 变体采用了平铺表示方案来提高可扩展性，但训练过程中的随机光线采样策略仍然阻碍了其在航空资产中的普遍应用。为了进行有效评估，我们提出了一种新的 NeRF 扩展方案。除了平铺表示法，我们还引入了特定位置采样技术和多摄像机平铺（MCT）策略，以减少 RAM 图像加载和 GPU 表示法训练时的内存消耗，并提高平铺内的收敛速度。MCT 方法可将大帧图像分解为多个具有不同相机模型的平铺图像，使这些小帧图像能够根据特定位置的需要输入到训练过程中，而不会降低精度。这样，NeRF 方法就能在经济实惠的计算设备（如普通工作站）上应用于航空数据集。建议的适应性可用于调整现有 NeRF 方法的规模。因此，在本文中，我们没有比较不同 NeRF 变体的精度性能，而是基于一种代表性方法 Mip-NeRF 实施了我们的方法，并在一个典型的航空数据集和激光雷达参考数据上将其与三种传统摄影测量 MVS 管道进行比较，以评估 NeRF 的性能。定性和定量结果都表明，与传统方法相比，所提出的 NeRF 方法能产生更好的完整性和物体细节，但目前在精度方面仍有不足。代码和数据集可在 https://github.com/GDAOSU/MCT_NERF 网站上公开获取。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Multi‐tiling neural radiance field (NeRF)—geometric assessment on large‐scale aerial datasets

Neural radiance fields (NeRF) offer the potential to benefit 3D reconstruction tasks, including aerial photogrammetry. However, the scalability and accuracy of the inferred geometry are not well‐documented for large‐scale aerial assets. We aim to provide a thorough assessment of NeRF in 3D reconstruction from aerial images and compare it with three traditional multi‐view stereo (MVS) pipelines. However, typical NeRF approaches are not designed for large‐format aerial images, which result in very high memory consumption (often cost‐prohibitive) and slow convergence when directly applied to aerial assets. Despite a few NeRF variants adopting a representation tiling scheme to increase scalability, the random ray‐sampling strategy during training still hinders its general applicability for aerial assets. To perform an effective evaluation, we propose a new scheme to scale NeRF. In addition to representation tiling, we introduce a location‐specific sampling technique as well as a multi‐camera tiling (MCT) strategy to reduce memory consumption during image loading for RAM, representation training for GPU memory and increase the convergence rate within tiles. The MCT method decomposes a large‐frame image into multiple tiled images with different camera models, allowing these small‐frame images to be fed into the training process as needed for specific locations without a loss of accuracy. This enables NeRF approaches to be applied to aerial datasets on affordable computing devices, such as regular workstations. The proposed adaptation can be implemented to adapt for scaling any existing NeRF methods. Therefore, in this paper, instead of comparing accuracy performance against different NeRF variants, we implement our method based on a representative approach, Mip‐NeRF, and compare it against three traditional photogrammetric MVS pipelines on a typical aerial dataset against lidar reference data to assess NeRF's performance. Both qualitative and quantitative results suggest that the proposed NeRF approach produces better completeness and object details than traditional approaches, although as of now, it still falls short in terms of accuracy. The codes and datasets are made publicly available at https://github.com/GDAOSU/MCT_NERF.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

The Photogrammetric Record

自引率

0.00%

发文量