Local Evaluation of Large-scale Remote Sensing Machine Learning-generated Building and Road Dataset: The Case of Rwanda

IF 3.3 · CAS Zone 4 (Earth Science) · JCR Q3 (Imaging Science & Photographic Technology)

PFG-Journal of Photogrammetry Remote Sensing and Geoinformation Science · Pub Date: 2024-07-24 · DOI: 10.1007/s41064-024-00297-9

Emmanuel Nyandwi, Markus Gerke, Pedro Achanccaray

Citations: 0

Abstract

Accurate and up-to-date building and road data are crucial for informed spatial planning. In developing regions in particular, these data are scarce, primarily because traditional field-based surveys and manual data generation methods are inherently inefficient. This limitation has prompted the exploration of alternative solutions, including remote sensing machine learning-generated (RSML) datasets. Many models have been proposed for producing RSML datasets. However, these methods, evaluated in a research setting, may not translate well to massive real-world applications because of potential inaccuracies in unseen geographic areas. Scepticism about the usefulness of datasets generated by global models, owing to unguaranteed local accuracy, is particularly concerning. Rigorous evaluations of these datasets in local scenarios are therefore essential for gaining insights into their usability. To address this concern, this study investigates the local accuracy of large RSML datasets. For this evaluation, we employed a dataset generated using models pre-trained on a variety of samples drawn from across the world and accessible from public repositories of open benchmark datasets. These models were then fine-tuned with a limited set of local samples specific to Rwanda. The evaluation also included Microsoft's and Google's global datasets. Using ResNet and Mask R-CNN, we explored the performance variations of different building detection approaches: bottom-up, end-to-end, and their combination. For road extraction, we explored the approach of training multiple models on subsets representing different road types. Our testing dataset was carefully designed to be diverse, incorporating both easy and challenging scenes. It includes areas purposefully chosen for their high level of clutter, which makes structures such as buildings difficult to detect. Including complex scenarios alongside simpler ones allows us to thoroughly assess how robustly deep learning (DL)-based detection models handle diverse real-world conditions. Buildings were evaluated using a polygon-wise comparison, while roads were assessed using network length-derived metrics.
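
As a rough illustration of what a polygon-wise comparison can look like, the sketch below matches each predicted footprint to at most one reference footprint and derives precision and recall from the match count. The abstract does not state the matching rule, so everything here is an assumption: the greedy one-to-one matching, the IoU threshold of 0.5, the use of axis-aligned rectangles as stand-ins for building polygons, and the names `iou` and `polygon_wise_pr`.

```python
from typing import List, Tuple

# Hypothetical stand-in for a building footprint: (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def polygon_wise_pr(
    predicted: List[Box], reference: List[Box], iou_threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedy one-to-one matching; returns (precision, recall).

    The 0.5 threshold is an assumed default, not a value from the paper.
    """
    matched_refs = set()
    true_positives = 0
    for pred in predicted:
        best_index, best_iou = -1, 0.0
        for j, ref in enumerate(reference):
            if j in matched_refs:
                continue  # each reference footprint may be matched only once
            value = iou(pred, ref)
            if value > best_iou:
                best_index, best_iou = j, value
        if best_index >= 0 and best_iou >= iou_threshold:
            matched_refs.add(best_index)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Made-up footprints for illustration only.
predicted = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
reference = [(1, 1, 11, 11), (20, 20, 30, 30), (80, 80, 90, 90), (100, 100, 110, 110)]
precision, recall = polygon_wise_pr(predicted, reference)
```

With real footprints, the rectangle `iou` would be replaced by a polygon intersection from a geometry library; the matching and counting logic would stay the same.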

Our results showed a precision (P) of around 75% and a recall (R) of around 60% for the locally fine-tuned building model. This performance, which the literature considers the minimum needed for practical utility of RSML datasets, was achieved in three out of six testing sites. In contrast, the Google and Microsoft datasets reached comparable results in only one out of six sites. Our locally fine-tuned road model achieved moderate success, meeting the minimum usability threshold in four out of six sites, whereas the Microsoft dataset performed well on all sites. In summary, our findings suggest better performance in road extraction than in building extraction. Moreover, we observed that a pipeline combining bottom-up and top-down segmentation, while leveraging an open global benchmark annotation dataset and a small number of samples for fine-tuning, can offer more accurate RSML datasets than an open global dataset. Our findings also suggest that relying solely on aggregated accuracy metrics can be misleading. According to our evaluation, even city-level measures may not capture significant variations in performance within a city, such as lower accuracy in specific neighbourhoods. Overcoming the challenges of complex areas might benefit from exploring alternative approaches, including the integration of LiDAR data, UAV images, or aerial images, or the use of other network architectures.
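
The point about aggregated metrics can be made concrete with a small sketch. It checks each site against the usability limits quoted above (P around 75%, R around 60%) and compares that with the pooled result. The site names and the true-positive, false-positive, and false-negative counts are invented for illustration and are not results from the study; the helper names are likewise hypothetical.

```python
from typing import Dict, Tuple

# Usability limits taken from the abstract (P ~ 75%, R ~ 60%).
P_MIN, R_MIN = 0.75, 0.60

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def meets_threshold(tp: int, fp: int, fn: int) -> bool:
    """True when both precision and recall reach the usability limits."""
    precision, recall = precision_recall(tp, fp, fn)
    return precision >= P_MIN and recall >= R_MIN


# Invented counts: one large easy site and two small cluttered sites.
sites: Dict[str, Counts] = {
    "site_A": (900, 100, 200),
    "site_B": (60, 40, 90),
    "site_C": (50, 50, 100),
}

# Pool the counts across sites, as an aggregated metric would.
pooled: Counts = (
    sum(c[0] for c in sites.values()),
    sum(c[1] for c in sites.values()),
    sum(c[2] for c in sites.values()),
)

per_site = {name: meets_threshold(*counts) for name, counts in sites.items()}
pooled_ok = meets_threshold(*pooled)
```

In this made-up case the pooled counts clear both limits because the large easy site dominates, while two of the three sites fall short on their own, which is the kind of within-area variation an aggregate hides.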

Source journal: PFG-Journal of Photogrammetry Remote Sensing and Geoinformation Science (Physics and Astronomy-Instrumentation)

CiteScore: 8.20

Self-citation rate: 2.40%

About the journal: PFG is an international scholarly journal covering the progress and application of photogrammetric methods, remote sensing technology and the interconnected field of geoinformation science. It places special editorial emphasis on communicating new methodologies in data acquisition and new approaches to the optimized processing and interpretation of all types of data acquired by photogrammetric methods, remote sensing and image processing, and on the computer-aided interpretation of such data in general. The journal therefore addresses researchers and students of these disciplines at academic institutions and universities as well as downstream users in both the private sector and public administration. Founded in 1926 under the former name Bildmessung und Luftbildwesen, PFG is the world's oldest journal on photogrammetry. It is the official journal of the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF).