From gaps to granularity: CRPAG-DSHAT based multi-modal deep learning framework for DEM void repair and super-resolution reconstruction in Himalayas

ISPRS Open Journal of Photogrammetry and Remote Sensing, Volume 17, Article 100101 | Pub Date: 2025-08-01 | DOI: 10.1016/j.ophoto.2025.100101

Sayantan Mandal, Ashis Kumar Saha


Citations: 0

Abstract



Digital Elevation Models (DEMs) are essential for terrain characterization and environmental modeling, yet their utility is limited by data voids and coarse resolution, especially in the complex mountainous regions of the Himalayas. To address these challenges, we propose a novel dual-stage deep learning pipeline that unifies void filling and super-resolution in a single framework, leveraging both topographic fidelity and spectral texture. The first stage, the Conditional Residual Pyramid Attentional Generator (CRPAG), is a hybrid model that integrates multi-scale DEM features with Sentinel-2 red band reflectance (∼665 nm) using an Improved Channel Attention Module (ICAM), a Residual Pyramid Attention Block (TFG_RPAB), and a dual-encoder design. This allows CRPAG to prioritize structural fidelity (RMSE 9.1–28.9 m) while reconstructing missing terrain features (mean absolute error, MAE, 1.9–8.1 m). The resulting void-filled, high-resolution DEM then supervises the training of the second stage, the Dual-Stream Hierarchical Attention Transformer (DS-HAT), which performs super-resolution on globally available low-resolution DEMs (ALOS PALSAR), guided by pixel-wise height attention and texture-aware mechanisms. Compared to benchmark models such as MCU-Net-EDF and conventional U-Nets, our integrated system shows improvements in elevation accuracy (RMSE ↓, P95 = 9.2 m), spatial consistency (Moran's I ↑), and structural similarity (SSIM ↑), particularly across high-curvature and spectrally ambiguous regions. In addition, ablation studies confirm the complementary role of topographic variables in mitigating oversmoothing and enhancing terrain realism. This dual-stage strategy not only enhances DEM fidelity but also provides a scalable framework for improving DEM quality. Through this multi-modal fusion, this work transforms topographic knowledge into a computable framework, advancing DEM applicability in hydrological modeling, detection mechanisms and disaster prediction.
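The elevation-accuracy figures quoted in the abstract (RMSE, MAE, P95) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `elevation_errors`, the optional void `mask`, and the reading of P95 as the 95th percentile of absolute error are all assumptions made here for illustration.

```python
import numpy as np


def elevation_errors(reference, predicted, mask=None):
    """Return RMSE, MAE and P95 absolute error, in the DEM's height units.

    `mask` (hypothetical) marks the cells to evaluate, e.g. former void
    cells; by default every cell with finite heights is used.
    """
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if mask is None:
        mask = np.ones(reference.shape, dtype=bool)
    # Skip cells outside the mask and cells without a finite height.
    valid = np.asarray(mask, dtype=bool) & np.isfinite(reference) & np.isfinite(predicted)
    if not valid.any():
        raise ValueError("no valid cells to evaluate")
    err = predicted[valid] - reference[valid]
    abs_err = np.abs(err)
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(abs_err)),
        # Assumption: P95 is the 95th percentile of the absolute error.
        "p95": float(np.percentile(abs_err, 95)),
    }


# Toy 2x2 tiles in metres; the NaN cell has no reference height and is skipped.
toy_reference = np.array([[100.0, 102.0], [104.0, np.nan]])
toy_predicted = np.array([[101.0, 100.0], [104.0, 110.0]])
metrics = elevation_errors(toy_reference, toy_predicted)
print(metrics)
```

Passing a boolean void mask restricts the statistics to reconstructed cells, which is one plausible way to report void-filling accuracy separately from the accuracy of the whole tile.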


Source journal

ISPRS Open Journal of Photogrammetry and Remote Sensing

CiteScore

5.10

Self-citation rate

0.00%

Articles published