{"title":"From gaps to granularity: CRPAG-DSHAT based multi-modal deep learning framework for DEM void repair and super-resolution reconstruction in Himalayas","authors":"Sayantan Mandal, Ashis Kumar Saha","doi":"10.1016/j.ophoto.2025.100101","journal":{"name":"ISPRS Open Journal of Photogrammetry and Remote Sensing","volume":"17","pages":"Article 100101"},"publicationDate":"2025-08-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"0","platform":"Semanticscholar","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667393225000201"}
From gaps to granularity: CRPAG-DSHAT based multi-modal deep learning framework for DEM void repair and super-resolution reconstruction in Himalayas
Digital Elevation Models (DEMs) are essential for terrain characterization and environmental modeling, yet their utility is limited by data voids and coarse resolution, especially in the complex mountainous terrain of the Himalayas. To address these challenges, we propose a novel dual-stage deep learning pipeline that unifies void filling and super-resolution in a single framework, leveraging both topographic fidelity and spectral texture. First, the Conditional Residual Pyramid Attentional Generator (CRPAG), a hybrid model, integrates multi-scale DEM features with Sentinel-2 red-band reflectance (∼665 nm) using an Improved Channel Attention Module (ICAM), a Residual Pyramid Attention Block (TFG_RPAB), and a dual-encoder design. This allows CRPAG to prioritize structural fidelity (RMSE 9.1–28.9 m) while reconstructing missing terrain features (mean absolute error, MAE, 1.9–8.1 m). The resulting void-filled, high-resolution DEM then supervises the training of the Dual-Stream Hierarchical Attention Transformer (DS-HAT), which performs super-resolution on globally available low-resolution DEMs (ALOS PALSAR), guided by pixel-wise height attention and texture-aware mechanisms. Compared with benchmark models such as MCU-Net-EDF and conventional U-Nets, the integrated system improves elevation accuracy (RMSE ↓, P95 = 9.2 m), spatial consistency (Moran's I ↑), and structural similarity (SSIM ↑), particularly across high-curvature and spectrally ambiguous regions. In addition, ablation studies confirm the complementary role of topographic variables in mitigating oversmoothing and enhancing terrain realism. This dual-stage strategy not only enhances DEM fidelity but also provides a scalable framework for improving DEM quality. Through this multi-modal fusion, the work transforms topographic knowledge into a computable framework, advancing DEM applicability in hydrological modeling, detection mechanisms, and disaster prediction.
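The abstract reports RMSE, MAE, a P95 value, and Moran's I without stating how they are computed. The following is a minimal sketch of one common way to evaluate them, not the authors' implementation. It assumes that P95 denotes the 95th percentile of absolute elevation error and that Moran's I is evaluated with rook (4-neighbour) adjacency on a regular grid. The function names, the `void_mask` argument, and the use of plain nested lists in place of raster arrays are all hypothetical choices made for illustration.

```python
import math


def error_metrics(reference, predicted, void_mask=None):
    """Return RMSE, MAE and 95th-percentile absolute error between two DEM grids.

    reference, predicted: equally sized lists of rows of elevations (metres).
    void_mask (assumed, optional): same shape; truthy cells are the ones
    evaluated, e.g. the cells that were originally voids.
    """
    errors = []
    for r, row in enumerate(reference):
        for c, ref_value in enumerate(row):
            if void_mask is None or void_mask[r][c]:
                errors.append(abs(predicted[r][c] - ref_value))
    if not errors:
        raise ValueError("no cells selected for evaluation")

    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(errors) / n

    # 95th percentile with linear interpolation between order statistics
    # (assumption: this is what "P95" refers to).
    ordered = sorted(errors)
    pos = 0.95 * (n - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, n - 1)
    p95 = ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return {"rmse": rmse, "mae": mae, "p95": p95}


def morans_i(grid):
    """Moran's I of a regular grid using binary rook (4-neighbour) weights."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    z = [[value - mean for value in row] for row in grid]

    cross = 0.0  # sum of z_i * z_j over undirected neighbour pairs
    pairs = 0    # number of undirected neighbour pairs
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right-hand neighbour
                cross += z[r][c] * z[r][c + 1]
                pairs += 1
            if r + 1 < rows:  # neighbour below
                cross += z[r][c] * z[r + 1][c]
                pairs += 1

    squared_dev = sum(value * value for row in z for value in row)
    if pairs == 0 or squared_dev == 0:
        raise ValueError("Moran's I is undefined for this grid")

    # Symmetric weights count each pair twice in both the cross-product sum
    # and the total weight, so the factor of two cancels.
    return (n / pairs) * (cross / squared_dev)
```

Restricting `error_metrics` to the masked cells separates the accuracy of the filled voids from that of terrain that was never missing. `morans_i` can be applied either to the predicted DEM itself or to the error surface, depending on which kind of spatial consistency is being assessed.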