Fan Yang, Kunchi Li, Nanfeng Jiang, Yun Wu, Ziyu Li, Da-Han Wang
Image and Vision Computing, Volume 162, Article 105705 (published 2025-08-27). DOI: 10.1016/j.imavis.2025.105705.
LDH-Net: Luminance-based Deep Hybrid Network for Document Image De-shadowing
Existing deep learning-based Document Image De-shadowing (DID) methods face two key challenges. First, they struggle with complex shadows because they make insufficient use of auxiliary information such as shadow locations and illumination details. Second, they fail to effectively balance global relationships across the entire image with local feature learning when restoring texture details in shadowed regions. To address these limitations, we propose a dual-branch de-shadowing network, called LDH-Net, which integrates luminance information as an auxiliary cue for de-shadowing. The first branch extracts shadow-distorted features by estimating a shadow luminance map, while the second branch uses these features to locate shadow regions and guide the de-shadowing. Both branches employ a hybrid feature learning mechanism that captures local and global information efficiently at lower complexity. This mechanism includes two key modules: Horizon-Vertical Attention (HVA) and Dilated Convolution Mamba (DCM). HVA models long-range pixel dependencies, propagating contextual information across the entire image to ensure global coherence and consistency. DCM applies dilated convolution within the State Space Model (SSM) to capture extensive contextual information while preserving local image details. Additionally, we introduce a luminance map loss that provides accurate optimization targets for reconstruction. Experiments on the RDD, Kligler’s, Jung’s, and OSR datasets demonstrate that LDH-Net outperforms previous state-of-the-art methods. Specifically, LDH-Net achieves the best PSNR/SSIM/LPIPS scores across all datasets, with up to 37.76 PSNR / 0.981 SSIM / 0.005 LPIPS on the RDD dataset and consistent improvements on the other benchmarks, confirming its superior performance in both visual quality and structural preservation.
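The abstract does not give implementation details, but the role of a luminance map as an auxiliary shadow cue, and the PSNR metric used in the experiments, can be sketched with standard formulas. This is a minimal illustrative sketch, not the paper's method: the ITU-R BT.601 luma weights, the 0.75 threshold, and the simulated multiplicative shadow are all assumptions for demonstration.

```python
import numpy as np

def luminance_map(rgb):
    """Luminance from an RGB image in [0, 1] using ITU-R BT.601 weights
    (a common convention; the paper's exact formulation is not specified)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB, the quality metric reported above."""
    mse = np.mean((ref - est) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Toy example: a flat white "page" with a darker horizontal shadow band.
clean = np.ones((8, 8, 3))
shadowed = clean.copy()
shadowed[2:5, :, :] *= 0.5           # simulate a multiplicative shadow

lum = luminance_map(shadowed)        # auxiliary cue: low luminance marks shadow
mask = lum < 0.75                    # crude shadow localization from luminance
restored = shadowed.copy()
restored[mask] /= 0.5                # invert the simulated attenuation
```

On this toy image the luminance map alone localizes the shadow band exactly, and inverting the attenuation recovers the clean page (infinite PSNR); real document shadows are soft-edged and spatially varying, which is why the paper learns the luminance map and restoration jointly rather than thresholding.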
Journal overview:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.