LDH-Net: Luminance-based Deep Hybrid Network for Document Image De-shadowing

IF 4.2 · CAS Tier 3, Computer Science · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Fan Yang, Kunchi Li, Nanfeng Jiang, Yun Wu, Ziyu Li, Da-Han Wang
{"title":"LDH-Net: Luminance-based Deep Hybrid Network for Document Image De-shadowing","authors":"Fan Yang ,&nbsp;Kunchi Li ,&nbsp;Nanfeng Jiang,&nbsp;Yun Wu,&nbsp;Ziyu Li,&nbsp;Da-Han Wang","doi":"10.1016/j.imavis.2025.105705","DOIUrl":null,"url":null,"abstract":"<div><div>Existing deep learning-based Document Image De-shadowing (DID) methods face two key challenges. First, they struggle with complex shadows due to insufficient use of auxiliary information, such as shadow locations and illumination details. Second, they fail to effectively balance global relationships across the entire image with local feature learning to restore texture details in shadowed regions. To address these limitations, we propose a dual-branch de-shadowing network, called LDH-Net, which integrates luminance information as an auxiliary information for de-shadowing. The first branch extracts shadow-distorted features by estimating a shadow luminance map, while the second branch uses them to locate shadow regions and guide the de-shadowing. Both branches employ a hybrid feature learning mechanism to capture local and global information efficiently with lower complexity. This mechanism includes two key modules: Horizon-Vertical Attention (HVA) and Dilated Convolution Mamba (DCM). HVA models long-range pixel dependencies to propagate contextual information across the entire image to ensure global coherence and consistency. DCM utilizes dilated convolution within the State Space Model (SSM) to capture extensive contextual information and preserve local image details. Additionally, we introduce a luminance map loss to provide accurate optimization for reconstruction. Experiments on RDD, Kligler’s, Jung’s, and OSR demonstrate that LDH-Net outperforms previous state-of-the-art methods. Specifically, LDH-Net achieves the best PSNR/SSIM/LPIPS scores across all datasets, with up to 37.76 PSNR/0.981 SSIM/0.005 LPIPS on RDD datasets and consistent improvements on other benchmarks, confirming its superior performance on both visual quality and structural preservation.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105705"},"PeriodicalIF":4.2000,"publicationDate":"2025-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625002938","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Existing deep learning-based Document Image De-shadowing (DID) methods face two key challenges. First, they struggle with complex shadows due to insufficient use of auxiliary information, such as shadow locations and illumination details. Second, they fail to effectively balance global relationships across the entire image with local feature learning when restoring texture details in shadowed regions. To address these limitations, we propose a dual-branch de-shadowing network, called LDH-Net, which integrates luminance information as an auxiliary cue for de-shadowing. The first branch extracts shadow-distorted features by estimating a shadow luminance map, while the second branch uses these features to locate shadow regions and guide de-shadowing. Both branches employ a hybrid feature learning mechanism to capture local and global information efficiently at reduced complexity. This mechanism includes two key modules: Horizon-Vertical Attention (HVA) and Dilated Convolution Mamba (DCM). HVA models long-range pixel dependencies to propagate contextual information across the entire image, ensuring global coherence and consistency. DCM utilizes dilated convolution within the State Space Model (SSM) to capture extensive contextual information and preserve local image details. Additionally, we introduce a luminance map loss to provide accurate optimization for reconstruction. Experiments on the RDD, Kligler's, Jung's, and OSR datasets demonstrate that LDH-Net outperforms previous state-of-the-art methods. Specifically, LDH-Net achieves the best PSNR/SSIM/LPIPS scores across all datasets, reaching 37.76 dB PSNR / 0.981 SSIM / 0.005 LPIPS on the RDD dataset with consistent improvements on the other benchmarks, confirming its superior performance in both visual quality and structural preservation.
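The abstract does not include an implementation, so the sketch below is a minimal, hypothetical PyTorch rendering of the ideas it describes, under several stated assumptions: HVA is realized as sequential row-then-column (axial) self-attention, a common way to obtain image-wide context at sub-quadratic cost; DCM is reduced to a dilated-convolution residual block, omitting the SSM/Mamba internals that the abstract does not specify; and the luminance map loss is taken as a simple L1 term against a BT.601 luma target. Every module name, shape, and hyper-parameter here is illustrative, not the authors' code.

```python
# Minimal PyTorch sketch of the LDH-Net ideas described in the abstract.
# All module names, shapes, and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn


class HorizonVerticalAttention(nn.Module):
    """Axial self-attention: rows first, then columns (one plausible reading
    of 'horizon-vertical' attention; cost is sub-quadratic in H*W)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, h, w = x.shape
        # Attend along each row: B*H sequences of length W.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows = rows + self.row_attn(rows, rows, rows)[0]
        x = rows.reshape(b, h, w, c)
        # Attend along each column: B*W sequences of length H.
        cols = x.permute(0, 2, 1, 3).reshape(b * w, h, c)
        cols = cols + self.col_attn(cols, cols, cols)[0]
        return cols.reshape(b, w, h, c).permute(0, 3, 2, 1)


class DilatedContextBlock(nn.Module):
    """Stand-in for DCM: dilated convolutions enlarge the receptive field;
    the SSM component is omitted because its form is not given."""

    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=2, dilation=2),
            nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=4, dilation=4),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)


class LDHNetSketch(nn.Module):
    """Dual-branch layout: branch 1 estimates a shadow luminance map,
    branch 2 consumes image + luminance to predict the clean document."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.lum_branch = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1),
            DilatedContextBlock(dim),
            nn.Conv2d(dim, 1, 3, padding=1),
        )
        self.embed = nn.Conv2d(4, dim, 3, padding=1)  # RGB + luminance map
        self.hva = HorizonVerticalAttention(dim)
        self.dcm = DilatedContextBlock(dim)
        self.out = nn.Conv2d(dim, 3, 3, padding=1)

    def forward(self, img: torch.Tensor):
        lum = self.lum_branch(img)                    # (B, 1, H, W)
        feat = self.embed(torch.cat([img, lum], 1))   # luminance-guided
        feat = self.dcm(self.hva(feat))               # hybrid global + local
        return self.out(feat), lum


def luminance(img: torch.Tensor) -> torch.Tensor:
    """ITU-R BT.601 luma from RGB, used as the luminance-loss target."""
    r, g, b = img[:, 0:1], img[:, 1:2], img[:, 2:3]
    return 0.299 * r + 0.587 * g + 0.114 * b


# Training objective: reconstruction loss plus the luminance map loss,
# realized here as L1 terms (the paper's exact formulation may differ).
net = LDHNetSketch()
shadowed = torch.rand(1, 3, 64, 64)
clean = torch.rand(1, 3, 64, 64)
pred, lum_pred = net(shadowed)
loss = nn.functional.l1_loss(pred, clean) \
     + nn.functional.l1_loss(lum_pred, luminance(clean))
loss.backward()
```

A real training loop would wrap this in an optimizer step over paired shadowed/shadow-free documents; the single backward pass above only verifies that the sketched graph is differentiable end to end.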
Source journal: Image and Vision Computing (Engineering: Electrical & Electronic)
CiteScore: 8.50 · Self-citation rate: 8.50% · Annual articles: 143 · Review time: 7.8 months
Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.