Text in the dark: Extremely low-light text image enhancement

Che-Tsung Lin, Chun Chet Ng, Zhi Qin Tan, Wan Jun Nah, Xinyu Wang, Jie Long Kew, Pohao Hsu, Shang Hong Lai, Chee Seng Chan, Christopher Zach

Signal Processing: Image Communication, Volume 130, Article 117222. Published online: 2024-10-28. DOI: 10.1016/j.image.2024.117222
Citations: 0
Abstract
Extremely low-light text images pose significant challenges for scene text detection. Existing methods enhance these images using low-light image enhancement techniques before text detection. However, they fail to address the importance of low-level features, which are essential for optimal performance in downstream scene text tasks. Further research is also limited by the scarcity of extremely low-light text datasets. To address these limitations, we propose a novel, text-aware extremely low-light image enhancement framework. Our approach first integrates a Text-Aware Copy-Paste (Text-CP) augmentation method as a preprocessing step, followed by a dual-encoder–decoder architecture enhanced with Edge-Aware attention modules. We also introduce text detection and edge reconstruction losses to train the model to generate images with higher text visibility. Additionally, we propose a Supervised Deep Curve Estimation (Supervised-DCE) model for synthesizing extremely low-light images, allowing training on publicly available scene text datasets such as IC15. To further advance this domain, we annotated texts in the extremely low-light See In the Dark (SID) and ordinary LOw-Light (LOL) datasets. The proposed framework is rigorously tested against various traditional and deep learning-based methods on the newly labeled SID-Sony-Text, SID-Fuji-Text, LOL-Text, and synthetic extremely low-light IC15 datasets. Our extensive experiments demonstrate notable improvements in both image enhancement and scene text tasks, showcasing the model’s efficacy in text detection under extremely low-light conditions. Code and datasets will be released publicly at https://github.com/chunchet-ng/Text-in-the-Dark.
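The abstract describes a dual encoder-decoder in which Edge-Aware attention modules inject edge information into the enhancement path. Below is a minimal PyTorch sketch of how such a design could be wired: one encoder for the low-light image, one for an edge map, and a spatial attention gate that modulates image features with edge features before decoding. The module names, channel widths, and the sigmoid-gated residual fusion are illustrative assumptions, not the paper's actual architecture (the authors' code is at the GitHub link above).

```python
# Minimal sketch of a dual encoder-decoder with edge-aware attention.
# Shapes and gating design are assumptions for illustration only.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1),
        nn.ReLU(inplace=True),
    )

class EdgeAwareAttention(nn.Module):
    """Gate image features with a spatial attention map derived from edge features."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, img_feat, edge_feat):
        attn = self.gate(edge_feat)        # (B, 1, H, W) in [0, 1]
        return img_feat * attn + img_feat  # residual, so gating cannot zero out features

class DualEncoderDecoder(nn.Module):
    def __init__(self, base=32):
        super().__init__()
        self.img_encoder = conv_block(3, base)   # encodes the low-light RGB image
        self.edge_encoder = conv_block(1, base)  # encodes an edge map (e.g. Sobel/Canny)
        self.attention = EdgeAwareAttention(base)
        self.decoder = nn.Sequential(conv_block(base, base), nn.Conv2d(base, 3, 1))

    def forward(self, image, edge_map):
        fused = self.attention(self.img_encoder(image), self.edge_encoder(edge_map))
        return torch.sigmoid(self.decoder(fused))  # enhanced image in [0, 1]

model = DualEncoderDecoder()
out = model(torch.rand(1, 3, 128, 128), torch.rand(1, 1, 128, 128))
print(out.shape)  # torch.Size([1, 3, 128, 128])
```

In the paper's training setup, the enhancement loss on this output is reportedly combined with text detection and edge reconstruction losses; those extra terms are omitted here since their exact formulation is not given in the abstract.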
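The Supervised-DCE model is said to synthesize extremely low-light images from normal-light datasets such as IC15. The sketch below applies the quadratic light-adjustment curve popularized by Zero-DCE, LE(x) = x + A·x·(1 − x), iteratively with per-pixel curve maps A predicted by a small CNN; constraining A to be negative darkens the image, and a paired reconstruction loss supplies the supervision. Whether the paper's Supervised-DCE uses this exact curve, network, or supervision signal is an assumption here.

```python
# Hedged sketch of deep-curve-estimation-style darkening. The quadratic
# curve is the Zero-DCE formulation; the paper's Supervised-DCE may
# parameterize or supervise it differently.
import torch
import torch.nn as nn

class CurveEstimator(nn.Module):
    """Predict n_iter per-pixel curve maps A in [-1, 0] from a well-lit image."""
    def __init__(self, n_iter=8):
        super().__init__()
        self.n_iter = n_iter
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3 * n_iter, 3, padding=1),
        )

    def forward(self, img):
        # -sigmoid keeps every curve map in [-1, 0], so each step darkens
        maps = -torch.sigmoid(self.net(img))
        out = img
        for a in maps.chunk(self.n_iter, dim=1):
            out = out + a * out * (1.0 - out)  # LE(x) = x + A*x*(1 - x)
        return out.clamp(0.0, 1.0)

model = CurveEstimator()
bright = torch.rand(1, 3, 64, 64)
dark_ref = bright * 0.05  # stand-in for a real paired extremely low-light image
loss = nn.functional.mse_loss(model(bright), dark_ref)  # paired ("supervised") objective
loss.backward()
```

The appeal of a curve-based synthesizer over a fixed gamma or gain is that the darkening is learned per pixel, so it can mimic the spatially varying degradation of real sensors when paired data is available.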
Journal Introduction
Signal Processing: Image Communication is an international journal for the development of the theory and practice of image communication. Its primary objectives are the following:
To present a forum for the advancement of theory and practice of image communication.
To stimulate cross-fertilization between areas similar in nature which have traditionally been separated, for example, various aspects of visual communications and information systems.
To contribute to a rapid information exchange between the industrial and academic environments.
The editorial policy and the technical content of the journal are the responsibility of the Editor-in-Chief, the Area Editors and the Advisory Editors. The Journal is self-supporting from subscription income and carries a minimal amount of advertising. Advertisements are subject to the prior approval of the Editor-in-Chief. The journal welcomes contributions from every country in the world.
Signal Processing: Image Communication publishes articles relating to aspects of the design, implementation and use of image communication systems. The journal features original research work, tutorial and review articles, and accounts of practical developments.
Subjects of interest include image/video coding, 3D video representations and compression, 3D graphics and animation compression, HDTV and 3DTV systems, video adaptation, video over IP, peer-to-peer video networking, interactive visual communication, multi-user video conferencing, wireless video broadcasting and communication, visual surveillance, 2D and 3D image/video quality measures, pre/post processing, video restoration and super-resolution, multi-camera video analysis, motion analysis, content-based image/video indexing and retrieval, face and gesture processing, video synthesis, 2D and 3D image/video acquisition and display technologies, architectures for image/video processing and communication.