Signal Processing-Image Communication: Latest Articles

GAN-based multi-view video coding with spatio-temporal EPI reconstruction
IF 3.4 | CAS Tier 3, Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2024-11-29 | DOI: 10.1016/j.image.2024.117242
Chengdong Lan, Hao Yan, Cheng Luo, Tiesong Zhao
Abstract: The introduction of multiple viewpoints in video scenes inevitably increases the bitrates required for storage and transmission. To reduce bitrates, researchers have developed methods that skip intermediate viewpoints during compression and delivery and ultimately reconstruct them using Side Information (SInfo). Typically, depth maps are used to construct SInfo. However, these methods suffer from reconstruction inaccuracies and inherently high bitrates. In this paper, we propose a novel multi-view video coding method that leverages the image generation capabilities of Generative Adversarial Networks (GANs) to improve the reconstruction accuracy of SInfo. Additionally, we incorporate information from adjacent temporal and spatial viewpoints to further reduce SInfo redundancy. At the encoder, we construct a spatio-temporal Epipolar Plane Image (EPI) and use a convolutional network to extract the latent code of a GAN as SInfo. At the decoder, we combine the SInfo and adjacent viewpoints to reconstruct intermediate views using the GAN generator. Specifically, we establish a joint encoder constraint on reconstruction cost and SInfo entropy to achieve an optimal trade-off between reconstruction quality and bitrate overhead. Experiments demonstrate a significant improvement in Rate-Distortion (RD) performance compared to state-of-the-art methods.
Signal Processing-Image Communication, Volume 132, Article 117242. Citations: 0
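For readers unfamiliar with epipolar plane images, the sketch below illustrates the basic spatial-only construction: stacking one scanline from each view of a linear camera array. This is background only; the paper's spatio-temporal EPI and GAN latent-code extraction are not reproduced here, and the function name and shapes are illustrative assumptions.

```python
import numpy as np

def build_epi(views: np.ndarray, row: int) -> np.ndarray:
    """Stack one scanline from each horizontally aligned view.

    views: (num_views, height, width) grayscale frames captured at the
    same instant by a linear camera array. Returns an EPI of shape
    (num_views, width): a scene point traces a straight line across the
    EPI whose slope encodes its disparity (and hence depth).
    """
    return views[:, row, :]

# Toy example: a vertical stripe shifting one pixel per view produces
# a diagonal line in the EPI.
views = np.zeros((4, 8, 8))
for v in range(4):
    views[v, :, 2 + v] = 1.0
epi = build_epi(views, row=3)
```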
Multi-layer feature fusion based image style transfer with arbitrary text condition
IF 3.4 | CAS Tier 3, Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2024-11-28 | DOI: 10.1016/j.image.2024.117243
Yue Yu, Jingshuo Xing, Nengli Li
Abstract: Style transfer converts images between two different domains. Compared with style transfer guided by a style image, transfer guided by a text description is freer and applicable to more practical scenarios. However, existing text-conditioned style transfer methods must be trained and optimized anew for each text and image input, which limits their efficiency. This paper therefore proposes a multi-layer feature fusion based style transfer method (MlFFST) with arbitrary text conditions. To address distortion and missing semantic content, we also introduce a multi-layer attention normalization module. Experimental results show that the method generates stylized results of high quality, good visual effect, and high stability for both images and videos, and that it meets real-time requirements for producing artistic and aesthetic images and videos.
Signal Processing-Image Communication, Volume 132, Article 117243. Citations: 0
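The paper's multi-layer attention normalization module is not specified in this abstract. As related background, the sketch below shows Adaptive Instance Normalization (AdaIN), a standard normalization used in arbitrary style transfer: content features are re-normalized to the per-channel statistics of the style features. The function name and shapes are illustrative assumptions, not the paper's API.

```python
import numpy as np

def adain(content: np.ndarray, style: np.ndarray, eps: float = 1e-5):
    """AdaIN: shift/scale content features to match the per-channel
    mean and standard deviation of the style features.

    content, style: feature maps of shape (channels, H, W).
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean
```

After AdaIN, each output channel carries the style's first- and second-order statistics while preserving the content's spatial structure.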
Adaptive histogram equalization framework based on new visual prior and optimization model
IF 3.4 | CAS Tier 3, Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2024-11-28 | DOI: 10.1016/j.image.2024.117246
Shiqi Liu, Qiding Lu, Shengkui Dai
Abstract: The Histogram Equalization (HE) algorithm remains a research hotspot in image enhancement due to its computational simplicity. Despite numerous improvements to HE, few algorithms comprehensively account for all of its major drawbacks. To address this issue, this paper proposes a novel histogram equalization framework that is adaptive and systematic. Firstly, a novel mathematical optimization model is proposed to find the optimal controlling parameters for modifying the histogram. Additionally, a new piece of visual prior knowledge, termed the Narrow Dynamic Prior (NDP), is formulated; it describes the subjective perceptual characteristics of the Human Visual System (HVS) for certain special types of images. This knowledge is then integrated with the new model to expand the application scope of HE. Lastly, unlike common brightness-preservation algorithms, a novel method for brightness estimation and precise brightness control is proposed. Experimental results demonstrate that the proposed framework significantly mitigates the major drawbacks of HE, achieving a notably better balance between the contrast, brightness, and detail of the output image. Both objective evaluation metrics and subjective visual perception indicate that the proposed algorithm outperforms the other strong competing algorithms selected in this paper.
Signal Processing-Image Communication, Volume 132, Article 117246. Citations: 0
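As a baseline for what such frameworks improve upon, here is a minimal sketch of classic global histogram equalization via the cumulative histogram. The paper's optimization model and NDP are not reproduced; the function name is illustrative.

```python
import numpy as np

def histogram_equalization(img: np.ndarray) -> np.ndarray:
    """Classic global HE for an 8-bit grayscale image: map each gray
    level through the normalized cumulative histogram (CDF), which
    spreads frequently occurring levels across the full range."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = cdf / cdf[-1]                      # normalize CDF to [0, 1]
    lut = np.round(255 * cdf).astype(np.uint8)
    return lut[img]
```

Plain HE maximizes contrast globally but can over-amplify noise and shift mean brightness, which is precisely the kind of drawback the adaptive framework above targets.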
Visual information fidelity based frame level rate control for H.265/HEVC
IF 3.4 | CAS Tier 3, Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2024-11-28 | DOI: 10.1016/j.image.2024.117245
Luheng Jia, Haoqiang Ren, Zuhai Zhang, Li Song, Kebin Jia
Abstract: Rate control in video coding seeks various trade-offs between bitrate and reconstruction quality, and is thus closely tied to image quality assessment. The widely used mean squared error (MSE) is inadequate for describing human visual characteristics, so rate control algorithms based on MSE often fail to deliver optimal visual quality. To address this issue, we propose a frame-level rate control algorithm that uses a simplified version of Visual Information Fidelity (VIF) as the quality assessment criterion to improve coding efficiency. Firstly, we simplify VIF and establish its relationship with MSE, reducing the computational complexity so that VIF becomes usable within a video coding framework. We then establish the relationship between the VIF-based λ and the MSE-based λ for λ-domain rate control, including bit allocation and parameter adjustment. Moreover, using the VIF-based λ directly integrates VIF-based distortion into the MSE-based rate-distortion optimized coding framework. Experimental results demonstrate that the proposed method outperforms the default frame-level rate control algorithm under the distortion metrics PSNR, SSIM, and VMAF by 3.4%, 4.0%, and 3.3% on average. Furthermore, the proposed method reduces the quality fluctuation of the reconstructed video in the high-bitrate range and improves bitrate accuracy under the hierarchical configuration.
Signal Processing-Image Communication, Volume 131, Article 117245. Citations: 0
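The paper's VIF-based λ mapping is not given in the abstract. As background, the sketch below shows the standard λ-domain (R-λ) model used in HEVC frame-level rate control, where λ is computed from the target bits per pixel and the model parameters are adapted after each coded frame. The default constants, update gains, and function names here are illustrative assumptions, not the paper's values.

```python
import math

def lam_from_bpp(bpp: float, alpha: float = 3.2003, beta: float = -1.367) -> float:
    """R-lambda model: lambda = alpha * bpp^beta (beta < 0, so a
    larger bit budget yields a smaller lambda, i.e. less aggressive
    rate-distortion slope)."""
    return alpha * bpp ** beta

def update_model(alpha: float, beta: float, bpp_actual: float,
                 lam_used: float, d_alpha: float = 0.1, d_beta: float = 0.05):
    """Adapt (alpha, beta) after coding a frame, nudging the model
    toward the lambda implied by the bits actually produced
    (least-mean-squares style update in the log domain)."""
    lam_implied = alpha * bpp_actual ** beta
    err = math.log(lam_used) - math.log(lam_implied)
    alpha += d_alpha * err * alpha
    beta += d_beta * err * math.log(bpp_actual)
    return alpha, beta
```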
Transformer-based multiview spatiotemporal feature interactive fusion for human action recognition in depth videos
IF 3.4 | CAS Tier 3, Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2024-11-23 | DOI: 10.1016/j.image.2024.117244
Hanbo Wu, Xin Ma, Yibin Li
Abstract: Spatiotemporal feature modeling is the key to human action recognition. Multiview data helps acquire numerous cues that improve the robustness and accuracy of feature description. However, multiview action recognition has not yet been well explored: most existing methods recognize actions from a single view only, which limits performance. Depth data is insensitive to illumination and color variations and offers significant advantages by providing reliable 3D geometric information about the human body. In this study, we concentrate on action recognition from depth videos and introduce a transformer-based framework for the interactive fusion of multiview spatiotemporal features, facilitating effective action recognition through the deep integration of multiview information. Specifically, the proposed framework consists of intra-view spatiotemporal feature modeling (ISTFM) and cross-view feature interactive fusion (CFIF). Firstly, we project a depth video onto three orthogonal views to construct multiview depth dynamic volumes that describe the 3D spatiotemporal evolution of human actions. ISTFM takes these volumes as input, extracts the spatiotemporal features of the three views with a 3D CNN, and then applies the self-attention mechanism of a transformer to model global context dependencies within each view. CFIF subsequently extends self-attention to cross-attention to conduct deep interaction between views, and further integrates the cross-view features into a joint multiview feature representation. Our proposed method is tested on two large-scale RGB-D datasets through extensive experiments that demonstrate a remarkable improvement in recognition performance.
Signal Processing-Image Communication, Volume 131, Article 117244. Citations: 0
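The orthogonal-projection step can be illustrated with a minimal sketch that maps one depth frame onto front, side, and top planes, a common preprocessing step in depth-based action recognition. Binary occupancy projections and the depth quantization are assumptions for illustration; the paper's exact dynamic-volume construction may differ.

```python
import numpy as np

def three_orthogonal_views(depth: np.ndarray, depth_bins: int = 64):
    """Project one depth frame onto three orthogonal planes.

    depth: (H, W) array of depth values in [0, depth_bins).
    front: the depth map itself, shape (H, W).
    side:  occupancy over (row, depth), shape (H, depth_bins).
    top:   occupancy over (depth, col), shape (depth_bins, W).
    """
    h, w = depth.shape
    d = np.clip(depth.astype(int), 0, depth_bins - 1)
    side = np.zeros((h, depth_bins))
    top = np.zeros((depth_bins, w))
    rows, cols = np.nonzero(depth > 0)        # skip empty pixels
    side[rows, d[rows, cols]] = 1.0
    top[d[rows, cols], cols] = 1.0
    return depth, side, top
```

Stacking these per-frame projections over time yields one volume per view, which is what a 3D CNN can then consume.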
Vocal cord anomaly detection based on Local Fine-Grained Contour Features
IF 3.4 | CAS Tier 3, Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2024-11-02 | DOI: 10.1016/j.image.2024.117225
Yuqi Fan, Han Ye, Xiaohui Yuan
Abstract: Laryngoscopy is a common examination for diagnosing vocal cord disease. Conventional screening of laryngoscopic images is labor-intensive and depends heavily on the experience of medical specialists, so automatic detection of vocal cord diseases from laryngoscopic images is highly sought after to assist routine image reading. In laryngoscopic images, the symptoms of vocal cord diseases concentrate on the inner vocal cord contour, often appearing as vegetations and small protuberances. Existing classification methods pay little, if any, attention to the role of the vocal cord contour in diagnosis and fail to effectively capture these fine-grained features. In this paper, we propose a novel Local Fine-Grained Contour Feature extraction method for vocal cord anomaly detection. The proposed method consists of four stages: image segmentation to obtain the overall vocal cord contour; inner contour isolation, which obtains the inner contour curve by comparing changes in adjacent pixel values; latent feature extraction, which takes the tangent inclination angle at each point on the contour as the latent feature; and a classification module. Our experimental results demonstrate that the proposed method improves vocal cord anomaly detection and achieves an accuracy of 97.21%.
Signal Processing-Image Communication, Volume 131, Article 117225. Citations: 0
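The tangent-inclination feature in the third stage can be sketched as follows, assuming a finite-difference tangent estimate along an ordered contour; the paper's exact discretization is not specified in the abstract, and the function name is illustrative.

```python
import numpy as np

def tangent_inclination_angles(contour: np.ndarray) -> np.ndarray:
    """Approximate the tangent inclination angle (radians) at each
    point of a 2-D contour via np.gradient (central differences in
    the interior, one-sided at the endpoints).

    contour: (N, 2) array of (x, y) points ordered along the curve.
    """
    dx = np.gradient(contour[:, 0])
    dy = np.gradient(contour[:, 1])
    return np.arctan2(dy, dx)
```

A smooth contour yields slowly varying angles, while vegetations and protuberances show up as abrupt angle changes, which is what makes this a useful fine-grained descriptor.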
SES-ReNet: Lightweight deep learning model for human detection in hazy weather conditions
IF 3.4 | CAS Tier 3, Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2024-10-30 | DOI: 10.1016/j.image.2024.117223
Yassine Bouafia, Mohand Saïd Allili, Loucif Hebbache, Larbi Guezouli
Abstract: Accurate detection of people in outdoor scenes plays an essential role in improving personal safety and security. However, existing human detection algorithms face significant challenges when visibility is reduced and human appearance is degraded, particularly in hazy weather. To address this problem, we present a novel lightweight model based on the RetinaNet detection architecture. The model incorporates a lightweight backbone feature extractor, a dehazing component based on knowledge distillation (KD), and a multi-scale attention mechanism based on the Squeeze-and-Excitation (SE) principle. KD distills from a larger network trained on clear, haze-free images, whereas attention is incorporated at both low-level and high-level features of the network. Experimental results show remarkable performance, outperforming state-of-the-art methods while running at 22 FPS. The combination of high accuracy and real-time capability makes our approach a promising solution for human detection in challenging weather and suitable for real-time applications.
Signal Processing-Image Communication, Volume 130, Article 117223. Citations: 0
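The SE principle referenced above is a well-documented attention mechanism. Below is a minimal NumPy forward pass of a Squeeze-and-Excitation block with the weights passed in explicitly; the reduction ratio, names, and shapes are illustrative, and the paper's multi-scale variant is not reproduced.

```python
import numpy as np

def se_attention(features: np.ndarray, w1: np.ndarray, w2: np.ndarray):
    """Forward pass of a Squeeze-and-Excitation block.

    features: (C, H, W) feature map.
    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights.
    """
    z = features.mean(axis=(1, 2))             # squeeze: global avg pool
    s = np.maximum(w1 @ z, 0.0)                # reduce + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))     # sigmoid channel gates
    return features * gate[:, None, None]      # excite: reweight channels
```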
HOI-V: One-stage human-object interaction detection based on multi-feature fusion in videos
IF 3.4 | CAS Tier 3, Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2024-10-29 | DOI: 10.1016/j.image.2024.117224
Dongzhou Gu, Kaihua Huang, Shiwei Ma, Jiang Liu
Abstract: Effective detection of Human-Object Interaction (HOI) is important for machine understanding of real-world scenarios. Image-based HOI detection has been investigated extensively, and recent one-stage methods strike a balance between accuracy and efficiency. However, it is difficult to predict temporally aware interaction actions from static images, since they provide limited temporal context. Meanwhile, owing to the lack of early large-scale video HOI datasets and the high computational cost of training spatial-temporal HOI models, recent exploratory studies mostly follow a two-stage paradigm, in which independent object detection and interaction recognition still suffer from computational redundancy and separate optimization. Therefore, inspired by the one-stage interaction-point detection framework, this paper proposes a one-stage spatial-temporal HOI detection baseline in which short-term local motion features and long-term temporal context features are obtained by the proposed temporal differential excitation module (TDEM) and a DLA-TSM backbone. Complementary visual features between multiple clips are then extracted by multi-feature fusion and fed into parallel detection branches. Finally, we construct HOI-V, a video dataset of reduced size containing only actions, to motivate further research on end-to-end video HOI detection. Extensive experiments verify the validity of the proposed baseline.
Signal Processing-Image Communication, Volume 130, Article 117224. Citations: 0
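TDEM's internals are not given in the abstract. As a hedged sketch of the general idea of temporal-difference excitation (frame differencing pooled into per-channel gates that emphasize moving content), with all names and the gating form being illustrative assumptions:

```python
import numpy as np

def temporal_difference_excitation(clip: np.ndarray) -> np.ndarray:
    """Reweight a clip's channels by short-term motion energy.

    clip: (T, C, H, W) feature volume. Consecutive-frame differences
    are pooled spatially into per-frame channel energies, squashed by
    a sigmoid, and used to gate the input (static content keeps the
    neutral 0.5 gate in this toy form).
    """
    diff = np.diff(clip, axis=0)                         # (T-1, C, H, W)
    diff = np.concatenate([diff, np.zeros_like(clip[:1])], axis=0)
    energy = np.abs(diff).mean(axis=(2, 3))              # (T, C)
    gate = 1.0 / (1.0 + np.exp(-energy))                 # sigmoid gates
    return clip * gate[:, :, None, None]
```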
High efficiency deep image compression via channel-wise scale adaptive latent representation learning
IF 3.4 | CAS Tier 3, Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2024-10-28 | DOI: 10.1016/j.image.2024.117227
Chenhao Wu, Qingbo Wu, King Ngi Ngan, Hongliang Li, Fanman Meng, Linfeng Xu
Abstract: Recent learning-based neural image compression methods have achieved impressive rate-distortion (RD) performance via sophisticated context entropy models, which capture the spatial correlations of latent features well. However, because they depend on adjacent or distant decoded features, existing methods require an inefficient serial processing structure, which significantly limits their practicality. Instead of pursuing computationally expensive entropy estimation, we propose to reduce spatial redundancy via channel-wise scale adaptive latent representation learning, whose entropy coding is spatially context-free and parallelizable. Specifically, the proposed encoder adaptively determines the scale of the latent features via a learnable binary mask that is optimized with the RD cost. In this way, lower-scale latent representations are allocated to channels with higher spatial redundancy, which consume fewer bits, and vice versa. The downscaled latent features can be recovered well with a lightweight inter-channel upconversion module in the decoder. To compensate for the degradation in entropy estimation, we further develop an inter-scale hyperprior entropy model that supports highly efficient parallel encoding/decoding within each scale of the latent features. Extensive experiments illustrate the efficacy of the proposed method: it achieves bitrate savings of 18.23%, 19.36%, and 27.04% over HEVC Intra, along with decoding speeds 46, 48, and 51 times faster than the baseline method on the Kodak, Tecnick, and CLIC datasets, respectively.
Signal Processing-Image Communication, Volume 130, Article 117227. Citations: 0
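A hedged sketch of the channel-wise downscaling idea: channels flagged by a binary mask are stored at half spatial resolution. Here 2x2 average pooling stands in for the encoder's learned downscaling, and the learned mask and the decoder's upconversion module are not reproduced; all names are illustrative.

```python
import numpy as np

def masked_channel_downscale(latent: np.ndarray, mask) -> list:
    """Downscale selected channels of a latent tensor.

    latent: (C, H, W) with even H and W; mask: length-C sequence of
    {0, 1}, where 1 marks a spatially redundant channel to pool.
    Returns a list of per-channel arrays at mixed resolutions, as the
    channels no longer share one spatial size.
    """
    c, h, w = latent.shape
    pooled = latent.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    return [pooled[i] if mask[i] else latent[i] for i in range(c)]
```

Pooled channels carry a quarter of the samples into entropy coding, which is where the bit savings for redundant channels come from.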
Text in the dark: Extremely low-light text image enhancement
IF 3.4 | CAS Tier 3, Engineering & Technology
Signal Processing-Image Communication | Pub Date: 2024-10-28 | DOI: 10.1016/j.image.2024.117222
Che-Tsung Lin, Chun Chet Ng, Zhi Qin Tan, Wan Jun Nah, Xinyu Wang, Jie Long Kew, Pohao Hsu, Shang Hong Lai, Chee Seng Chan, Christopher Zach
Abstract: Extremely low-light text images pose significant challenges for scene text detection. Existing methods enhance these images with low-light image enhancement techniques before text detection, but they overlook the importance of low-level features, which are essential for good performance on downstream scene text tasks. Research is further limited by the scarcity of extremely low-light text datasets. To address these limitations, we propose a novel text-aware extremely low-light image enhancement framework. Our approach first applies a Text-Aware Copy-Paste (Text-CP) augmentation method as a preprocessing step, followed by a dual-encoder-decoder architecture equipped with edge-aware attention modules. We also introduce text detection and edge reconstruction losses that train the model to generate images with higher text visibility. Additionally, we propose a Supervised Deep Curve Estimation (Supervised-DCE) model for synthesizing extremely low-light images, which allows training on publicly available scene text datasets such as IC15. To further advance this domain, we annotated the text in the extremely low-light See In the Dark (SID) and ordinary LOw-Light (LOL) datasets. The proposed framework is rigorously tested against various traditional and deep learning based methods on the newly labeled SID-Sony-Text, SID-Fuji-Text, LOL-Text, and synthetic extremely low-light IC15 datasets. Our extensive experiments demonstrate notable improvements in both image enhancement and scene text tasks, showcasing the model's efficacy in text detection under extremely low-light conditions. Code and datasets will be released publicly at https://github.com/chunchet-ng/Text-in-the-Dark.
Signal Processing-Image Communication, Volume 130, Article 117222. Citations: 0
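Supervised-DCE builds on deep curve estimation. As background, the quadratic enhancement curve LE(x) = x + α·x·(1 − x) from the Zero-DCE line of work can be applied iteratively, as sketched below; a fixed scalar α is used here for illustration, whereas such models learn per-pixel curve parameters, and the function name is an assumption.

```python
import numpy as np

def apply_light_curve(img: np.ndarray, alpha: float, iterations: int = 4):
    """Iteratively apply the quadratic light-enhancement curve
    LE(x) = x + alpha * x * (1 - x).

    img: intensities in [0, 1]; alpha in (0, 1] brightens while the
    curve's fixed points at 0 and 1 keep the output inside [0, 1].
    """
    x = img
    for _ in range(iterations):
        x = x + alpha * x * (1.0 - x)
    return x
```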