Information disentanglement for unsupervised domain adaptive Oracle Bone Inscriptions detection
Feng Gao, Yongge Liu, Deng Li, Xu Chen, Runhua Jiang, Yahong Han
Signal Processing: Image Communication, vol. 138, Article 117334. DOI: 10.1016/j.image.2025.117334. Published 2025-05-01.
Abstract: The detection of Oracle Bone Inscriptions (OBIs) is the foundation of studying OBIs with computer technology. Oracle bone inscription data includes rubbings, handwriting, and photos. Most current detection methods focus on rubbings and rely on large-scale annotated datasets, yet practical applications also require detecting OBIs in the handwriting and photo domains, and annotating handwriting and photos is time-consuming and requires expert knowledge. An effective solution is to transfer the knowledge learned from an existing public dataset directly to the unlabeled target domain; however, the domain shift between domains heavily degrades its performance. To alleviate this problem, and drawing on the characteristics of the different oracle bone domains, we propose an information disentanglement method for Unsupervised Domain Adaptive (UDA) OBI detection that improves detection performance on both handwriting and photos. Specifically, we construct an image content encoder and a style encoder to decouple oracle bone image information, and then a reconstruction decoder that reconstructs the source-domain image guided by target-domain information to reduce the shift between domains. To demonstrate the effectiveness of our method, we constructed an OBI detection benchmark covering three domains: rubbing, handwriting, and photo. Extensive experiments verify the effectiveness and generality of our method for domain adaptive OBI detection. Compared with other state-of-the-art unsupervised domain adaptive object detection (UDAOD) methods, our approach improves mAP by 0.5% and 0.6% on handwriting and photos, respectively.
Auxiliary captioning: Bridging image–text matching and image captioning
Hui Li, Jimin Xiao, Mingjie Sun, Eng Gee Lim, Yao Zhao
Signal Processing: Image Communication, vol. 138, Article 117337. DOI: 10.1016/j.image.2025.117337. Published 2025-04-30.
Abstract: The image–text matching task, in which a query image (or text) is used to retrieve its corresponding text (or image) from a gallery, has drawn increasing attention. Conventional methods map the image and text directly into one latent aligned feature space for matching, but achieving an ideal alignment is difficult because the significant content of the image is not highlighted. To overcome this limitation, we propose an auxiliary captioning step that enhances the image feature by fusing it with the text feature of the captioning output. The captioning output feature, which shares a similar space distribution with the candidate texts, provides high-level semantic information that helps locate the significant content in an image. To optimize the auxiliary captioning output, we introduce a new metric, Caption-to-Text (C2T), which measures the retrieval performance between the auxiliary captioning output and the ground-truth matching texts. By using the C2T score as a reward in our image captioning reinforcement learning framework, the captioning model generates sentences better suited to auxiliary image–text matching. Extensive experiments on MSCOCO and Flickr30k demonstrate the method's superiority: it achieves absolute improvements of 5.7% (R@1) on Flickr30k and 3.2% (R@1) on MSCOCO over baseline approaches, outperforming state-of-the-art models without complex architectural modifications.
{"title":"Reducing the complexity of distributed video coding by improving the image enhancement post processing","authors":"Djamel Eddine Boudechiche , Said Benierbah","doi":"10.1016/j.image.2025.117339","DOIUrl":"10.1016/j.image.2025.117339","url":null,"abstract":"<div><div>The main attractive feature of distributed video coding (DVC) is its use of low-complexity encoders, which are required by low-resource networked applications. Unfortunately, the performance of the currently proposed DVC systems is not yet convincing, and further improvements in the rate, distortion, and complexity tradeoff of DVC are necessary to make it more attractive for use in practical applications. This requires finding new ways to exploit side information in reducing the transmitted rate and improving the quality of the decoded frames. This paper proposes improving DVC by exploiting image enhancement post-processing at the decoder. In this way, we can either improve the quality of the decoded frames for a given rate or reduce the number of transmitted bits for the same quality and hence reduce the complexity of the encoder. To do this, we used a conditional generative adversarial network (cGAN) to restore more of the details discarded by quantization, with the help of side information. We also evaluated numerous existing deep learning-based enhancement methods for DVC and compared them to our proposed model. The results show a reduction in the number of DVC coding operations by 46 % and an improvement in rate-distortion performance and subjective visual quality. Furthermore, despite reducing its complexity, our DVC codec outperformed the DISCOVER codec with an average Bjøntegaard PSNR of 0.925 dB.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117339"},"PeriodicalIF":3.4,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143899898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MA-MNN: Multi-flow attentive memristive neural network for multi-task image restoration
Peng He, Lin Zhang, Yu Yang, Yue Zhou, Shukai Duan, Xiaofang Hu
Signal Processing: Image Communication, vol. 138, Article 117336. DOI: 10.1016/j.image.2025.117336. Published 2025-04-28.
Abstract: Images taken in rainy, hazy, and low-light environments severely hinder the performance of outdoor computer vision systems. Most data-driven image restoration methods are task-specific and computationally intensive, whereas degraded images are largely captured and processed on end-side devices with limited computing resources. To address these issues, this paper proposes a novel software-hardware co-designed image restoration method, the multi-flow attentive memristive neural network (MA-MNN), which combines a deep learning algorithm with the nanoscale memristor device. A multi-flow aggregation block exploits multi-level complementary spatial contextual information, and dense connections provide smooth information transport across units and alleviate vanishing gradients. A supervised calibration block implements a dual-attention mechanism that helps the model identify and re-calibrate the transformed features. In addition, a memristor-based hardware implementation scheme is designed to provide a low-energy solution for embedded applications. Extensive experiments on image deraining, image dehazing, and low-light image enhancement show that the proposed method is highly competitive with over 20 state-of-the-art methods.
{"title":"An unsupervised fusion method for infrared and visible image under low-light condition based on Generative Adversarial Networks","authors":"Shuai Yang, Yuan Gao, Shiwei Ma","doi":"10.1016/j.image.2025.117324","DOIUrl":"10.1016/j.image.2025.117324","url":null,"abstract":"<div><div>The aim of fusing infrared and visible images is to achieve high-quality images by enhancing textural details and obtaining complementary benefits. However, the existing methods for fusing infrared and visible images are suitable only normal lighting scenes. The details of the visible image under low-light conditions are not discernible. Achieving complementarity between the image contours and textural details is challenging between the infrared image and the visible image. With the intention of addressing the challenge of poor quality of infrared and visible light fusion images under low light conditions, a novel unsupervised fusion method for infrared and visible image under low_light condition (referred to as UFIVL) is presented in this paper. Specifically, the proposed method effectively enhances the low-light regions of visible light images while reducing noise. To incorporate style features of the image into the reconstruction of content features, a sparse-connection dense structure is designed. An adaptive contrast-limited histogram equalization loss function is introduced to improve contrast and brightness in the fused image. The joint gradient loss is proposed to extract clearer texture features under low-light conditions. This end-to-end method generates fused images with enhanced contrast and rich details. Furthermore, considering the issues in existing public datasets, a dataset for individuals and objects in low-light conditions (LLHO <span><span>https://github.com/alex551781/LLHO</span><svg><path></path></svg></span>) is proposed. On the ground of the experimental results, we can conclude that the proposed method generates fusion images with higher subjective and objective quantification scores on both the LLVIP public dataset and the LLHO self-built dataset. Additionally, we apply the fusion images generated by UFIVL method to the advanced computer vision task of target detection, resulting in a significant improvement in detection performance.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117324"},"PeriodicalIF":3.4,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143886165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-fish tracking with underwater image enhancement by deep network in marine ecosystems","authors":"Prerana Mukherjee , Srimanta Mandal , Koteswar Rao Jerripothula , Vrishabhdhwaj Maharshi , Kashish Katara","doi":"10.1016/j.image.2025.117321","DOIUrl":"10.1016/j.image.2025.117321","url":null,"abstract":"<div><div>Tracking marine life plays a crucial role in understanding migration patterns, movements, and population growth of underwater species. Deep learning-based fish-tracking networks have been actively researched and developed, yielding promising results. In this work, we propose an end-to-end deep learning framework for tracking fish in unconstrained marine environments. The core innovation of our approach is a Siamese-based architecture integrated with an image enhancement module, designed to measure appearance similarity effectively. The enhancement module consists of convolutional layers and a squeeze-and-excitation block, pre-trained on degraded and clean image pairs to address underwater distortions. This enhanced feature representation is leveraged within the Siamese framework to compute an appearance similarity score, which is further refined using prediction scores based on fish movement patterns. To ensure robust tracking, we combine the appearance similarity score, prediction score, and IoU-based similarity score to generate fish trajectories using the Hungarian algorithm. Our framework significantly reduces ID switches by 35.6% on the Fish4Knowledge dataset and 3.8% on the GMOT-40 fish category, all while maintaining high tracking accuracy. The source code of this work is available here: <span><span>https://github.com/srimanta-mandal/Multi-Fish-Tracking-with-Underwater-Image-Enhancement</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117321"},"PeriodicalIF":3.4,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An adaptive contextual learning network for image inpainting","authors":"Feilong Cao , Xinru Shao , Rui Zhang , Chenglin Wen","doi":"10.1016/j.image.2025.117326","DOIUrl":"10.1016/j.image.2025.117326","url":null,"abstract":"<div><div>Deep-learning-based methods for image inpainting have been intensively researched because of deep neural networks’ powerful approximation capabilities. In particular, the context-reasoning-based methods have shown significant success. Nonetheless, images generated using these methods tend to suffer from visually inappropriate content. This is due to the fact that their context reasoning processes are weakly adaptive, limiting the flexibility of generation. To this end, this paper presents an adaptive contextual learning network (ACLNet) for image inpainting. The main contribution of the proposed method is to significantly improve the adaptive capability of the context reasoning. The method can adaptively weigh the importance of known contexts for filling missing regions, ensuring that the filled content is finely filtered rather than raw, which improves the reliability of the generated content. Specifically, a modular hybrid dilated residual unit and an adaptive region affinity learning attention are created, which can adaptively choose and aggregate contexts based on the sample itself through gating mechanism and similarity filtering respectively. The extensive comparisons reveal that ACLNet exceeds the state-of-the-art, improving peak signal-to-noise ratio (PSNR) by 0.25 dB and structural similarity index measure (SSIM) by 0.017 on average and that it can generate more aesthetically realistic images than other approaches. The implemented ablation experiments also confirm the effectiveness of ACLNet.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117326"},"PeriodicalIF":3.4,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143868668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AMFMER: A multimodal full transformer for unifying aesthetic assessment tasks
Jin Qi, Can Su, Xiaoxuan Hu, Mengwei Chen, Yanfei Sun, Zhenjiang Dong, Tianliang Liu, Jiebo Luo
Signal Processing: Image Communication, vol. 138, Article 117320. DOI: 10.1016/j.image.2025.117320. Published 2025-04-21.
Abstract: Computational aesthetics aims to simulate the human visual perception process so that computers can automatically evaluate aesthetic quality, and the topic has been widely studied. However, existing research mostly focuses on image content while disregarding the high-level semantics in the associated image comments. In addition, most mainstream assessment methods are based on convolutional neural networks (CNNs) for learning distinctive features, which lack the representational power and modeling capability required for multimodal assessment. Furthermore, many transformer-based models suffer from limited information flow between different parts of the model, and many multimodal fusion methods extract image and text features separately and cannot handle multimodal information well. Motivated by these issues, this paper proposes a novel multimodal full transformer (AMFMER) evaluation model that does not rely on aesthetic style information and consists of three components: a visual stream, a textual stream, and a multimodal fusion layer. First, the visual stream exploits an improved Swin transformer to extract distinctive layer features of the input image. Second, the textual stream uses the robustly optimized bidirectional encoder representations from transformers (RoBERTa) text encoder to extract semantic information from the corresponding comments. Third, the multimodal fusion layer fuses visual features, textual features, and low-layer salient features in a cross-attention manner to extract multimodal distinctive features. Experimental results show that the proposed AMFMER approach outperforms current mainstream methods on the unified aesthetic prediction task, especially in terms of the correlation between objective model evaluation and subjective human evaluation.
Higher-order motion calibration and sparsity based outlier correction for video FRUC
Jiale He, Qunbing Xia, Gaobo Yang, Xiangling Ding
Signal Processing: Image Communication, vol. 138, Article 117327. DOI: 10.1016/j.image.2025.117327. Published 2025-04-17.
Abstract: For frame rate up-conversion (FRUC), one of the key challenges is dealing with the irregular and large motions that widely exist in video scenes. Most existing FRUC works assume constant brightness and linear motion, which easily leads to undesirable artifacts such as motion blurriness and frame flickering. In this work, we propose an advanced FRUC method that uses a higher-order model for motion calibration and a sparse sampling strategy for outlier correction. Unidirectional motion estimation is used to accurately locate objects from the previous frame to the following frame in a coarse-to-fine pyramid structure. Object motion trajectories are then fine-tuned to approximate the real motion, and possible outlier regions are located and recorded. Moreover, image sparsity is exploited as prior knowledge for outlier correction, and the outlier index map is used to design the measurement matrix. Based on the theory of sparse sampling, the outlier regions are reconstructed to eliminate side effects such as overlapping, holes, and blurring. Extensive experimental results demonstrate that the proposed approach outperforms state-of-the-art FRUC works in terms of both the objective and subjective quality of the interpolated frames.
FANet: Feature attention network for semantic segmentation
Lin Zhu, Linxi Li, Mingwei Tang, Wenrui Niu, Jianhua Xie, Hongyun Mao
Signal Processing: Image Communication, vol. 138, Article 117330. DOI: 10.1016/j.image.2025.117330. Published 2025-04-17.
Abstract: Semantic segmentation based on scene parsing assigns a category label to each pixel in an image. Existing neural network models are useful tools for understanding the objects in a scene, but they ignore the heterogeneity of the information carried by individual features, leading to pixel classification confusion and unclear boundaries. This paper therefore proposes a novel Feature Attention Network (FANet). First, an adjustment algorithm is presented to capture attention feature matrices that effectively select feature dependencies. Second, a hybrid extraction module (HEM) is constructed to aggregate long-term dependencies based on the proposed adjustment algorithm. Finally, an adaptive hierarchical fusion module (AHFM) aggregates multi-scale features by learning to spatially filter conflicting information, which improves the scale invariance of the features. Experimental results on popular benchmarks (PASCAL VOC 2012, Cityscapes, and ADE20K) indicate that our algorithm achieves better performance than competing algorithms.