Latest Articles in Image and Vision Computing

Dynamic feature extraction and histopathology domain shift alignment for mitosis detection
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing. Pub Date: 2025-04-14. DOI: 10.1016/j.imavis.2025.105541
Jiangxiao Han, Shikang Wang, Lianjun Wu, Wenyu Liu
{"title":"Dynamic feature extraction and histopathology domain shift alignment for mitosis detection","authors":"Jiangxiao Han,&nbsp;Shikang Wang,&nbsp;Lianjun Wu,&nbsp;Wenyu Liu","doi":"10.1016/j.imavis.2025.105541","DOIUrl":"10.1016/j.imavis.2025.105541","url":null,"abstract":"<div><div>Mitosis count is of crucial significance in cancer diagnosis; therefore, mitosis detection is a meaningful subject in medical image studies. The challenge of mitosis detection lies in the intra-class variance of mitosis and hard negatives, i.e., the sizes/ shapes of mitotic cells vary considerably and plenty of non-mitotic cells resemble mitosis, and the histopathology domain shift across datasets caused by different tissues and organs, scanners, labs, etc. In this paper, we propose a novel Domain Generalized Dynamic Mitosis Detector (DGDMD) to handle the intra-class variance and histopathology domain shift of mitosis detection with a dynamic mitosis feature extractor based on residual structured depth-wise convolution and domain shift alignment terms. The proposed dynamic mitosis feature extractor handles the intra-class variance caused by different sizes and shapes of mitotic cells as well as non-mitotic hard negatives. The proposed domain generalization schedule implemented via novel histopathology-mitosis domain shift alignments deals with the domain shift between histopathology slides in training and test datasets from different sources. We validate the domain generalization ability for mitosis detection of our algorithm on the MIDOG++ dataset and typical mitosis datasets, including the MIDOG 2021, ICPR MITOSIS 2014, AMIDA 2013, and TUPAC 16. Experimental results show that we achieve state-of-the-art (SOTA) performance on the MIDOG++ dataset for the domain generalization across tissue and organs of mitosis detection, across scanners on the MIDOG 2021 dataset, and across data sources on external datasets, demonstrating the effectiveness of our proposed method on the domain generalization of mitosis detection.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105541"},"PeriodicalIF":4.2,"publicationDate":"2025-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143843661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automated dual CNN-based feature extraction with SMOTE for imbalanced diabetic retinopathy classification
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing. Pub Date: 2025-04-12. DOI: 10.1016/j.imavis.2025.105537
Danyal Badar Soomro, Wang ChengLiang, Mahmood Ashraf, Dina Abdulaziz AlHammadi, Shtwai Alsubai, Carlo Medaglia, Nisreen Innab, Muhammad Umer
{"title":"Automated dual CNN-based feature extraction with SMOTE for imbalanced diabetic retinopathy classification","authors":"Danyal Badar Soomro ,&nbsp;Wang ChengLiang ,&nbsp;Mahmood Ashraf ,&nbsp;Dina Abdulaziz AlHammadi ,&nbsp;Shtwai Alsubai ,&nbsp;Carlo Medaglia ,&nbsp;Nisreen Innab ,&nbsp;Muhammad Umer","doi":"10.1016/j.imavis.2025.105537","DOIUrl":"10.1016/j.imavis.2025.105537","url":null,"abstract":"<div><div>The primary cause of Diabetic Retinopathy (DR) is high blood sugar due to long-term diabetes. Early and correct diagnosis of the DR is essential for timely and effective treatment. Despite high performance of recently developed models, there is still a need to overcome the problem of class imbalance issues and feature extraction to achieve accurate results. To resolve this problem, we have presented an automated model combining the customized ResNet-50 and EfficientNetB0 for detecting and classifying DR in fundus images. The proposed model addresses class imbalance using data augmentation and Synthetic Minority Oversampling Technique (SMOTE) for oversampling the training data and enhances the feature extraction process through fine-tuned ResNet50 and EfficientNetB0 models with ReLU activations and global average pooling. Combining extracted features and then passing it to four different classifiers effectively captures both local and global spatial features, thereby improving classification accuracy for diabetic retinopathy. For Experiment, The APTOS 2019 Dataset is used, and it contains of 3662 high-quality fundus images. The performance of the proposed model is assessed using several metrics, and the findings are compared with contemporary methods for diabetic retinopathy detection. The suggested methodology demonstrates substantial enhancement in diabetic retinopathy diagnosis for fundus pictures. The proposed automated model attained an accuracy of 98.5% for binary classification and 92.73% for multiclass classification.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105537"},"PeriodicalIF":4.2,"publicationDate":"2025-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143869522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MFC-Net: Amodal instance segmentation with multi-path fusion and context-awareness
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing. Pub Date: 2025-04-12. DOI: 10.1016/j.imavis.2025.105539
Yunfei Yang, Hongwei Deng, Yichun Wu
{"title":"MFC-Net: Amodal instance segmentation with multi-path fusion and context-awareness","authors":"Yunfei Yang ,&nbsp;Hongwei Deng ,&nbsp;Yichun Wu","doi":"10.1016/j.imavis.2025.105539","DOIUrl":"10.1016/j.imavis.2025.105539","url":null,"abstract":"<div><div>Amodal instance segmentation refers to sensing the entire instance in an image, thereby segmenting the visible parts of an object and the regions that may be masked. However, existing amodal instance segmentation methods predict rough mask edges and perform poorly in segmenting objects with significant size differences. In addition, the occlusion environment greatly limits the performance of the model. To address the above problems, this work proposes an amodal instance segmentation method called MFC-Net to accurately segment objects in an image. For the rough prediction of mask edges, the model introduces the multi-path transformer structure to obtain finer object semantic features and boundary information, which improves the accuracy of edge region segmentation. For the problem of poor segmentation of object instances with significant size differences, we design an adaptive feature fusion module AFF, which dynamically captures the scale changes related to object size and fuses the multi-scale semantic feature information, so that the model obtains a receptive field adapted to the object size. To address the poor performance of segmentation in the occlusion environment, we designed the context-aware mask segmentation module CMS in the prediction module to make a preliminary prediction of the object’s amodal region. The module enhances the amodal perception of the model by modeling the long-range dependencies of the objects and capturing the contextual information of the occluded part of the object. Compared with the state-of-the-art methods, the MFC-Net proposed in this paper achieves a mAP of 73.3% on the D2SA dataset and 33.9% and 36.9% on the KINS and COCOA-cls datasets, respectively. Moreover, MFC-Net can produce complete and detailed amodal masks.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105539"},"PeriodicalIF":4.2,"publicationDate":"2025-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143828331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hybrid Attention Transformers with fast Fourier convolution for light field image super-resolution
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing. Pub Date: 2025-04-10. DOI: 10.1016/j.imavis.2025.105542
Zhicheng Ma, Yuduo Guo, Zhaoxiang Liu, Shiguo Lian, Sen Wan
{"title":"Hybrid Attention Transformers with fast Fourier convolution for light field image super-resolution","authors":"Zhicheng Ma ,&nbsp;Yuduo Guo ,&nbsp;Zhaoxiang Liu ,&nbsp;Shiguo Lian ,&nbsp;Sen Wan","doi":"10.1016/j.imavis.2025.105542","DOIUrl":"10.1016/j.imavis.2025.105542","url":null,"abstract":"<div><div>The limited spatial resolution of light field (LF) cameras has hindered their widespread adoption, emphasizing the critical need for superresolution techniques to improve their practical use. Transformer-based methods, such as LF-DET, have shown potential in enhancing light field spatial super-resolution (LF-SR). However, LF-DET, which employs a spatial-angular separable transformer encoder with sub-sampling spatial and multiscale angular modeling for global context interaction, struggles to effectively capture global context in early layers and local details. In this work, we introduce LF-HATF, a novel network that builds on the LF-DET framework and incorporates Fast Fourier Convolution (FFC) and Hybrid Attention Transformers (HATs) to address these limitations. This integration enables LF-HATF to better capture both global and local information, significantly improving the restoration of edge details and textures, and providing a more comprehensive understanding of complex scenes. Additionally, we propose the Light Field Charbonnier loss function, designed to balance differential distributions across various LF views. This function minimizes errors both within the same perspective and across different views, further enhancing the model’s performance. Our evaluation on five public LF datasets demonstrates that LF-HATF outperforms existing methods, representing a significant advancement in LF-SR technology. This progress pushes the field forward and opens new avenues for research in light field imaging, unlocking the full potential of light field cameras.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105542"},"PeriodicalIF":4.2,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143828330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Controlling vision-language model for enhancing image restoration
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing. Pub Date: 2025-04-08. DOI: 10.1016/j.imavis.2025.105538
Mingwen Shao, Weihan Liu, Qiwang Li, Lingzhuang Meng, Yecong Wan
{"title":"Controlling vision-language model for enhancing image restoration","authors":"Mingwen Shao,&nbsp;Weihan Liu,&nbsp;Qiwang Li,&nbsp;Lingzhuang Meng,&nbsp;Yecong Wan","doi":"10.1016/j.imavis.2025.105538","DOIUrl":"10.1016/j.imavis.2025.105538","url":null,"abstract":"<div><div>Restoring low-quality images to their original high-quality state remains a significant challenge due to inherent uncertainties, particularly in blind image restoration scenarios where the nature of degradation is unknown. Despite recent advances, many restoration techniques still grapple with robustness and adaptability across diverse degradation conditions. In this paper, we introduce an approach to augment the restoration model by exploiting the robust prior features of CLIP, a large-scale vision-language model, to enhance its proficiency in handling a broader spectrum of degradation tasks. We integrate the robust priors from CLIP into the pre-trained image restoration model via cross-attention mechanisms, and we design a Prior Adapter to modulate these features, thereby enhancing the model’s restoration performance. Additionally, we introduce an innovative prompt learning framework that harnesses CLIP’s multimodal alignment capabilities to fine-tune pre-trained restoration models. Furthermore, we utilize CLIP’s contrastive loss to ensure that the restored images align more closely with the prompts of clean images in CLIP’s latent space, thereby improving the quality of the restoration. Through comprehensive experiments, we demonstrate the effectiveness and robustness of our method, showcasing its superior adaptability to a wide array of degradation tasks. Our findings emphasize the potential of integrating vision-language models such as CLIP to advance the cutting-edge in image restoration.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105538"},"PeriodicalIF":4.2,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143839408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Wave-based cross-phase representation for weakly supervised classification
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing. Pub Date: 2025-04-05. DOI: 10.1016/j.imavis.2025.105527
Heng Zhou, Ping Zhong
{"title":"Wave-based cross-phase representation for weakly supervised classification","authors":"Heng Zhou ,&nbsp;Ping Zhong","doi":"10.1016/j.imavis.2025.105527","DOIUrl":"10.1016/j.imavis.2025.105527","url":null,"abstract":"<div><div>Weakly Supervised Learning (WSL) aims to improve model robustness and manage label uncertainty, but current methods struggle to handle various weak label sources, such as incomplete and noisy labels. Additionally, these methods struggle with a lack of adaptability from reliance on prior knowledge and the complexity of managing data-label dependencies. To address these problems, we propose a wave-based cross-phase network (WCPN) to enhance adaptability for incomplete and noisy labels. Specifically, we expand wave representations and design a cross-phase token mixing (CPTM) module to refine feature relationships and integrate strategies for various weak labels. The proposed CPFE algorithm in the CPTM optimizes feature relationships by using self-interference and mutual-interference to process phase information between feature tokens, thus enhancing semantic consistency and discriminative ability. Furthermore, by employing a data-driven tri-branch structure and maximizing mutual information between features and labels, WCPN effectively overcomes the inflexibility caused by reliance on prior knowledge and complex data-label dependencies. In this way, WCPN leverages wave representations to enhance feature interactions, capture data complexity and diversity, and improve feature compactness for specific categories. Experimental results demonstrate that WCPN excels across various supervision levels and consistently outperforms existing advanced methods. It effectively handles noisy and incomplete labels, showing remarkable adaptability and enhanced feature understanding.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105527"},"PeriodicalIF":4.2,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143785590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dual region mutual enhancement network for camouflaged object detection
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing. Pub Date: 2025-04-05. DOI: 10.1016/j.imavis.2025.105526
Chao Yin, Xiaoqiang Li
{"title":"Dual region mutual enhancement network for camouflaged object detection","authors":"Chao Yin,&nbsp;Xiaoqiang Li","doi":"10.1016/j.imavis.2025.105526","DOIUrl":"10.1016/j.imavis.2025.105526","url":null,"abstract":"<div><div>Camouflaged Object Detection (COD) is a promising yet challenging task that aims to segment objects hidden in intricate surroundings. Current methods often struggle with identifying background regions that resemble camouflaged objects, posing a significant challenge. To mitigate this issue, we propose a novel Dual Region Mutual Enhancement Network (DRMENet), which separately extracts camouflaged object and background region features and these branches mutually assist each other to refine their respective region features. Specifically, in the foreground segmentation branch, we utilize the Background-assisted Foreground Region Enhancement (BFRE) subnetwork to enhance camouflaged object region features with background information. BFRE subnetwork consists of two parts: the Background-subtracted Foreground Refinement (BFR) module and the Scale-wise Feature Capturing (SFC) module, where the former obtains corresponding camouflaged object region features through cross-layer refinement with the assistance of background region features, and the latter captures scale-wise features and outputs a side output for region prediction result. Additionally, considering the noise present in low-level visual features, we introduce the Semantic-Guided Refinement (SGR) module, which progressively refines visual features based on enhanced semantic features. Experiments on challenging datasets show DRMENet’s superiority over the existing state-of-the-art methods. The source codes will be available at <span><span>https://github.com/ycyinchao/DRMENet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105526"},"PeriodicalIF":4.2,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143808625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DeepNet: Protection of deepfake images with aid of deep learning networks
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing. Pub Date: 2025-04-02. DOI: 10.1016/j.imavis.2025.105540
Divyanshu Awasthi, Priyank Khare, Vinay Kumar Srivastava, Amit Kumar Singh, Brij B. Gupta
{"title":"DeepNet: Protection of deepfake images with aid of deep learning networks","authors":"Divyanshu Awasthi ,&nbsp;Priyank Khare ,&nbsp;Vinay Kumar Srivastava ,&nbsp;Amit Kumar Singh ,&nbsp;Brij B. Gupta","doi":"10.1016/j.imavis.2025.105540","DOIUrl":"10.1016/j.imavis.2025.105540","url":null,"abstract":"<div><div>In the present information age, multimedia security has become a challenging task. Especially increased usage of images as multimedia data has been a key aspect in this digital transmission era. Deep fake detection of images is a real-time problem which needs to be focused. To resolve this challenge, a novel deep fake detection algorithm is proposed in this article. The presented research uses the Viola-Jones detection algorithm for efficient deep fake image detection. To protect the integrity of these images, the multiresolution domain approach is effectively utilized with redundant discrete wavelet transform (RDWT) and multiresolution singular value decomposition (MSVD). Discrete cosine transform (DCT) is applied for the extraction of frequency components. An adaptive neuro-fuzzy inference system (ANFIS)-based optimization is applied to attain the optimum weighing factor (WF). This WF exhibits a better trade-off among attributes of watermarking. Furthermore, authentication is successfully implemented with the aid of various deep learning models such as SqueezeNet, EfficientNet-B0, ResNet-50 and InceptionV3. This implementation explores the various aspects related to the ownership assertion. Analysis of comprehensive simulation results depicts the effectiveness of the proposed technique over different prevailing techniques. With the development of the proposed technique, deep fake image detection can easily be realized and safeguards the images. The average percentage improvement in the imperceptibility of the proposed technique is 52.14% and for robustness is 7.51%.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105540"},"PeriodicalIF":4.2,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143816849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards on-device continual learning with Binary Neural Networks in industrial scenarios
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing. Pub Date: 2025-04-01. DOI: 10.1016/j.imavis.2025.105524
Lorenzo Vorabbi, Angelo Carraggi, Davide Maltoni, Guido Borghi, Stefano Santi
{"title":"Towards on-device continual learning with Binary Neural Networks in industrial scenarios","authors":"Lorenzo Vorabbi ,&nbsp;Angelo Carraggi ,&nbsp;Davide Maltoni ,&nbsp;Guido Borghi ,&nbsp;Stefano Santi","doi":"10.1016/j.imavis.2025.105524","DOIUrl":"10.1016/j.imavis.2025.105524","url":null,"abstract":"<div><div>This paper addresses the challenges of deploying deep learning models, specifically Binary Neural Networks (BNNs), on resource-constrained embedded devices within the Internet of Things context. As deep learning continues to gain traction in IoT applications, the need for efficient models that can learn continuously from incremental data streams without requiring extensive computational resources has become more pressing. We propose a solution that integrates Continual Learning with BNNs, utilizing replay memory to prevent catastrophic forgetting. Our method focuses on quantized neural networks, introducing the quantization also for the backpropagation step, significantly reducing memory and computational requirements. Furthermore, we enhance the replay memory mechanism by storing intermediate feature maps (<em>i.e.</em> latent replay) with 1-bit precision instead of raw data, enabling efficient memory usage. In addition to well-known benchmarks, we introduce the DL-Hazmat dataset, which consists of over 140k high-resolution grayscale images of 64 hazardous material symbols. Experimental results show a significant improvement in model accuracy and a substantial reduction in memory requirements, demonstrating the effectiveness of our method in enabling deep learning applications on embedded devices in real-world scenarios. Our work expands the application of Continual Learning and BNNs for efficient on-device training, offering a promising solution for IoT and other resource-constrained environments.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105524"},"PeriodicalIF":4.2,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143760754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
I3Net: Intensive information interaction network for RGB-T salient object detection
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing. Pub Date: 2025-04-01. DOI: 10.1016/j.imavis.2025.105525
Jia Hou, Hongfa Wen, Shuai Wang, Chenggang Yan
{"title":"I3Net: Intensive information interaction network for RGB-T salient object detection","authors":"Jia Hou ,&nbsp;Hongfa Wen ,&nbsp;Shuai Wang ,&nbsp;Chenggang Yan","doi":"10.1016/j.imavis.2025.105525","DOIUrl":"10.1016/j.imavis.2025.105525","url":null,"abstract":"<div><div>Multi-modality salient object detection (SOD) is receiving more and more attention in recent years. Infrared thermal images can provide useful information in extreme situations, such as low illumination and cluttered background. Accompany with extra information, we need a more delicate design to properly integrate multi-modal and multi-scale clues. In this paper, we propose an intensively information interaction network (I<sup>3</sup>Net) to perform Red-Green-Blue and Thermal (RGB-T) SOD, which optimizes the performance through modality interaction, level interaction, and scale interaction. Firstly, feature channels from different sources are dynamically selected according to the modality interaction with dynamic merging module. Then, adjacent level interaction is conducted under the guidance of coordinate channel and spatial attention with spatial feature aggregation module. Finally, we deploy pyramid attention module to obtain a more comprehensive scale interaction. Extensive experiments on four RGB-T datasets, VT821, VT1000, VT5000 and VI-RGBT3500, show that the proposed I<sup>3</sup>Net achieves a competitive and excellent performance against 13 state-of-the-art methods in multiple evaluation metrics, with a 1.70%, 1.41%, and 1.54% improvement in terms of weighted F-measure, mean E-measure, and S-measure.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105525"},"PeriodicalIF":4.2,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143760755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0