Image and Vision Computing最新文献_第5页

RC-SODet: Reparameterized dual convolutions and compact feature enhancement for small object detector RC-SODet：小目标检测器的重参数化双卷积和紧凑特征增强

IF 4.2 3区计算机科学

Image and Vision Computing Pub Date : 2025-04-28 DOI: 10.1016/j.imavis.2025.105552

Ze Wu , Zhongxu Li , Huan Lei , Hong Zhao , Wenyuan Yang

{"title":"RC-SODet: Reparameterized dual convolutions and compact feature enhancement for small object detector","authors":"Ze Wu , Zhongxu Li , Huan Lei , Hong Zhao , Wenyuan Yang","doi":"10.1016/j.imavis.2025.105552","DOIUrl":"10.1016/j.imavis.2025.105552","url":null,"abstract":"<div><div>In the field of object detection, small object detection tasks have broad application prospects. However, detection models often face issues with insufficient image features for small objects and limited computational resources. To address these issues, we propose RC-SODet, a small object detector that uses reparameterization techniques combined with dual convolutions and compact feature enhancement blocks. In the detector, we design Reparameterized Dual Convolutions (RepDuConv) to replace conventional convolution and downsampling blocks. Its dual-branch advantage maintains accuracy, and the reparameterization technique built on this significantly improves inference efficiency. Compact Feature-enhanced Pyramid Network (RC-FPN) serves as the neck, using reparameterizable Cross Stage Partial with Feature Fusion Reparameterized Compact Blocks (C2fRCB) for feature enhancement. First, in the backbone network, RepDuConv replaces convolution blocks to perform downsampling on input images, thereby obtaining multi-scale features to pass to the neck. Second, the model uses RC-FPN as the feature pyramid neck to process multi-scale features from the backbone. After each front-end upsampling and fusion, dual-layer C2fRCB is applied to further refine and enhance the tensor features at different fusion scales. Finally, multi-level feature maps are fused at the back-end and passed to the detection head. Additionally, in the inference stage, both RepDuConv and C2fRCB optimize branch structures through reparameterization techniques. Experimental results show that on the small object datasets VisDrone and DroneVehicle, the highest version of RC-SODet achieves 48.1% and 82.4% mAP50, as well as 30.1% and 59.1% mAP50-95, respectively. The designed reparameterization technique increases the model inference speed by 58.1%.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105552"},"PeriodicalIF":4.2,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Corrigendum to “A novel framework for diverse video generation from a single video using frame-conditioned denoising diffusion probabilistic model and ConvNeXt-V2” [Image and Vision Computing 154 (2025) 105422] “使用帧条件去噪扩散概率模型和ConvNeXt-V2从单个视频生成不同视频的新框架”的勘误表[图像与视觉计算154 (2025)105422]

IF 4.2 3区计算机科学

Image and Vision Computing Pub Date : 2025-04-23 DOI: 10.1016/j.imavis.2025.105556

Ayushi Verma, Tapas Badal, Abhay Bansal

引用次数: 0

Rethinking the sample relations for few-shot classification 对小样本分类中样本关系的再思考

IF 4.2 3区计算机科学

Image and Vision Computing Pub Date : 2025-04-22 DOI: 10.1016/j.imavis.2025.105550

Guowei Yin , Sheng Huang , Luwen Huangfu , Yi Zhang , Xiaohong Zhang

{"title":"Rethinking the sample relations for few-shot classification","authors":"Guowei Yin , Sheng Huang , Luwen Huangfu , Yi Zhang , Xiaohong Zhang","doi":"10.1016/j.imavis.2025.105550","DOIUrl":"10.1016/j.imavis.2025.105550","url":null,"abstract":"<div><div>Feature quality is paramount for classification performance, particularly in few-shot scenarios. Contrastive learning, a widely adopted technique for enhancing feature quality, leverages sample relations to extract intrinsic features that capture semantic information and has achieved remarkable success in Few-Shot Learning (FSL). Nevertheless, current few-shot contrastive learning approaches often overlook the semantic similarity discrepancies at different granularities when employing the same modeling approach for different sample relations, which limits the potential of few-shot contrastive learning. In this paper, we introduce a straightforward yet effective contrastive learning approach, Multi-Grained Relation Contrastive Learning (MGRCL), as a pre-training feature learning model to boost few-shot learning by meticulously modeling sample relations at different granularities. MGRCL categorizes sample relations into three types: intra-sample relation of the same sample under different transformations, intra-class relation of homogeneous samples, and inter-class relation of inhomogeneous samples. In MGRCL, we design Transformation Consistency Learning (TCL) to ensure the rigorous semantic consistency of a sample under different transformations by aligning predictions of input pairs. Furthermore, to preserve discriminative information, we employ Class Contrastive Learning (CCL) to ensure that a sample is always closer to its homogeneous samples than its inhomogeneous ones, as homogeneous samples share similar semantic content while inhomogeneous samples have different semantic content. Our method is assessed across four popular FSL benchmarks, showing that such a simple pre-training feature learning method surpasses a majority of leading FSL methods. Moreover, our method can be incorporated into other FSL methods as the pre-trained model and help them obtain significant performance gains.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105550"},"PeriodicalIF":4.2,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Exemplar-free class incremental action recognition based on self-supervised learning 基于自监督学习的无范例类增量动作识别

IF 4.2 3区计算机科学

Image and Vision Computing Pub Date : 2025-04-19 DOI: 10.1016/j.imavis.2025.105544

Chunyu Hou, Yonghong Hou, Jinyin Jiang, Gunel Abdullayeva

引用次数: 0

W-Net: A facial feature-guided face super-resolution network W-Net：人脸特征导向的人脸超分辨率网络

IF 4.2 3区计算机科学

Image and Vision Computing Pub Date : 2025-04-17 DOI: 10.1016/j.imavis.2025.105549

Hao Liu , Yang Yang , Yunxia Liu

{"title":"W-Net: A facial feature-guided face super-resolution network","authors":"Hao Liu , Yang Yang , Yunxia Liu","doi":"10.1016/j.imavis.2025.105549","DOIUrl":"10.1016/j.imavis.2025.105549","url":null,"abstract":"<div><div>Face Super-Resolution (FSR) aims to recover high-resolution (HR) face images from low-resolution (LR) ones. Despite the progress made by convolutional neural networks in FSR, the results of existing approaches are not ideal due to their low reconstruction efficiency and insufficient utilization of prior information. Considering that faces are highly structured objects, effectively leveraging facial priors to improve FSR results is a worthwhile endeavor. This paper proposes a novel network architecture called W-Net to address this challenge. W-Net leverages a meticulously designed Parsing Block to fully exploit the resolution potential of LR image. We use this parsing map as an attention prior, effectively integrating information from both the parsing map and LR images. Simultaneously, we perform multiple fusions across different latent representation dimensions through the W-shaped network structure combined with the LPF(<strong>L</strong>R-<strong>P</strong>arsing Map <strong>F</strong>usion Module). Additionally, we utilize a facial parsing graph as a mask, assigning different weights and loss functions to key facial areas to balance the performance of our reconstructed facial images between perceptual quality and pixel accuracy. We conducted extensive comparative experiments, not only limited to conventional facial super-resolution metrics but also extending to downstream tasks such as facial recognition and facial keypoint detection. The experiments demonstrate that W-Net exhibits outstanding performance in quantitative metrics, visual quality, and downstream tasks.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105549"},"PeriodicalIF":4.2,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Enhancing brain tumor classification in MRI images: A deep learning-based approach for accurate diagnosis 增强MRI图像中的脑肿瘤分类：一种基于深度学习的准确诊断方法

IF 4.2 3区计算机科学

Image and Vision Computing Pub Date : 2025-04-17 DOI: 10.1016/j.imavis.2025.105555

Hossein Sadr , Mojdeh Nazari , Shahrokh Yousefzadeh-Chabok , Hassan Emami , Reza Rabiei , Ali Ashraf

{"title":"Enhancing brain tumor classification in MRI images: A deep learning-based approach for accurate diagnosis","authors":"Hossein Sadr , Mojdeh Nazari , Shahrokh Yousefzadeh-Chabok , Hassan Emami , Reza Rabiei , Ali Ashraf","doi":"10.1016/j.imavis.2025.105555","DOIUrl":"10.1016/j.imavis.2025.105555","url":null,"abstract":"<div><h3>Background</h3><div>Detecting brain tumors from MRI images is crucial for early intervention, accurate diagnosis, and effective treatment planning. MRI imaging offers detailed information about the location, size, and characteristics of brain tumors which enables healthcare professionals to make decisions considering treatment options such as surgery, radiation therapy, and chemotherapy. However, this process is time-consuming and demands specialized expertise to manually assess MRI images. Presently, advancements in Computer-Aided Diagnosis (CAD), machine learning, and deep learning have enabled radiologists to pinpoint brain tumors more effectively and reliably.</div></div><div><h3>Objective</h3><div>Traditional machine learning techniques used in addressing this issue necessitate manually crafted features for classification purposes. Conversely, deep learning methodologies can be formulated to circumvent the need for manual feature extraction while achieving precise classification outcomes. Accordingly, we decided to propose a deep learning based model for automatic classification of brain tumors from MRI images.</div></div><div><h3>Method</h3><div>Two different deep learning based models were designed to detect both binary (abnormal and normal) and multiclass (glioma, meningioma, and pituitary) brain tumors. Figshare, Br35H, and Harvard Medical datasets comprising 3064, 3000, and 152 MRI images were used to train the proposed models. Initially, a deep Convolutional Neural Network (CNN) including 26 layers was applied to the Figshare dataset due to its extensive MRI image count for training purposes. While the proposed ‘Deep CNN’ architecture encountered issues of overfitting, transfer learning was utilized by individually combining fine-tuned VGG16 and Xception architectures with an adaptation of the ‘Deep CNN’ model on Br35H and Harvard Medical datasets.</div></div><div><h3>Results</h3><div>Experimental results indicated that the proposed Deep CNN achieved a classification accuracy of 97.27% on the Figshare dataset. Accuracies of 97.14% and 98.57% were respectively obtained using fine-tuned VGG16 and Xception on the Br35H dataset.100% accuracy was also obtained on the Harvard Medical dataset using both fine-tuned models.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105555"},"PeriodicalIF":4.2,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Underwater image restoration using Joint Local–Global Polarization Complementary Network 基于局部-全局联合偏振互补网络的水下图像恢复

IF 4.2 3区计算机科学

Image and Vision Computing Pub Date : 2025-04-17 DOI: 10.1016/j.imavis.2025.105546

Rui Ruan , Weidong Zhang , Zheng Liang

{"title":"Underwater image restoration using Joint Local–Global Polarization Complementary Network","authors":"Rui Ruan , Weidong Zhang , Zheng Liang","doi":"10.1016/j.imavis.2025.105546","DOIUrl":"10.1016/j.imavis.2025.105546","url":null,"abstract":"<div><div>Underwater image always suffers from the degradation of visual quality and lack of clear details caused by light scattering effect. Since polarization imaging can effectively eliminate the backscattering light, polarization-based methods become more attractive to restore the image, which utilize the difference of polarization characteristics to boost the restoration performance. In this paper, we propose an underwater image restoration using joint Local–Global Polarization Complementary Network, named LGPCNet, to achieve a clear underwater image from multi-polarization images. In particular, we design a local polarization complement module (LCM) to adaptively fuse complementary information of local regions from images with different polarization states. By incorporating this, we can restore rich details including color and texture from other polarimetric images. Then, to balance visual effects between images with different polarization states, we propose a global appearance sharing module (GSM) to obtain the consistent brightness across different polarization images. Finally, we adaptively aggregate the restored information from each polarization states to obtain a final clear image. Experiments on an extended natural underwater polarization image dataset demonstrate that our proposed method achieves superior image restoration performance in terms of color, brightness and contrast compared with state-of-the-art image restored methods.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105546"},"PeriodicalIF":4.2,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143877305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

GCESS: A two-phase generative learning framework for estimate molecular expression to cell detection and analysis gess：一个用于估计分子表达到细胞检测和分析的两阶段生成式学习框架

IF 4.2 3区计算机科学

Image and Vision Computing Pub Date : 2025-04-15 DOI: 10.1016/j.imavis.2025.105554

Tianwang Xun , Lei Su , Wenting Shang , Di Dong , Lizhi Shao

{"title":"GCESS: A two-phase generative learning framework for estimate molecular expression to cell detection and analysis","authors":"Tianwang Xun , Lei Su , Wenting Shang , Di Dong , Lizhi Shao","doi":"10.1016/j.imavis.2025.105554","DOIUrl":"10.1016/j.imavis.2025.105554","url":null,"abstract":"<div><div>Whole slide image (WSI) plays an important role in cancer research. Cell recognition is the foundation and key steps of WSI analysis at the cellular level, including cell segmentation, subtypes detection and molecular expression prediction at the cellular level. Current end-to-end supervised learning models rely heavily on a large amount of manually labeled data and self-supervised learning models are limited to cell binary segmentation. All of these methods lack the ability to predict the expression level of molecules in single cells. In this study, we proposed a two-phase generative adversarial learning framework, named GCESS, which can achieve end-to-end cell binary segmentation, subtypes detection and molecular expression prediction simultaneously. The framework uses generative adversarial learning to obtain better cell binary segmentation results in the first phase by integrating the cell binary segmentation results of some segmentation models and generates multiplex immunohistochemistry (mIHC) images through generative adversarial networks to predict the expression of cell molecules in the second phase. The cell semantic segmentation results can be obtained by spatially mapping the binary segmentation and molecular expression results in pixel level. The method we proposed achieves a Dice of 0.865 on cell binary segmentation, an accuracy of 0.917 on cell semantic segmentation and a Peak Signal to Noise Ratio (PSNR) of 20.929 dB on mIHC images generating, outperforming other competing methods (P-value <0.05). The method we proposed will provide an effective tool for cellular level analysis of digital pathology images and cancer research.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105554"},"PeriodicalIF":4.2,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143858882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Bridging efficiency and interpretability: Explainable AI for multi-classification of pulmonary diseases utilizing modified lightweight CNNs 桥接效率和可解释性：利用改进的轻量级cnn进行肺部疾病多分类的可解释人工智能

IF 4.2 3区计算机科学

Image and Vision Computing Pub Date : 2025-04-15 DOI: 10.1016/j.imavis.2025.105553

Samia Khan, Farheen Siddiqui, Mohd Abdul Ahad

{"title":"Bridging efficiency and interpretability: Explainable AI for multi-classification of pulmonary diseases utilizing modified lightweight CNNs","authors":"Samia Khan, Farheen Siddiqui, Mohd Abdul Ahad","doi":"10.1016/j.imavis.2025.105553","DOIUrl":"10.1016/j.imavis.2025.105553","url":null,"abstract":"<div><div>Pulmonary diseases are notable global health challenges that contribute to increased morbidity and mortality rates. Early and accurate diagnosis is essential for effective treatment. However, traditional apprehension of chest X-ray images is tiresome and susceptible to human error, particularly in resource-constrained settings. Current progress in deep learning, particularly convolutional neural networks, has enabled the automated classification of pulmonary diseases with increased accuracy. In this study, we have proposed an explainable AI approach using modified lightweight convolution neural networks, such as MobileNetV2, EfficientNet-B0, NASNetMobile, and ResNet50V2 to achieve efficient and interpretable classification of multiple pulmonary diseases. Lightweight CNNs are designed to minimize computational complexity while maintaining robust performance, making them ideal for mobile and embedded systems with limited processing power deployment. Our models demonstrated strong performance in detecting pulmonary diseases, with EfficientNet-B0 achieving an accuracy of 94.07%, precision of 94.16%, recall of 94.07%, and F1 score of 94.04%. Furthermore, we have incorporated explainability methods (grad-CAM & t-SNE) to enhance the transparency of model predictions, providing clinicians with a trustworthy tool for diagnostic decision support. The results suggest that lightweight CNNs effectively balance accuracy, efficiency, and interpretability, making them suitable for real-time pulmonary disease detection in clinical and low-resource environments</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105553"},"PeriodicalIF":4.2,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143843875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Dynamic feature extraction and histopathology domain shift alignment for mitosis detection 有丝分裂检测的动态特征提取和组织病理学域移位对齐

IF 4.2 3区计算机科学

Image and Vision Computing Pub Date : 2025-04-14 DOI: 10.1016/j.imavis.2025.105541

Jiangxiao Han, Shikang Wang, Lianjun Wu, Wenyu Liu

{"title":"Dynamic feature extraction and histopathology domain shift alignment for mitosis detection","authors":"Jiangxiao Han, Shikang Wang, Lianjun Wu, Wenyu Liu","doi":"10.1016/j.imavis.2025.105541","DOIUrl":"10.1016/j.imavis.2025.105541","url":null,"abstract":"<div><div>Mitosis count is of crucial significance in cancer diagnosis; therefore, mitosis detection is a meaningful subject in medical image studies. The challenge of mitosis detection lies in the intra-class variance of mitosis and hard negatives, i.e., the sizes/ shapes of mitotic cells vary considerably and plenty of non-mitotic cells resemble mitosis, and the histopathology domain shift across datasets caused by different tissues and organs, scanners, labs, etc. In this paper, we propose a novel Domain Generalized Dynamic Mitosis Detector (DGDMD) to handle the intra-class variance and histopathology domain shift of mitosis detection with a dynamic mitosis feature extractor based on residual structured depth-wise convolution and domain shift alignment terms. The proposed dynamic mitosis feature extractor handles the intra-class variance caused by different sizes and shapes of mitotic cells as well as non-mitotic hard negatives. The proposed domain generalization schedule implemented via novel histopathology-mitosis domain shift alignments deals with the domain shift between histopathology slides in training and test datasets from different sources. We validate the domain generalization ability for mitosis detection of our algorithm on the MIDOG++ dataset and typical mitosis datasets, including the MIDOG 2021, ICPR MITOSIS 2014, AMIDA 2013, and TUPAC 16. Experimental results show that we achieve state-of-the-art (SOTA) performance on the MIDOG++ dataset for the domain generalization across tissue and organs of mitosis detection, across scanners on the MIDOG 2021 dataset, and across data sources on external datasets, demonstrating the effectiveness of our proposed method on the domain generalization of mitosis detection.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105541"},"PeriodicalIF":4.2,"publicationDate":"2025-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143843661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0