{"title":"SRMA-KD: Structured relational multi-scale attention knowledge distillation for effective lightweight cardiac image segmentation","authors":"Bo Chen , Youhao Huang , Yufan Liu , Dong Sui , Fei Yang","doi":"10.1016/j.imavis.2025.105577","DOIUrl":"10.1016/j.imavis.2025.105577","url":null,"abstract":"<div><div>Cardiac image segmentation is essential for accurately extracting structural information of the heart, aiding in precise diagnosis and personalized treatment planning. However, real-time segmentation on medical devices demands computational efficiency that often conflicts with the intensive processing and storage requirements of deep learning algorithms. These algorithms are frequently hindered by their complex models and extensive parameter sets, which limit their feasibility in clinical settings with constrained resources. Meanwhile, the performance of lightweight heart segmentation models still requires enhancement. This study introduces the SRMA-KD framework, a knowledge distillation approach for cardiac image segmentation designed to achieve high accuracy with lightweight models. The framework efficiently transfers semantic feature information and structural knowledge from a teacher model to student model, ensuring effective segmentation within clinical resource limitations. The SRMA-KD framework includes three key modules: the Global Structural Relational Block (GSRB), the Multi-scale Feature Attention Block (MFAB), and the Prediction Difference Transfer Block (PDTB). The GSRB correlates the outputs of the teacher and student networks with the ground truth, transferring structural correlations to enhance the student network's global feature learning. The MFAB enables the student network to learn multi-scale feature extraction from the teacher network, focusing on relevant semantic regions. The PDTB minimizes pixel-level differences between the segmentation images of the teacher and student networks. Our experiments demonstrate that the SRMA-KD framework significantly improves the segmentation accuracy of the student network compared to other medical imaging knowledge distillation methods, highlighting its potential as an effective solution for cardiac image segmentation in resource-limited clinical environments.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105577"},"PeriodicalIF":4.2,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143908168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RC-SODet: Reparameterized dual convolutions and compact feature enhancement for small object detector","authors":"Ze Wu , Zhongxu Li , Huan Lei , Hong Zhao , Wenyuan Yang","doi":"10.1016/j.imavis.2025.105552","DOIUrl":"10.1016/j.imavis.2025.105552","url":null,"abstract":"<div><div>In the field of object detection, small object detection tasks have broad application prospects. However, detection models often face issues with insufficient image features for small objects and limited computational resources. To address these issues, we propose RC-SODet, a small object detector that uses reparameterization techniques combined with dual convolutions and compact feature enhancement blocks. In the detector, we design Reparameterized Dual Convolutions (RepDuConv) to replace conventional convolution and downsampling blocks. Its dual-branch advantage maintains accuracy, and the reparameterization technique built on this significantly improves inference efficiency. Compact Feature-enhanced Pyramid Network (RC-FPN) serves as the neck, using reparameterizable Cross Stage Partial with Feature Fusion Reparameterized Compact Blocks (C2fRCB) for feature enhancement. First, in the backbone network, RepDuConv replaces convolution blocks to perform downsampling on input images, thereby obtaining multi-scale features to pass to the neck. Second, the model uses RC-FPN as the feature pyramid neck to process multi-scale features from the backbone. After each front-end upsampling and fusion, dual-layer C2fRCB is applied to further refine and enhance the tensor features at different fusion scales. Finally, multi-level feature maps are fused at the back-end and passed to the detection head. Additionally, in the inference stage, both RepDuConv and C2fRCB optimize branch structures through reparameterization techniques. Experimental results show that on the small object datasets VisDrone and DroneVehicle, the highest version of RC-SODet achieves 48.1% and 82.4% mAP50, as well as 30.1% and 59.1% mAP50-95, respectively. The designed reparameterization technique increases the model inference speed by 58.1%.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105552"},"PeriodicalIF":4.2,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrigendum to “A novel framework for diverse video generation from a single video using frame-conditioned denoising diffusion probabilistic model and ConvNeXt-V2” [Image and Vision Computing 154 (2025) 105422]","authors":"Ayushi Verma, Tapas Badal, Abhay Bansal","doi":"10.1016/j.imavis.2025.105556","DOIUrl":"10.1016/j.imavis.2025.105556","url":null,"abstract":"","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105556"},"PeriodicalIF":4.2,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144105987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking the sample relations for few-shot classification","authors":"Guowei Yin , Sheng Huang , Luwen Huangfu , Yi Zhang , Xiaohong Zhang","doi":"10.1016/j.imavis.2025.105550","DOIUrl":"10.1016/j.imavis.2025.105550","url":null,"abstract":"<div><div>Feature quality is paramount for classification performance, particularly in few-shot scenarios. Contrastive learning, a widely adopted technique for enhancing feature quality, leverages sample relations to extract intrinsic features that capture semantic information and has achieved remarkable success in Few-Shot Learning (FSL). Nevertheless, current few-shot contrastive learning approaches often overlook the semantic similarity discrepancies at different granularities when employing the same modeling approach for different sample relations, which limits the potential of few-shot contrastive learning. In this paper, we introduce a straightforward yet effective contrastive learning approach, Multi-Grained Relation Contrastive Learning (MGRCL), as a pre-training feature learning model to boost few-shot learning by meticulously modeling sample relations at different granularities. MGRCL categorizes sample relations into three types: intra-sample relation of the same sample under different transformations, intra-class relation of homogeneous samples, and inter-class relation of inhomogeneous samples. In MGRCL, we design Transformation Consistency Learning (TCL) to ensure the rigorous semantic consistency of a sample under different transformations by aligning predictions of input pairs. Furthermore, to preserve discriminative information, we employ Class Contrastive Learning (CCL) to ensure that a sample is always closer to its homogeneous samples than its inhomogeneous ones, as homogeneous samples share similar semantic content while inhomogeneous samples have different semantic content. Our method is assessed across four popular FSL benchmarks, showing that such a simple pre-training feature learning method surpasses a majority of leading FSL methods. Moreover, our method can be incorporated into other FSL methods as the pre-trained model and help them obtain significant performance gains.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105550"},"PeriodicalIF":4.2,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exemplar-free class incremental action recognition based on self-supervised learning","authors":"Chunyu Hou, Yonghong Hou, Jinyin Jiang, Gunel Abdullayeva","doi":"10.1016/j.imavis.2025.105544","DOIUrl":"10.1016/j.imavis.2025.105544","url":null,"abstract":"<div><div>Class incremental action recognition faces the persistent challenge of balancing stability and plasticity, as models must learn new classes without forgetting previously acquired knowledge. Existing methods often rely on storing original samples, which significantly increases storage demands and risks overfitting to past data. To address these issues, an exemplar-free framework based on self-supervised learning and Pseudo-Feature Generation (PFG) mechanism is proposed. At each incremental step, PFG generates pseudo features for previously learned classes by using the mean and variance for each class. This framework enables effective joint training on new class data while keeping the feature extractor frozen, eliminating the need to store original data. It preserves past knowledge and dynamically adapts to new categories, striking a balance between stability and plasticity. Experiments on four extensively used datasets: UCF101, HMDB51, Kinetics, and SSV2 validate the effectiveness of the proposed framework.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105544"},"PeriodicalIF":4.2,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143873548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"W-Net: A facial feature-guided face super-resolution network","authors":"Hao Liu , Yang Yang , Yunxia Liu","doi":"10.1016/j.imavis.2025.105549","DOIUrl":"10.1016/j.imavis.2025.105549","url":null,"abstract":"<div><div>Face Super-Resolution (FSR) aims to recover high-resolution (HR) face images from low-resolution (LR) ones. Despite the progress made by convolutional neural networks in FSR, the results of existing approaches are not ideal due to their low reconstruction efficiency and insufficient utilization of prior information. Considering that faces are highly structured objects, effectively leveraging facial priors to improve FSR results is a worthwhile endeavor. This paper proposes a novel network architecture called W-Net to address this challenge. W-Net leverages a meticulously designed Parsing Block to fully exploit the resolution potential of LR image. We use this parsing map as an attention prior, effectively integrating information from both the parsing map and LR images. Simultaneously, we perform multiple fusions across different latent representation dimensions through the W-shaped network structure combined with the LPF(<strong>L</strong>R-<strong>P</strong>arsing Map <strong>F</strong>usion Module). Additionally, we utilize a facial parsing graph as a mask, assigning different weights and loss functions to key facial areas to balance the performance of our reconstructed facial images between perceptual quality and pixel accuracy. We conducted extensive comparative experiments, not only limited to conventional facial super-resolution metrics but also extending to downstream tasks such as facial recognition and facial keypoint detection. The experiments demonstrate that W-Net exhibits outstanding performance in quantitative metrics, visual quality, and downstream tasks.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105549"},"PeriodicalIF":4.2,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing brain tumor classification in MRI images: A deep learning-based approach for accurate diagnosis","authors":"Hossein Sadr , Mojdeh Nazari , Shahrokh Yousefzadeh-Chabok , Hassan Emami , Reza Rabiei , Ali Ashraf","doi":"10.1016/j.imavis.2025.105555","DOIUrl":"10.1016/j.imavis.2025.105555","url":null,"abstract":"<div><h3>Background</h3><div>Detecting brain tumors from MRI images is crucial for early intervention, accurate diagnosis, and effective treatment planning. MRI imaging offers detailed information about the location, size, and characteristics of brain tumors which enables healthcare professionals to make decisions considering treatment options such as surgery, radiation therapy, and chemotherapy. However, this process is time-consuming and demands specialized expertise to manually assess MRI images. Presently, advancements in Computer-Aided Diagnosis (CAD), machine learning, and deep learning have enabled radiologists to pinpoint brain tumors more effectively and reliably.</div></div><div><h3>Objective</h3><div>Traditional machine learning techniques used in addressing this issue necessitate manually crafted features for classification purposes. Conversely, deep learning methodologies can be formulated to circumvent the need for manual feature extraction while achieving precise classification outcomes. Accordingly, we decided to propose a deep learning based model for automatic classification of brain tumors from MRI images.</div></div><div><h3>Method</h3><div>Two different deep learning based models were designed to detect both binary (abnormal and normal) and multiclass (glioma, meningioma, and pituitary) brain tumors. Figshare, Br35H, and Harvard Medical datasets comprising 3064, 3000, and 152 MRI images were used to train the proposed models. Initially, a deep Convolutional Neural Network (CNN) including 26 layers was applied to the Figshare dataset due to its extensive MRI image count for training purposes. While the proposed ‘Deep CNN’ architecture encountered issues of overfitting, transfer learning was utilized by individually combining fine-tuned VGG16 and Xception architectures with an adaptation of the ‘Deep CNN’ model on Br35H and Harvard Medical datasets.</div></div><div><h3>Results</h3><div>Experimental results indicated that the proposed Deep CNN achieved a classification accuracy of 97.27% on the Figshare dataset. Accuracies of 97.14% and 98.57% were respectively obtained using fine-tuned VGG16 and Xception on the Br35H dataset.100% accuracy was also obtained on the Harvard Medical dataset using both fine-tuned models.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105555"},"PeriodicalIF":4.2,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Underwater image restoration using Joint Local–Global Polarization Complementary Network","authors":"Rui Ruan , Weidong Zhang , Zheng Liang","doi":"10.1016/j.imavis.2025.105546","DOIUrl":"10.1016/j.imavis.2025.105546","url":null,"abstract":"<div><div>Underwater image always suffers from the degradation of visual quality and lack of clear details caused by light scattering effect. Since polarization imaging can effectively eliminate the backscattering light, polarization-based methods become more attractive to restore the image, which utilize the difference of polarization characteristics to boost the restoration performance. In this paper, we propose an underwater image restoration using joint Local–Global Polarization Complementary Network, named LGPCNet, to achieve a clear underwater image from multi-polarization images. In particular, we design a local polarization complement module (LCM) to adaptively fuse complementary information of local regions from images with different polarization states. By incorporating this, we can restore rich details including color and texture from other polarimetric images. Then, to balance visual effects between images with different polarization states, we propose a global appearance sharing module (GSM) to obtain the consistent brightness across different polarization images. Finally, we adaptively aggregate the restored information from each polarization states to obtain a final clear image. Experiments on an extended natural underwater polarization image dataset demonstrate that our proposed method achieves superior image restoration performance in terms of color, brightness and contrast compared with state-of-the-art image restored methods.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105546"},"PeriodicalIF":4.2,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143877305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GCESS: A two-phase generative learning framework for estimate molecular expression to cell detection and analysis","authors":"Tianwang Xun , Lei Su , Wenting Shang , Di Dong , Lizhi Shao","doi":"10.1016/j.imavis.2025.105554","DOIUrl":"10.1016/j.imavis.2025.105554","url":null,"abstract":"<div><div>Whole slide image (WSI) plays an important role in cancer research. Cell recognition is the foundation and key steps of WSI analysis at the cellular level, including cell segmentation, subtypes detection and molecular expression prediction at the cellular level. Current end-to-end supervised learning models rely heavily on a large amount of manually labeled data and self-supervised learning models are limited to cell binary segmentation. All of these methods lack the ability to predict the expression level of molecules in single cells. In this study, we proposed a two-phase generative adversarial learning framework, named GCESS, which can achieve end-to-end cell binary segmentation, subtypes detection and molecular expression prediction simultaneously. The framework uses generative adversarial learning to obtain better cell binary segmentation results in the first phase by integrating the cell binary segmentation results of some segmentation models and generates multiplex immunohistochemistry (mIHC) images through generative adversarial networks to predict the expression of cell molecules in the second phase. The cell semantic segmentation results can be obtained by spatially mapping the binary segmentation and molecular expression results in pixel level. The method we proposed achieves a Dice of 0.865 on cell binary segmentation, an accuracy of 0.917 on cell semantic segmentation and a Peak Signal to Noise Ratio (PSNR) of 20.929 dB on mIHC images generating, outperforming other competing methods (P-value <<!--> <!-->0.05). The method we proposed will provide an effective tool for cellular level analysis of digital pathology images and cancer research.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105554"},"PeriodicalIF":4.2,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143858882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bridging efficiency and interpretability: Explainable AI for multi-classification of pulmonary diseases utilizing modified lightweight CNNs","authors":"Samia Khan, Farheen Siddiqui, Mohd Abdul Ahad","doi":"10.1016/j.imavis.2025.105553","DOIUrl":"10.1016/j.imavis.2025.105553","url":null,"abstract":"<div><div>Pulmonary diseases are notable global health challenges that contribute to increased morbidity and mortality rates. Early and accurate diagnosis is essential for effective treatment. However, traditional apprehension of chest X-ray images is tiresome and susceptible to human error, particularly in resource-constrained settings. Current progress in deep learning, particularly convolutional neural networks, has enabled the automated classification of pulmonary diseases with increased accuracy. In this study, we have proposed an explainable AI approach using modified lightweight convolution neural networks, such as MobileNetV2, EfficientNet-B0, NASNetMobile, and ResNet50V2 to achieve efficient and interpretable classification of multiple pulmonary diseases. Lightweight CNNs are designed to minimize computational complexity while maintaining robust performance, making them ideal for mobile and embedded systems with limited processing power deployment. Our models demonstrated strong performance in detecting pulmonary diseases, with EfficientNet-B0 achieving an accuracy of 94.07%, precision of 94.16%, recall of 94.07%, and F1 score of 94.04%. Furthermore, we have incorporated explainability methods (grad-CAM & t-SNE) to enhance the transparency of model predictions, providing clinicians with a trustworthy tool for diagnostic decision support. The results suggest that lightweight CNNs effectively balance accuracy, efficiency, and interpretability, making them suitable for real-time pulmonary disease detection in clinical and low-resource environments</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105553"},"PeriodicalIF":4.2,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143843875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}