{"title":"Integrating end-to-end multimodal deep learning and domain adaptation for robust facial expression recognition","authors":"Mahmoud Hassaballah , Chiara Pero , Ranjeet Kumar Rout , Saiyed Umer","doi":"10.1016/j.imavis.2025.105548","DOIUrl":"10.1016/j.imavis.2025.105548","url":null,"abstract":"<div><div>This paper presents an advanced approach to a facial expression recognition (FER) system designed for robust performance across diverse imaging environments. The proposed method consists of four primary components: image preprocessing, feature representation and classification, cross-domain feature analysis, and domain adaptation. The process begins with facial region extraction from input images, including those captured in unconstrained imaging conditions, where variations in lighting, background, and image quality significantly impact recognition performance. The extracted facial region undergoes feature extraction using an ensemble of multimodal deep learning techniques, including end-to-end CNNs, BilinearCNN, TrilinearCNN, and pretrained CNN models, which capture both local and global facial features with high precision. The ensemble approach enriches feature representation by integrating information from multiple models, enhancing the system’s ability to generalize across different subjects and expressions. These deep features are then passed to a classifier trained to recognize facial expressions effectively in real-time scenarios. Since images captured in real-world conditions often contain noise and artifacts that can compromise accuracy, cross-domain analysis is performed to evaluate the discriminative power and robustness of the extracted deep features. FER systems typically experience performance degradation when applied to domains that differ from the original training environment. To mitigate this issue, domain adaptation techniques are incorporated, enabling the system to effectively adjust to new imaging conditions and improving recognition accuracy even in challenging real-time acquisition environments. The proposed FER system is validated using four well-established benchmark datasets: CK+, KDEF, IMFDB and AffectNet. Experimental results demonstrate that the proposed system achieves high performance within original domains and exhibits superior cross-domain recognition compared to existing state-of-the-art methods. These findings indicate that the system is highly reliable for applications requiring robust and adaptive FER capabilities across varying imaging conditions and domains.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105548"},"PeriodicalIF":4.2,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143899659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking the sample relations for few-shot classification","authors":"Guowei Yin , Sheng Huang , Luwen Huangfu , Yi Zhang , Xiaohong Zhang","doi":"10.1016/j.imavis.2025.105550","DOIUrl":"10.1016/j.imavis.2025.105550","url":null,"abstract":"<div><div>Feature quality is paramount for classification performance, particularly in few-shot scenarios. Contrastive learning, a widely adopted technique for enhancing feature quality, leverages sample relations to extract intrinsic features that capture semantic information and has achieved remarkable success in Few-Shot Learning (FSL). Nevertheless, current few-shot contrastive learning approaches often overlook the semantic similarity discrepancies at different granularities when employing the same modeling approach for different sample relations, which limits the potential of few-shot contrastive learning. In this paper, we introduce a straightforward yet effective contrastive learning approach, Multi-Grained Relation Contrastive Learning (MGRCL), as a pre-training feature learning model to boost few-shot learning by meticulously modeling sample relations at different granularities. MGRCL categorizes sample relations into three types: intra-sample relation of the same sample under different transformations, intra-class relation of homogeneous samples, and inter-class relation of inhomogeneous samples. In MGRCL, we design Transformation Consistency Learning (TCL) to ensure the rigorous semantic consistency of a sample under different transformations by aligning predictions of input pairs. Furthermore, to preserve discriminative information, we employ Class Contrastive Learning (CCL) to ensure that a sample is always closer to its homogeneous samples than its inhomogeneous ones, as homogeneous samples share similar semantic content while inhomogeneous samples have different semantic content. Our method is assessed across four popular FSL benchmarks, showing that such a simple pre-training feature learning method surpasses a majority of leading FSL methods. Moreover, our method can be incorporated into other FSL methods as the pre-trained model and help them obtain significant performance gains.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105550"},"PeriodicalIF":4.2,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exemplar-free class incremental action recognition based on self-supervised learning","authors":"Chunyu Hou, Yonghong Hou, Jinyin Jiang, Gunel Abdullayeva","doi":"10.1016/j.imavis.2025.105544","DOIUrl":"10.1016/j.imavis.2025.105544","url":null,"abstract":"<div><div>Class incremental action recognition faces the persistent challenge of balancing stability and plasticity, as models must learn new classes without forgetting previously acquired knowledge. Existing methods often rely on storing original samples, which significantly increases storage demands and risks overfitting to past data. To address these issues, an exemplar-free framework based on self-supervised learning and Pseudo-Feature Generation (PFG) mechanism is proposed. At each incremental step, PFG generates pseudo features for previously learned classes by using the mean and variance for each class. This framework enables effective joint training on new class data while keeping the feature extractor frozen, eliminating the need to store original data. It preserves past knowledge and dynamically adapts to new categories, striking a balance between stability and plasticity. Experiments on four extensively used datasets: UCF101, HMDB51, Kinetics, and SSV2 validate the effectiveness of the proposed framework.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105544"},"PeriodicalIF":4.2,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143873548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"W-Net: A facial feature-guided face super-resolution network","authors":"Hao Liu , Yang Yang , Yunxia Liu","doi":"10.1016/j.imavis.2025.105549","DOIUrl":"10.1016/j.imavis.2025.105549","url":null,"abstract":"<div><div>Face Super-Resolution (FSR) aims to recover high-resolution (HR) face images from low-resolution (LR) ones. Despite the progress made by convolutional neural networks in FSR, the results of existing approaches are not ideal due to their low reconstruction efficiency and insufficient utilization of prior information. Considering that faces are highly structured objects, effectively leveraging facial priors to improve FSR results is a worthwhile endeavor. This paper proposes a novel network architecture called W-Net to address this challenge. W-Net leverages a meticulously designed Parsing Block to fully exploit the resolution potential of LR image. We use this parsing map as an attention prior, effectively integrating information from both the parsing map and LR images. Simultaneously, we perform multiple fusions across different latent representation dimensions through the W-shaped network structure combined with the LPF(<strong>L</strong>R-<strong>P</strong>arsing Map <strong>F</strong>usion Module). Additionally, we utilize a facial parsing graph as a mask, assigning different weights and loss functions to key facial areas to balance the performance of our reconstructed facial images between perceptual quality and pixel accuracy. We conducted extensive comparative experiments, not only limited to conventional facial super-resolution metrics but also extending to downstream tasks such as facial recognition and facial keypoint detection. The experiments demonstrate that W-Net exhibits outstanding performance in quantitative metrics, visual quality, and downstream tasks.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105549"},"PeriodicalIF":4.2,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing brain tumor classification in MRI images: A deep learning-based approach for accurate diagnosis","authors":"Hossein Sadr , Mojdeh Nazari , Shahrokh Yousefzadeh-Chabok , Hassan Emami , Reza Rabiei , Ali Ashraf","doi":"10.1016/j.imavis.2025.105555","DOIUrl":"10.1016/j.imavis.2025.105555","url":null,"abstract":"<div><h3>Background</h3><div>Detecting brain tumors from MRI images is crucial for early intervention, accurate diagnosis, and effective treatment planning. MRI imaging offers detailed information about the location, size, and characteristics of brain tumors which enables healthcare professionals to make decisions considering treatment options such as surgery, radiation therapy, and chemotherapy. However, this process is time-consuming and demands specialized expertise to manually assess MRI images. Presently, advancements in Computer-Aided Diagnosis (CAD), machine learning, and deep learning have enabled radiologists to pinpoint brain tumors more effectively and reliably.</div></div><div><h3>Objective</h3><div>Traditional machine learning techniques used in addressing this issue necessitate manually crafted features for classification purposes. Conversely, deep learning methodologies can be formulated to circumvent the need for manual feature extraction while achieving precise classification outcomes. Accordingly, we decided to propose a deep learning based model for automatic classification of brain tumors from MRI images.</div></div><div><h3>Method</h3><div>Two different deep learning based models were designed to detect both binary (abnormal and normal) and multiclass (glioma, meningioma, and pituitary) brain tumors. Figshare, Br35H, and Harvard Medical datasets comprising 3064, 3000, and 152 MRI images were used to train the proposed models. Initially, a deep Convolutional Neural Network (CNN) including 26 layers was applied to the Figshare dataset due to its extensive MRI image count for training purposes. While the proposed ‘Deep CNN’ architecture encountered issues of overfitting, transfer learning was utilized by individually combining fine-tuned VGG16 and Xception architectures with an adaptation of the ‘Deep CNN’ model on Br35H and Harvard Medical datasets.</div></div><div><h3>Results</h3><div>Experimental results indicated that the proposed Deep CNN achieved a classification accuracy of 97.27% on the Figshare dataset. Accuracies of 97.14% and 98.57% were respectively obtained using fine-tuned VGG16 and Xception on the Br35H dataset.100% accuracy was also obtained on the Harvard Medical dataset using both fine-tuned models.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105555"},"PeriodicalIF":4.2,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143881991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Underwater image restoration using Joint Local–Global Polarization Complementary Network","authors":"Rui Ruan , Weidong Zhang , Zheng Liang","doi":"10.1016/j.imavis.2025.105546","DOIUrl":"10.1016/j.imavis.2025.105546","url":null,"abstract":"<div><div>Underwater image always suffers from the degradation of visual quality and lack of clear details caused by light scattering effect. Since polarization imaging can effectively eliminate the backscattering light, polarization-based methods become more attractive to restore the image, which utilize the difference of polarization characteristics to boost the restoration performance. In this paper, we propose an underwater image restoration using joint Local–Global Polarization Complementary Network, named LGPCNet, to achieve a clear underwater image from multi-polarization images. In particular, we design a local polarization complement module (LCM) to adaptively fuse complementary information of local regions from images with different polarization states. By incorporating this, we can restore rich details including color and texture from other polarimetric images. Then, to balance visual effects between images with different polarization states, we propose a global appearance sharing module (GSM) to obtain the consistent brightness across different polarization images. Finally, we adaptively aggregate the restored information from each polarization states to obtain a final clear image. Experiments on an extended natural underwater polarization image dataset demonstrate that our proposed method achieves superior image restoration performance in terms of color, brightness and contrast compared with state-of-the-art image restored methods.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105546"},"PeriodicalIF":4.2,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143877305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GCESS: A two-phase generative learning framework for estimate molecular expression to cell detection and analysis","authors":"Tianwang Xun , Lei Su , Wenting Shang , Di Dong , Lizhi Shao","doi":"10.1016/j.imavis.2025.105554","DOIUrl":"10.1016/j.imavis.2025.105554","url":null,"abstract":"<div><div>Whole slide image (WSI) plays an important role in cancer research. Cell recognition is the foundation and key steps of WSI analysis at the cellular level, including cell segmentation, subtypes detection and molecular expression prediction at the cellular level. Current end-to-end supervised learning models rely heavily on a large amount of manually labeled data and self-supervised learning models are limited to cell binary segmentation. All of these methods lack the ability to predict the expression level of molecules in single cells. In this study, we proposed a two-phase generative adversarial learning framework, named GCESS, which can achieve end-to-end cell binary segmentation, subtypes detection and molecular expression prediction simultaneously. The framework uses generative adversarial learning to obtain better cell binary segmentation results in the first phase by integrating the cell binary segmentation results of some segmentation models and generates multiplex immunohistochemistry (mIHC) images through generative adversarial networks to predict the expression of cell molecules in the second phase. The cell semantic segmentation results can be obtained by spatially mapping the binary segmentation and molecular expression results in pixel level. The method we proposed achieves a Dice of 0.865 on cell binary segmentation, an accuracy of 0.917 on cell semantic segmentation and a Peak Signal to Noise Ratio (PSNR) of 20.929 dB on mIHC images generating, outperforming other competing methods (P-value <<!--> <!-->0.05). The method we proposed will provide an effective tool for cellular level analysis of digital pathology images and cancer research.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105554"},"PeriodicalIF":4.2,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143858882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bridging efficiency and interpretability: Explainable AI for multi-classification of pulmonary diseases utilizing modified lightweight CNNs","authors":"Samia Khan, Farheen Siddiqui, Mohd Abdul Ahad","doi":"10.1016/j.imavis.2025.105553","DOIUrl":"10.1016/j.imavis.2025.105553","url":null,"abstract":"<div><div>Pulmonary diseases are notable global health challenges that contribute to increased morbidity and mortality rates. Early and accurate diagnosis is essential for effective treatment. However, traditional apprehension of chest X-ray images is tiresome and susceptible to human error, particularly in resource-constrained settings. Current progress in deep learning, particularly convolutional neural networks, has enabled the automated classification of pulmonary diseases with increased accuracy. In this study, we have proposed an explainable AI approach using modified lightweight convolution neural networks, such as MobileNetV2, EfficientNet-B0, NASNetMobile, and ResNet50V2 to achieve efficient and interpretable classification of multiple pulmonary diseases. Lightweight CNNs are designed to minimize computational complexity while maintaining robust performance, making them ideal for mobile and embedded systems with limited processing power deployment. Our models demonstrated strong performance in detecting pulmonary diseases, with EfficientNet-B0 achieving an accuracy of 94.07%, precision of 94.16%, recall of 94.07%, and F1 score of 94.04%. Furthermore, we have incorporated explainability methods (grad-CAM & t-SNE) to enhance the transparency of model predictions, providing clinicians with a trustworthy tool for diagnostic decision support. The results suggest that lightweight CNNs effectively balance accuracy, efficiency, and interpretability, making them suitable for real-time pulmonary disease detection in clinical and low-resource environments</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105553"},"PeriodicalIF":4.2,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143843875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic feature extraction and histopathology domain shift alignment for mitosis detection","authors":"Jiangxiao Han, Shikang Wang, Lianjun Wu, Wenyu Liu","doi":"10.1016/j.imavis.2025.105541","DOIUrl":"10.1016/j.imavis.2025.105541","url":null,"abstract":"<div><div>Mitosis count is of crucial significance in cancer diagnosis; therefore, mitosis detection is a meaningful subject in medical image studies. The challenge of mitosis detection lies in the intra-class variance of mitosis and hard negatives, i.e., the sizes/ shapes of mitotic cells vary considerably and plenty of non-mitotic cells resemble mitosis, and the histopathology domain shift across datasets caused by different tissues and organs, scanners, labs, etc. In this paper, we propose a novel Domain Generalized Dynamic Mitosis Detector (DGDMD) to handle the intra-class variance and histopathology domain shift of mitosis detection with a dynamic mitosis feature extractor based on residual structured depth-wise convolution and domain shift alignment terms. The proposed dynamic mitosis feature extractor handles the intra-class variance caused by different sizes and shapes of mitotic cells as well as non-mitotic hard negatives. The proposed domain generalization schedule implemented via novel histopathology-mitosis domain shift alignments deals with the domain shift between histopathology slides in training and test datasets from different sources. We validate the domain generalization ability for mitosis detection of our algorithm on the MIDOG++ dataset and typical mitosis datasets, including the MIDOG 2021, ICPR MITOSIS 2014, AMIDA 2013, and TUPAC 16. Experimental results show that we achieve state-of-the-art (SOTA) performance on the MIDOG++ dataset for the domain generalization across tissue and organs of mitosis detection, across scanners on the MIDOG 2021 dataset, and across data sources on external datasets, demonstrating the effectiveness of our proposed method on the domain generalization of mitosis detection.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"158 ","pages":"Article 105541"},"PeriodicalIF":4.2,"publicationDate":"2025-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143843661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated dual CNN-based feature extraction with SMOTE for imbalanced diabetic retinopathy classification","authors":"Danyal Badar Soomro , Wang ChengLiang , Mahmood Ashraf , Dina Abdulaziz AlHammadi , Shtwai Alsubai , Carlo Medaglia , Nisreen Innab , Muhammad Umer","doi":"10.1016/j.imavis.2025.105537","DOIUrl":"10.1016/j.imavis.2025.105537","url":null,"abstract":"<div><div>The primary cause of Diabetic Retinopathy (DR) is high blood sugar due to long-term diabetes. Early and correct diagnosis of the DR is essential for timely and effective treatment. Despite high performance of recently developed models, there is still a need to overcome the problem of class imbalance issues and feature extraction to achieve accurate results. To resolve this problem, we have presented an automated model combining the customized ResNet-50 and EfficientNetB0 for detecting and classifying DR in fundus images. The proposed model addresses class imbalance using data augmentation and Synthetic Minority Oversampling Technique (SMOTE) for oversampling the training data and enhances the feature extraction process through fine-tuned ResNet50 and EfficientNetB0 models with ReLU activations and global average pooling. Combining extracted features and then passing it to four different classifiers effectively captures both local and global spatial features, thereby improving classification accuracy for diabetic retinopathy. For Experiment, The APTOS 2019 Dataset is used, and it contains of 3662 high-quality fundus images. The performance of the proposed model is assessed using several metrics, and the findings are compared with contemporary methods for diabetic retinopathy detection. The suggested methodology demonstrates substantial enhancement in diabetic retinopathy diagnosis for fundus pictures. The proposed automated model attained an accuracy of 98.5% for binary classification and 92.73% for multiclass classification.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"159 ","pages":"Article 105537"},"PeriodicalIF":4.2,"publicationDate":"2025-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143869522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}