{"title":"Self-Supervised Learning for Domain Generalization With a Multi-Classifier Ensemble Approach","authors":"Zhenkai Qin, Qining Luo, Xunyi Nong, Xiaolong Chen, Hongfeng Zhang, Cora Un In Wong","doi":"10.1049/ipr2.70098","DOIUrl":"https://doi.org/10.1049/ipr2.70098","url":null,"abstract":"<p>Domain generalization poses significant challenges, particularly as models must generalize effectively to unseen target domains after training on multiple source domains. Traditional approaches typically aim to minimize domain discrepancies; however, they often fall short when handling complex data variations and class imbalance. In this paper, we propose an innovative model, the self-supervised learning multi-classifier ensemble (SSL-MCE), to address these limitations. SSL-MCE integrates self-supervised learning within a dynamic multi-classifier ensemble framework, leveraging ResNet as a shared feature extraction backbone. By combining four distinct classifiers, it captures diverse and complementary features, thereby enhancing adaptability to new domains. A self-supervised rotation prediction task enables SSL-MCE to focus on intrinsic data structures rather than domain-specific details, learning robust domain-invariant features. To mitigate class imbalance, we incorporate adaptive focal attention loss (AFAL), which dynamically emphasizes challenging and rare instances, ensuring improved accuracy on difficult samples. Furthermore, SSL-MCE adopts a dynamic loss-based weighting scheme to prioritize more reliable classifiers in the final prediction. Extensive experiments conducted on public benchmark datasets, including PACS and DomainNet, indicate that SSL-MCE outperforms state-of-the-art methods, achieving superior generalization and resource efficiency through its streamlined ensemble framework.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70098","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144214162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced Small-Object Detection in UAV Images Using Modified YOLOv5 Model","authors":"Bach-Thanh Lieu, Chi-Khang Nguyen, Huynh-Lam Nguyen, Thanh-Hai Le","doi":"10.1049/ipr2.70121","DOIUrl":"https://doi.org/10.1049/ipr2.70121","url":null,"abstract":"<p>This study presents a modified YOLOv5 algorithm specifically designed to enhance small-object detection in unmanned aerial vehicle (UAV) images. Traditional object detection in UAV images is particularly challenging due to the high altitude of the cameras, which results in small object sizes and varying viewing angles. To address these challenges, the algorithm incorporates an additional prediction head to detect objects across a wide range of scales, a channel feature fusion with involution (CFFI) block to minimize information loss, a convolutional block attention module (CBAM) to highlight the crucial spatial and channel features, and a C3 structure with a Transformer block (C3TR) to capture contextual information. The algorithm additionally employs soft non-maximum suppression to enhance the bounding box scoring of overlapping objects in dense scenes. Extensive experiments were conducted on the VisDrone-DET2019 dataset, which demonstrated the effectiveness of the proposed algorithm. The results showed improvements with precision scores of 55.0%, recall scores of 44.6%, mean average precision scores of mAP50 = 50.9% and mAP50:95 = 33.0% on the VisDrone-DET2019 validation set, and precision of 50.8%, recall of 37.3%, mAP50 = 44.2%, and mAP50:95 = 27.3% on the VisDrone-DET2019 testing set. The improved performance is due to the incorporation of attention mechanisms, which allow the proposed model to stay lightweight while still extracting the features needed to detect small objects.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70121","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144191091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Medical Image Registration via Spatial Feature Extraction Mamba and Substrate Iterative Refinement","authors":"Zilong Xue, Kangjian He, Dan Xu, Jian Gong","doi":"10.1049/ipr2.70117","DOIUrl":"https://doi.org/10.1049/ipr2.70117","url":null,"abstract":"<p>One of the major challenges in medical image registration is balancing computational efficiency with the ability to capture large deformations in complex anatomical structures. Existing methods often struggle with high computational costs due to the need for extensive feature extraction and attention computations at various levels of the network. Moreover, some methods do not take into account the spatial relationships of the feature images during registration, and the loss of these spatial relationships leads to suboptimal results for these methods. To this end, we introduce a novel medical image registration network, PSMamba-Net, which leverages optimized iteration and the Mamba framework within a dual-stream pyramid architecture. The network reduces the computational burden by narrowing attention computations at each decoding level, while an optimized iterative registration module at the bottom of the pyramid captures large deformations. This approach eliminates the need for repeated feature extraction, significantly accelerating the registration process. Additionally, the SMB module is incorporated as a decoder to enhance spatial relationship modelling and leverage Mamba's strengths in long-sequence processing. PSMamba-Net balances efficiency and accuracy, surpassing state-of-the-art methods across LPBA40, Mindboggle, and Abdomen CT datasets. Our source code is available at: https://github.com/VCMHE/PSMamba.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70117","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144179339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on Facial Expression Recognition Method Based on Improved ConvNeXt","authors":"Dan Chen, Yu Cao, Xu Cheng","doi":"10.1049/ipr2.70118","DOIUrl":"https://doi.org/10.1049/ipr2.70118","url":null,"abstract":"<p>Advanced facial expression recognition technology can significantly enhance human-computer interaction and improve intelligent services for humans. This paper introduces a novel facial expression recognition method utilizing an enhanced ConvNeXt network. By integrating the SENET attention mechanism into the ConvNeXt block, key feature information extraction is effectively enhanced. Additionally, the incorporation of the focal loss (FL) function optimizes the classification performance of the network model. Experimental results show that the improved ConvNeXt network achieves higher accuracy compared to other deep learning models, with accuracy rates of 83.8% and 70.4% on the RAF-DB and FER2013 datasets, respectively.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70118","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144171681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain Adaptation of Foreground and Scale Sensing for Gastric Polyp Detection","authors":"Ying Zheng, Junhe Zhang, Yao Yu, Changyin Sun","doi":"10.1049/ipr2.70092","DOIUrl":"https://doi.org/10.1049/ipr2.70092","url":null,"abstract":"<p>Automated detection of gastric polyps has been proven crucial for improving diagnostic accuracy. However, when there is a domain shift in the data, deep learning-based detection methods may not perform well. Unsupervised domain adaptation has been demonstrated as a good approach to address this issue. However, existing unsupervised domain adaptation detection methods struggle to handle the problem of foreground–background similarity and the diverse appearances of polyps at different scales in gastric polyp images. In this paper, we propose a boundary-guided transferable attention module and a transferable prototype alignment module to address the foreground–background similarity issue, and a multi-scale enhanced alignment method to tackle the problem of information loss when aligning polyps at multiple scales. The boundary-guided transferable attention module fully explores spatial information of the image with a boundary-guided multi-field attention mechanism while considering the transferability of features to mine the easily transferable foreground regions. The transferable prototype alignment module adopts a prototype-based method to facilitate the transfer of difficult-to-align regions. The multi-scale enhanced alignment method prevents information loss across feature maps and scales with an attention filtering module, enhancing features at each scale. In experiments, this work outperforms advanced domain adaptation detection methods like SIGMA and CAT in polyp detection.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70092","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144171218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GAIP-VAE: Balancing Reconstruction and Disentanglement in VAE With Group and Individual Priors","authors":"Yi Tian, Zengjie Song","doi":"10.1049/ipr2.70113","DOIUrl":"https://doi.org/10.1049/ipr2.70113","url":null,"abstract":"<p>Disentangled representation learning demonstrates great success in enhancing the explainability, robustness and generalization capability of models across computer vision domains. While adopting the variational auto-encoders (VAEs) to learn disentangled representations holds great promise, these models are prone to suffer from the poor disentanglement capability in complicated datasets, for example, colourful portrait images. These datasets often contain strong correlation among attributes, making it difficult to disentangle them. To alleviate this issue, a novel approach named group and individual priors-based VAE (GAIP-VAE) is proposed, which constrains the semantic attributes by customizing prior information to improve the disentanglement capability of the VAE. Specifically, we start from modelling the joint distribution of the observed data, and then derive three compatible loss terms in the objective function. The first one is the reconstruction term, utilizing the Laplace distribution to improve the image quality. The second one is the individual prior regularizer, encouraging the model to learn more interpretable factors via dimensional-level regularizer. The third one is the group prior regularizer, constraining the approximate posterior distribution through multivariate normal distribution with the tailored correlation. Both quantitative and qualitative experimental results demonstrate that GAIP-VAE can achieve a great balance between image quality and disentanglement capability.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70113","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144171221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vorticity Transport Equation-Based Shadow Removal Approach for Image Inpainting","authors":"Xiaoying Ti, Li Yu, Quanhua Zhao","doi":"10.1049/ipr2.70114","DOIUrl":"https://doi.org/10.1049/ipr2.70114","url":null,"abstract":"<p>Shadows are common in many types of images, causing information loss or disturbance. Shadow removal can help improve the quality of the digital image. If there is no effective information available to restore the original image in the shaded area, the interpolation-based inpainting technique can be used to remove the shadow from the digital image. This image inpainting technique typically involves establishing and solving partial differential equations (PDEs), an iterative solving process that is very time-consuming. To solve the time-consuming problem, a method that introduces the fast marching method (FMM) into the vorticity transport equation (VTE) is demonstrated. VTE is a type of partial differential equation describing two-dimensional fluids. FMM is a numerical scheme for tracking the evolution of monotonically advancing interfaces via finite difference solution of the eikonal equation. The proposed method contains three main steps: (a) by investigating the relationship between VTE and the traditional PDE-based image inpainting method, a new image inpainting model using VTE is developed;(b) the area to be inpainted is divided into boundaries that shrink in layers from the outside inwards using FMM; and (c) the VTE image inpainting model is converted into a weighted average form to coordinate with FMM. The visual and quantitative evaluation of the experimental results of shadow removal shows that the proposed method outperforms PDE-based and state-of-the-art methods in terms of shadow-removal effect and running time. The results also show that our method excels at inpainting images with near-smooth textures and simple geometric structures and where the pixels to be inpainted are continuous with neighbouring pixels.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70114","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144148578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Modified Hierarchical Vision Transformer Model for Poultry Disease Detection","authors":"Michael Agbo Tettey Soli, Dacosta Agyei, Waliyyullah Umar Bandawu, Leonard Mensah Boante, Justice Kwame Appati","doi":"10.1049/ipr2.70115","DOIUrl":"https://doi.org/10.1049/ipr2.70115","url":null,"abstract":"<p>Poultry production faces challenges from diseases like newcastle, salmonella, and coccidiosis, which are critical to global food security, resulting in economic losses and public health concerns. Current detection technologies, such as human inspections and PCR-based procedures, are time-consuming and costly, limiting scalability. Convolutional neural networks (CNNs) like ResNet50 and VGG16 have shown promise for automating disease identification, but they struggle with generalization and collecting fine-grained local and global information. In this study, we propose a deep learning solution based on a hierarchical vision transformer (HViT) model to detect poultry diseases from fecal images. We compare the performance of our HViT model with traditional CNNs (ResNet50, VGG16), lightweight architectures (MobileNetV3_Large_100, XceptionNet), and standard vision transformers (ViT) (ViT-B/16). The experimental results demonstrate that our HViT model outperforms other models, achieving an average validation accuracy of 90.90% with a validation loss of 0.2647. The HViT's ability to balance local and global feature recognition highlights its potential as a scalable solution for real-time poultry disease detection. These findings underscore the significance of hierarchical attention in addressing complex image analysis tasks, with implications for broader applications in agriculture and medical imaging.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70115","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144135672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Quantum Image Encryption With QLSTM and Chaos Synchronisation Control: A Deep Neural Network Approach","authors":"Yuebo Wu, Duansong Wang, Jian Zhou, Huifang Bao","doi":"10.1049/ipr2.70091","DOIUrl":"https://doi.org/10.1049/ipr2.70091","url":null,"abstract":"<p>Quantum image encryption is crucial for data protection, but current methods lack attack resistance and have complex encryption processes. This paper proposes a quantum long short - term memory (QLSTM)-based quantum image encryption method to enhance chaotic sequences and achieve chaos synchronisation control. The QLSTM network improves the Lorenz chaotic sequence, increasing its unpredictability. An adaptive synchronisation control algorithm, using the enhanced chaotic sequence from QLSTM, ensures sender-receiver synchronization. Optimised through deep neural networks, the system maintains stable synchronization under interference. New cryptographic quantum infrastructure (NCQI) was constructed, and images were encrypted using third-order radial diffusion, quantum generalised Arnold transform, and quantum W transform. The QLSTM-improved chaotic sequence showed excellent LLE and 0–1 test results. Information entropy was near 8, with R, G and B channels exceeding 7.999. Anti-attack analysis revealed high information entropy, strong attack resistance, and number of pixels change rate/unified average changing intensity (NPCR/UACI) values of 99.698% and 33.460%, respectively, indicating significant pixel-level changes. Combining quantum chaotic system prediction with the QLSTM model enhanced quantum communication stability and anti-interference ability. This QLSTM-based quantum encryption method, with chaos synchronisation control, significantly improves encryption security and reliability, maintaining high information entropy and complexity under attacks, proving its effectiveness in image encryption.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70091","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ACDR-CRAFF Net: A Multi-Scale Network Based on Adaptive Channel and Coordinate Relational Attention Network for Remote Sensing Scene Classification","authors":"Wei Dai, Haixia Xu, Furong Shi, Liming Yuan, Xinyu Wang, Xianbin Wen","doi":"10.1049/ipr2.70112","DOIUrl":"https://doi.org/10.1049/ipr2.70112","url":null,"abstract":"<p>Accurate classification of remote sensing scene images is crucial for diverse applications, from environmental monitoring to urban planning. While convolutional neural networks (CNNs) have dramatically improved classification accuracy, challenges remain due to the complex distribution of small objects, varied spatial configurations, and intra-class multimodality in remote sensing images. In this work, we make three key contributions to address these challenges. (1) We propose the adaptive channel and coordinate relational attention network (ACDR-CRAFF), a novel multi-scale feature fusion framework designed to enhance feature representation across scales. (2) We introduce two innovative modules: the adaptive channel dimensionality reduction (ACDR) module, which dynamically adjusts channel representations to retain essential low-dimensional features, and the coordinate relational attention multi-scale feature fusion (CRAFF) module, which effectively captures and transfers spatial information between feature levels. (3) By integrating ACDR and CRAFF, our model achieves a progressive fusion of local to global features, ensuring robust feature expressiveness at multiple scales. Experimental results on four widely used benchmark datasets demonstrate that ACDR-CRAFF consistently outperforms several state-of-the-art methods, achieving significant improvements in classification accuracy and setting a new benchmark for complex remote sensing scene classification tasks. These results underscore the effectiveness of our approach in addressing the limitations of existing methods and advancing the state of the art in remote sensing image analysis.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70112","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}