{"title":"Test-Time Augmentation for Cross-Domain Leukocyte Classification via OOD Filtering and Self-Ensembling.","authors":"Lorenzo Putzu, Andrea Loddo, Cecilia Di Ruberto","doi":"10.3390/jimaging11090295","DOIUrl":"10.3390/jimaging11090295","url":null,"abstract":"<p><p>Domain shift poses a major challenge in many Machine Learning applications due to variations in data acquisition protocols, particularly in the medical field. Test-time augmentation (TTA) can solve the domain shift issue and improve robustness by aggregating predictions from multiple augmented versions of the same input. However, TTA may inadvertently generate unrealistic or Out-of-Distribution (OOD) samples that negatively affect prediction quality. In this work, we introduce a filtering procedure that removes from the TTA images all the OOD samples whose representations lie far from the training data distribution. Moreover, all the retained TTA images are weighted inversely to their distance from the training data. The final prediction is provided by a Self-Ensemble with Confidence, which is a lightweight ensemble strategy that fuses predictions from the original and retained TTA samples using a weighted soft voting scheme, without requiring multiple models or retraining. This method is model-agnostic and can be integrated with any deep learning architecture, making it broadly applicable across various domains. Experiments on cross-domain leukocyte classification benchmarks demonstrate that our method consistently improves over standard TTA and Baseline inference, particularly when strong domain shifts are present. Ablation studies and statistical tests confirm the effectiveness and significance of each component.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470409/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Colorectal Polyp Segmentation Based on Deep Learning Methods: A Systematic Review.","authors":"Xin Liu, Nor Ashidi Mat Isa, Chao Chen, Fajin Lv","doi":"10.3390/jimaging11090293","DOIUrl":"10.3390/jimaging11090293","url":null,"abstract":"<p><p>Colorectal cancer is one of the three most common cancers worldwide. Early detection and assessment of polyps can significantly reduce the risk of developing colorectal cancer. Physicians can obtain information about polyp regions through polyp segmentation techniques, enabling the provision of targeted treatment plans. This study systematically reviews polyp segmentation methods. We investigated 146 papers published between 2018 and 2024 and conducted an in-depth analysis of the methodologies employed. Based on the selected literature, we systematically organized this review. First, we analyzed the development and evolution of the polyp segmentation field. Second, we provided a comprehensive overview of deep learning-based polyp image segmentation methods and the Mamba method, as well as video polyp segmentation methods categorized by network architecture, addressing the challenges faced in polyp segmentation. Subsequently, we evaluated the performance of 44 models, including segmentation performance metrics and real-time analysis capabilities. Additionally, we introduced commonly used datasets for polyp images and videos, along with metrics for assessing segmentation models. Finally, we discussed existing issues and potential future trends in this area.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470534/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"E-CMCA and LSTM-Enhanced Framework for Cross-Modal MRI-TRUS Registration in Prostate Cancer.","authors":"Ciliang Shao, Ruijin Xue, Lixu Gu","doi":"10.3390/jimaging11090292","DOIUrl":"10.3390/jimaging11090292","url":null,"abstract":"<p><p>Accurate registration of MRI and TRUS images is crucial for effective prostate cancer diagnosis and biopsy guidance, yet modality differences and non-rigid deformations pose significant challenges, especially in dynamic imaging. This study presents a novel cross-modal MRI-TRUS registration framework, leveraging a dual-encoder architecture with an Enhanced Cross-Modal Channel Attention (E-CMCA) module and a LSTM-Based Spatial Deformation Modeling Module. The E-CMCA module efficiently extracts and integrates multi-scale cross-modal features, while the LSTM-Based Spatial Deformation Modeling Module models temporal dynamics by processing depth-sliced 3D deformation fields as sequential data. A VecInt operation ensures smooth, diffeomorphic transformations, and a FuseConv layer enhances feature integration for precise alignment. Experiments on the μ-RegPro dataset from the MICCAI 2023 Challenge demonstrate that our model significantly improves registration accuracy and performs robustly in both static 3D and dynamic 4D registration tasks. Experiments on the μ-RegPro dataset from the MICCAI 2023 Challenge demonstrate that our model achieves a DSC of 0.865, RDSC of 0.898, TRE of 2.278 mm, and RTRE of 1.293, surpassing state-of-the-art methods and performing robustly in both static 3D and dynamic 4D registration tasks.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12471084/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contrastive Learning-Driven Image Dehazing with Multi-Scale Feature Fusion and Hybrid Attention Mechanism.","authors":"Huazhong Zhang, Jiaozhuo Wang, Xiaoguang Tu, Zhiyi Niu, Yu Wang","doi":"10.3390/jimaging11090290","DOIUrl":"10.3390/jimaging11090290","url":null,"abstract":"<p><p>Image dehazing is critical for visual enhancement and a wide range of computer vision applications. Despite significant advancements, challenges remain in preserving fine details and adapting to diverse, non-uniformly degraded scenes. To address these issues, we propose a novel image dehazing method that introduces a contrastive learning framework, enhanced by the InfoNCE loss, to improve model robustness. In this framework, hazy images are treated as negative samples and their clear counterparts as positive samples. By optimizing the InfoNCE loss, the model is trained to maximize the similarity between positive pairs and minimize that between negative pairs, thereby improving its ability to distinguish haze artifacts from intrinsic scene features and better preserving the structural integrity of images. In addition to contrastive learning, our method integrates a multi-scale dynamic feature fusion with a hybrid attention mechanism. Specifically, we introduce dynamically adjustable frequency band filters and refine the hybrid attention module to more effectively capture fine-grained, cross-scale image details. Extensive experiments on the RESIDE-6K and RS-Haze datasets demonstrate that our approach outperforms most existing methods, offering a promising solution for practical image dehazing applications.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470370/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-Filter X-Ray Image Enhancement Using Cream and Bosso Algorithms: Contrast and Entropy Optimization Across Anatomical Regions.","authors":"Antonio Rienzo, Miguel Bustamante, Ricardo Staub, Gastón Lefranc","doi":"10.3390/jimaging11090291","DOIUrl":"10.3390/jimaging11090291","url":null,"abstract":"<p><p>This study introduces a dual-filter X-ray image enhancement technique designed to elevate the quality of radiographic images of the knee, breast, and wrist, employing the Cream and Bosso algorithms. Our quantitative analysis reveals significant improvements in bone, edge definition, and contrast (<i>p</i> < 0.001). The processing parameters are derived from the relationship between entropy metrics and the filtering parameter d. The results demonstrate contrast enhancements for knee radiographs and for wrist radiographs, while maintaining acceptable noise levels. Comparisons are made with CLAHE techniques, unsharp masking, and deep-learning-based models. This method is a reliable and computationally efficient approach to enhancing clinical diagnosis in resource-limited settings, thereby improving robustness and interpretability.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470700/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Fragment to One Piece: A Review on AI-Driven Graphic Design.","authors":"Xingxing Zou, Wen Zhang, Nanxuan Zhao","doi":"10.3390/jimaging11090289","DOIUrl":"10.3390/jimaging11090289","url":null,"abstract":"<p><p>This survey offers a comprehensive overview of advancements in Artificial Intelligence in Graphic Design (AIGD), with a focus on the integration of AI techniques to enhance design interpretation and creative processes. The field is categorized into two primary directions: perception tasks, which involve understanding and analyzing design elements, and generation tasks, which focus on creating new design elements and layouts. The methodology emphasizes the exploration of various subtasks including the perception and generation of visual elements, aesthetic and semantic understanding, and layout analysis and generation. The survey also highlights the role of large language models and multimodal approaches in bridging the gap between localized visual features and global design intent. Despite significant progress, challenges persist in understanding human intent, ensuring interpretability, and maintaining control over multilayered compositions. This survey aims to serve as a guide for researchers, detailing the current state of AIGD and outlining potential future directions.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470571/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Directional Lighting-Based Deep Learning Models for Crack and Spalling Classification.","authors":"Sanjeetha Pennada, Jack McAlorum, Marcus Perry, Hamish Dow, Gordon Dobie","doi":"10.3390/jimaging11090288","DOIUrl":"10.3390/jimaging11090288","url":null,"abstract":"<p><p>External lighting is essential for autonomous inspections of concrete structures in low-light environments. However, previous studies have primarily relied on uniformly diffused lighting to illuminate images and faced challenges in detecting complex crack patterns. This paper proposes two novel algorithms that use directional lighting to classify concrete defects. The first method, named fused neural network, uses the maximum intensity pixel-level image fusion technique and selects the maximum intensity pixel values from all directional images for each pixel to generate a fused image. The second proposed method, named multi-channel neural network, generates a five-channel image, with each channel representing the grayscale version of images captured in the Right (R), Down (D), Left (L), Up (U), and Diffused (A) directions, respectively. The proposed multi-channel neural network model achieved the best performance, with accuracy, precision, recall, and F1 score of 96.6%, 96.3%, 97%, and 96.6%, respectively. It also outperformed the FusedNet and other models found in the literature, with no significant change in evaluation time. The results from this work have the potential to improve concrete crack classification in environments where external illumination is required. Future research focuses on extending the concepts of multi-channel and image fusion to white-box techniques.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470889/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solar Panel Surface Defect and Dust Detection: Deep Learning Approach.","authors":"Atta Rahman","doi":"10.3390/jimaging11090287","DOIUrl":"10.3390/jimaging11090287","url":null,"abstract":"<p><p>In recent years, solar energy has emerged as a pillar of sustainable development. However, maintaining panel efficiency under extreme environmental conditions remains a persistent hurdle. This study introduces an automated defect detection pipeline that leverages deep learning and computer vision to identify five standard anomaly classes: Non-Defective, Dust, Defective, Physical Damage, and Snow on photovoltaic surfaces. To build a robust foundation, a heterogeneous dataset of 8973 images was sourced from public repositories and standardized into a uniform labeling scheme. This dataset was then expanded through an aggressive augmentation strategy, including flips, rotations, zooms, and noise injections. A YOLOv11-based model was trained and fine-tuned using both fixed and adaptive learning rate schedules, achieving a mAP@0.5 of 85% and accuracy, recall, and F1-score above 95% when evaluated across diverse lighting and dust scenarios. The optimized model is integrated into an interactive dashboard that processes live camera streams, issues real-time alerts upon defect detection, and supports proactive maintenance scheduling. Comparative evaluations highlight the superiority of this approach over manual inspections and earlier YOLO versions in both precision and inference speed, making it well suited for deployment on edge devices. Automating visual inspection not only reduces labor costs and operational downtime but also enhances the longevity of solar installations. By offering a scalable solution for continuous monitoring, this work contributes to improving the reliability and cost-effectiveness of large-scale solar energy systems.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470506/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Part-Wise Graph Fourier Learning for Skeleton-Based Continuous Sign Language Recognition.","authors":"Dong Wei, Hongxiang Hu, Gang-Feng Ma","doi":"10.3390/jimaging11080286","DOIUrl":"10.3390/jimaging11080286","url":null,"abstract":"<p><p>Sign language is a visual language articulated through body movements. Existing approaches predominantly leverage RGB inputs, incurring substantial computational overhead and remaining susceptible to interference from foreground and background noise. A second fundamental challenge lies in accurately modeling the nonlinear temporal dynamics and inherent asynchrony across body parts that characterize sign language sequences. To address these challenges, we propose a novel part-wise graph Fourier learning method for skeleton-based continuous sign language recognition (PGF-SLR), which uniformly models the spatiotemporal relations of multiple body parts in a globally ordered yet locally unordered manner. Specifically, different parts within different time steps are treated as nodes, while the frequency domain attention between parts is treated as edges to construct a part-level Fourier fully connected graph. This enables the graph Fourier learning module to jointly capture spatiotemporal dependencies in the frequency domain, while our adaptive frequency enhancement method further amplifies discriminative action features in a lightweight and robust fashion. Finally, a dual-branch action learning module featuring an auxiliary action prediction branch to assist the recognition branch is designed to enhance the understanding of sign language. Our experimental results show that the proposed PGF-SLR achieved relative improvements of 3.31%/3.70% and 2.81%/7.33% compared to SOTA methods on the dev/test sets of the PHOENIX14 and PHOENIX14-T datasets. It also demonstrated highly competitive recognition performance on the CSL-Daily dataset, showcasing strong generalization while reducing computational costs in both offline and online settings.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387829/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144972737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MSConv-YOLO: An Improved Small Target Detection Algorithm Based on YOLOv8.","authors":"Linli Yang, Barmak Honarvar Shakibaei Asli","doi":"10.3390/jimaging11080285","DOIUrl":"10.3390/jimaging11080285","url":null,"abstract":"<p><p>Small object detection in UAV aerial imagery presents significant challenges due to scale variations, sparse feature representation, and complex backgrounds. To address these issues, this paper focuses on practical engineering improvements to the existing YOLOv8s framework, rather than proposing a fundamentally new algorithm. We introduce MultiScaleConv-YOLO (MSConv-YOLO), an enhanced model that integrates well-established techniques to improve detection performance for small targets. Specifically, the proposed approach introduces three key improvements: (1) a MultiScaleConv (MSConv) module that combines depthwise separable and dilated convolutions with varying dilation rates, enhancing multi-scale feature extraction while maintaining efficiency; (2) the replacement of CIoU with WIoU v3 as the bounding box regression loss, which incorporates a dynamic non-monotonic focusing mechanism to improve localization for small targets; and (3) the addition of a high-resolution detection head in the neck-head structure, leveraging FPN and PAN to preserve fine-grained features and ensure full-scale coverage. Experimental results on the VisDrone2019 dataset show that MSConv-YOLO outperforms the baseline YOLOv8s by achieving a 6.9% improvement in mAP@0.5 and a 6.3% gain in recall. Ablation studies further validate the complementary impact of each enhancement. This paper presents practical and effective engineering enhancements to small object detection in UAV scenarios, offering an improved solution without introducing entirely new theoretical constructs. Future work will focus on lightweight deployment and adaptation to more complex environments.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387663/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144972726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}