{"title":"Improved Perceptual Loss for Sketch Image Domain.","authors":"Chang Wook Seo","doi":"10.3390/jimaging11090323","DOIUrl":"10.3390/jimaging11090323","url":null,"abstract":"<p><p>Traditional perceptual loss functions, primarily designed for photographic images, often perform poorly in the sketch domain due to significant differences in visual representation. To address this domain gap, we propose an improved perceptual loss specifically designed for sketch images. Our method fine-tunes a pre-trained VGG-16 model on the ImageNet-Sketch dataset while strategically replacing max-pooling layers with spatial and channel attention mechanisms. We comprehensively evaluate our approach across three dimensions: generation quality, sketch retrieval performance, and feature space organization. Experimental results demonstrate consistent improvements across all evaluation metrics, with our method achieving the best generation performance, over 10% improvement in sketch retrieval accuracy, and 6-fold improvement in class separability compared to baseline methods. The ablation studies confirm that both fine-tuning and attention mechanisms are essential components that work together effectively. Our domain-specific perceptual loss effectively bridges the gap between photographic and sketch domains, providing enhanced performance for various sketch-related computer vision applications, including generation, retrieval, and recognition.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470351/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combined CCTA and Stress CTP for Anatomical and Functional Assessment of Myocardial Bridges.","authors":"Marco Fogante, Paolo Esposto Pirani, Fatjon Cela, Enrico Paolini, Liliana Balardi, Nicolò Schicchi","doi":"10.3390/jimaging11090324","DOIUrl":"10.3390/jimaging11090324","url":null,"abstract":"<p><p>Myocardial bridging (MB) is a congenital coronary anomaly whose clinical impact remains controversial. Coronary computed tomography angiography (CCTA) combined with CT myocardial perfusion imaging (CT-MPI) enables a comprehensive anatomical and functional assessment of MB. This study aimed to investigate whether specific high-risk anatomical features of MB are independently associated with myocardial hypoperfusion, using combined CCTA and CT-MPI. We retrospectively analyzed 81 patients with MB showing high-risk anatomical features (depth ≥ 2.0 mm and length ≥ 25 mm) identified by CCTA, all of whom underwent stress dynamic CT-MPI between May 2022 and December 2025. Patients were classified according to the presence or absence of hypoperfusion in MB-related myocardial segments. Clinical and anatomical variables were compared between two groups using non-parametric tests, and multivariable logistic regression was performed to identify independent predictors of hypoperfusion. Among the 81 patients (mean age, 59.3 ± 11.7 years; 54 males), 26 (32.1%) demonstrated perfusion defects. All MBs were located in the left anterior descending artery (LAD). No significant differences were observed in clinical variables between groups. Bridges associated with hypoperfusion were significantly deeper (<i>p</i> < 0.001) and were more frequently located in the mid-LAD (73.1% vs. 38.2%, <i>p</i> = 0.01). In multivariable analysis, bridge depth and mid-LAD location remained independent predictors of hypoperfusion. In patients with MB, greater depth and mid-LAD location are independently associated with myocardial hypoperfusion. The combined use of CCTA and CT-MPI may enhance risk stratification and help guide clinical decision-making in this patient population.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470479/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empirical Evaluation of Invariances in Deep Vision Models.","authors":"Konstantinos Keremis, Eleni Vrochidou, George A Papakostas","doi":"10.3390/jimaging11090322","DOIUrl":"10.3390/jimaging11090322","url":null,"abstract":"<p><p>The ability of deep learning models to maintain consistent performance under image transformations-termed invariances, is critical for reliable deployment across diverse computer vision applications. This study presents a comprehensive empirical evaluation of modern convolutional neural networks (CNNs) and vision transformers (ViTs) concerning four fundamental types of image invariances: blur, noise, rotation, and scale. We analyze a curated selection of thirty models across three common vision tasks, object localization, recognition, and semantic segmentation, using benchmark datasets including COCO, ImageNet, and a custom segmentation dataset. Our experimental protocol introduces controlled perturbations to test model robustness and employs task-specific metrics such as mean Intersection over Union (mIoU), and classification accuracy (Acc) to quantify models' performance degradation. Results indicate that while ViTs generally outperform CNNs under blur and noise corruption in recognition tasks, both model families exhibit significant vulnerabilities to rotation and extreme scale transformations. Notably, segmentation models demonstrate higher resilience to geometric variations, with SegFormer and Mask2Former emerging as the most robust architectures. These findings challenge prevailing assumptions regarding model robustness and provide actionable insights for designing vision systems capable of withstanding real-world input variability.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470932/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-Invasive Retinal Pathology Assessment Using Haralick-Based Vascular Texture and Global Fundus Color Distribution Analysis.","authors":"Ouafa Sijilmassi","doi":"10.3390/jimaging11090321","DOIUrl":"10.3390/jimaging11090321","url":null,"abstract":"<p><p>This study analyzes retinal fundus images to distinguish healthy retinas from those affected by diabetic retinopathy (DR) and glaucoma using a dual-framework approach: vascular texture analysis and global color distribution analysis. The texture-based approach involved segmenting the retinal vasculature and extracting eight Haralick texture features from the Gray-Level Co-occurrence Matrix. Significant differences in features such as energy, contrast, correlation, and entropy were found between healthy and pathological retinas. Pathological retinas exhibited lower textural complexity and higher uniformity, which correlates with vascular thinning and structural changes observed in DR and glaucoma. In parallel, the global color distribution of the full fundus area was analyzed without segmentation. RGB intensity histograms were calculated for each channel and averaged across groups. Statistical tests revealed significant differences, particularly in the green and blue channels. The Mahalanobis distance quantified the separability of the groups per channel. These results indicate that pathological changes in retinal tissue can also lead to detectable chromatic shifts in the fundus. The findings underscore the potential of both vascular texture and color features as non-invasive biomarkers for early retinal disease detection and classification.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12471138/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"YOLO-DCRCF: An Algorithm for Detecting the Wearing of Safety Helmets and Gloves in Power Grid Operation Environments.","authors":"Jinwei Zhao, Zhi Yang, Baogang Li, Yubo Zhao","doi":"10.3390/jimaging11090320","DOIUrl":"10.3390/jimaging11090320","url":null,"abstract":"<p><p>Safety helmets and gloves are indispensable personal protective equipment in power grid operation environments. Traditional detection methods for safety helmets and gloves suffer from reduced accuracy due to factors such as dense personnel presence, varying lighting conditions, occlusions, and diverse postures. To enhance the detection performance of safety helmets and gloves in power grid operation environments, this paper proposes a novel algorithm, YOLO-DCRCF, based on YOLO11 for detecting the wearing of safety helmets and gloves in such settings. By integrating Deformable Convolutional Network version 2 (DCNv2), the algorithm enhances the network's capability to model geometric transformations. Additionally, a recalibration feature pyramid (RCF) network is innovatively designed to strengthen the interaction between shallow and deep features, enabling the network to capture multi-scale information of the target. Experimental results show that the proposed YOLO-DCRCF model achieved mAP50 scores of 92.7% on the Safety Helmet Wearing Dataset (SHWD) and 79.6% on the Safety Helmet and Gloves Wearing Dataset (SHAGWD), surpassing the baseline YOLOv11 model by 1.1% and 2.7%, respectively. These results meet the real-time safety monitoring requirements of power grid operation sites.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470609/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic-Enhanced and Temporally Refined Bidirectional BEV Fusion for LiDAR-Camera 3D Object Detection.","authors":"Xiangjun Qu, Kai Qin, Yaping Li, Shuaizhang Zhang, Yuchen Li, Sizhe Shen, Yun Gao","doi":"10.3390/jimaging11090319","DOIUrl":"10.3390/jimaging11090319","url":null,"abstract":"<p><p>In domains such as autonomous driving, 3D object detection is a key technology for environmental perception. By integrating multimodal information from sensors such as LiDAR and cameras, the detection accuracy can be significantly improved. However, the current multimodal fusion perception framework still suffers from two problems: first, due to the inherent physical limitations of LiDAR detection, the number of point clouds of distant objects is sparse, resulting in small target objects being easily overwhelmed by the background; second, the cross-modal information interaction is insufficient, and the complementarity and correlation between the LiDAR point cloud and the camera image are not fully exploited and utilized. Therefore, we propose a new multimodal detection strategy, Semantic-Enhanced and Temporally Refined Bidirectional BEV Fusion (SETR-Fusion). This method integrates three key components: the Discriminative Semantic Saliency Activation (DSSA) module, the Temporally Consistent Semantic Point Fusion (TCSP) module, and the Bilateral Cross-Attention Fusion (BCAF) module. The DSSA module fully utilizes image semantic features to capture more discriminative foreground and background cues; the TCSP module generates semantic LiDAR points and, after noise filtering, produces a more accurate semantic LiDAR point cloud; and the BCAF module's cross-attention to camera and LiDAR BEV features in both directions enables strong interaction between the two types of modal information. SETR-Fusion achieves 71.2% mAP and 73.3% NDS values on the nuScenes test set, outperforming several state-of-the-art methods.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470275/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncertainty-Guided Active Learning for Access Route Segmentation and Planning in Transcatheter Aortic Valve Implantation.","authors":"Mahdi Islam, Musarrat Tabassum, Agnes Mayr, Christian Kremser, Markus Haltmeier, Enrique Almar-Munoz","doi":"10.3390/jimaging11090318","DOIUrl":"10.3390/jimaging11090318","url":null,"abstract":"<p><p>Transcatheter aortic valve implantation (TAVI) is a minimally invasive procedure for treating severe aortic stenosis, where optimal vascular access route selection is critical to reduce complications. It requires careful selection of the iliac artery with the most favourable anatomy, specifically, one with the largest diameters and no segments narrower than 5 mm. This process is time-consuming when carried out manually. We present an active learning-based segmentation framework for contrast-enhanced Cardiac Magnetic Resonance (CMR) data, guided by probabilistic uncertainty and pseudo-labelling, enabling efficient segmentation with minimal manual annotation. The segmentations are then fed into an automated pipeline for diameter quantification, achieving a Dice score of 0.912 and a mean absolute percentage error (MAPE) of 4.92%. An ablation study using pre- and post-contrast CMR showed superior performance with post-contrast data only. Overall, the pipeline provides accurate segmentation and detailed diameter profiles of the aorto-iliac route, helping the assessment of the access route.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12471150/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative Evaluation of Low-Dose CT Image Quality Using Deep Learning Reconstruction: A Comparative Study of Philips Precise Image and GE TrueFidelity.","authors":"Jina Shim, Youngjin Lee, Kyuseok Kim","doi":"10.3390/jimaging11090317","DOIUrl":"10.3390/jimaging11090317","url":null,"abstract":"<p><p>Reducing radiation exposure in CT imaging is critical, particularly in routine and repeat examinations. Deep learning image reconstruction (DLIR) has emerged as a key approach for maintaining diagnostic quality at low-dose acquisition settings. This study compared two DLIR algorithms of Philips Precise Image (PI) and GE TrueFidelity (TF) under an 80 kVp low-dose CT scenario, using the AAPM CIRS-610 phantom to replicate clinical imaging conditions. The phantom's linearity, high-resolution, and artifact modules were scanned with Philips CT 5300 and GE Revolution CT scanners at low-dose parameters. Images were reconstructed using five DLIR presets, including PI (Smoother, Standard, Sharper) and TF (Middle, High), and evaluated with eight quantitative metrics, including SNR, CNR, nRMSE, PSNR, SSIM, FSIM, UQI, GMSD, and gradient magnitude. TF-High delivered the highest SNR (115.0-118.0 across modules), representing a 54-57% improvement over PI-Smoother, and achieved superior PSNR and the lowest GMSD, reflecting better preservation of structure in low-dose images. PI-Sharper provided the strongest gradient magnitude, emphasizing fine edge detail. Under low-dose CT conditions, TF-High offered the optimal balance of noise suppression and structure fidelity, while PI-Sharper highlighted fine detail enhancement. These findings show that DLIR settings must be tailored to clinical needs when operating under low-dose imaging protocols.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470537/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segment and Recover: Defending Object Detectors Against Adversarial Patch Attacks.","authors":"Haotian Gu, Hamidreza Jafarnejadsani","doi":"10.3390/jimaging11090316","DOIUrl":"10.3390/jimaging11090316","url":null,"abstract":"<p><p>Object detection is used to automatically identify and locate specific objects within images or videos for applications like autonomous driving, security surveillance, and medical imaging. Protecting object detection models against adversarial attacks, particularly malicious patches, is crucial to ensure reliable and safe performance in safety-critical applications, where misdetections can lead to severe consequences. Existing defenses against patch attacks are primarily designed for stationary scenes and struggle against adversarial image patches that vary in scale, position, and orientation in dynamic environments.In this paper, we introduce SAR, a patch-agnostic defense scheme based on image preprocessing that does not require additional model training. By integration of the patch-agnostic detection frontend with an additional broken pixel restoration backend, Segment and Recover (SAR) is developed for the large-mask-covered object-hiding attack. Our approach breaks the limitation of the patch scale, shape, and location, accurately localizes the adversarial patch on the frontend, and restores the broken pixel on the backend. Our evaluations of the clean performance demonstrate that SAR is compatible with a variety of pretrained object detectors. Moreover, SAR exhibits notable resilience improvements over state-of-the-art methods evaluated in this paper. Our comprehensive evaluation studies involve diverse patch types, such as localized-noise, printable, visible, and adaptive adversarial patches.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12470975/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Detection to Motion-Based Classification: A Two-Stage Approach for <i>T. cruzi</i> Identification in Video Sequences.","authors":"Kenza Chenni, Carlos Brito-Loeza, Cefa Karabağ, Lavdie Rada","doi":"10.3390/jimaging11090315","DOIUrl":"10.3390/jimaging11090315","url":null,"abstract":"<p><p>Chagas disease, caused by <i>Trypanosoma cruzi</i> (<i>T. cruzi</i>), remains a significant public health challenge in Latin America. Traditional diagnostic methods relying on manual microscopy suffer from low sensitivity, subjective interpretation, and poor performance in suboptimal conditions. This study presents a novel computer vision framework integrating motion analysis with deep learning for automated <i>T. cruzi</i> detection in microscopic videos. Our motion-based detection pipeline leverages parasite motility as a key discriminative feature, employing frame differencing, morphological processing, and DBSCAN clustering across 23 microscopic videos. This approach effectively addresses limitations of static image analysis in challenging conditions including noisy backgrounds, uneven illumination, and low contrast. From motion-identified regions, 64×64 patches were extracted for classification. MobileNetV2 achieved superior performance with 99.63% accuracy, 100% precision, 99.12% recall, and an AUC-ROC of 1.0. Additionally, YOLOv5 and YOLOv8 models (Nano, Small, Medium variants) were trained on 43 annotated videos, with YOLOv5-Nano and YOLOv8-Nano demonstrating excellent detection capability on unseen test data. This dual-stage framework offers a practical, computationally efficient solution for automated Chagas diagnosis, particularly valuable for resource-constrained laboratories with poor imaging quality.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 9","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12471015/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}