A Lightweight CNN for Multiclass Retinal Disease Screening with Explainable AI
Arjun Kumar Bose Arnob, Muhammad Hasibur Rashid Chayon, Fahmid Al Farid, Mohd Nizam Husen, Firoz Ahmed
Journal of Imaging, 11(8). Published 2025-08-15. DOI: 10.3390/jimaging11080275. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387214/pdf/

Abstract: Timely, balanced, and transparent detection of retinal diseases is essential to avert irreversible vision loss; however, current deep learning screeners are hampered by class imbalance, large models, and opaque reasoning. This paper presents a lightweight attention-augmented convolutional neural network (CNN) that addresses all three barriers. The network combines depthwise separable convolutions, squeeze-and-excitation, and global-context attention, and it incorporates gradient-based class activation mapping (Grad-CAM) and Grad-CAM++ to ensure that every decision is accompanied by pixel-level evidence. A 5335-image ten-class color-fundus dataset from Bangladeshi clinics, which was severely skewed (17-1509 images per class), was equalized using a synthetic minority oversampling technique (SMOTE) and task-specific augmentations. Images were resized to 150×150 px and split 70:15:15. The training used the adaptive moment estimation (Adam) optimizer (initial learning rate of 1 × 10⁻⁴, reduce-on-plateau, early stopping), ℓ2 regularization, and dual dropout. The 16.6 M parameter network converged in fewer than 50 epochs on a mid-range graphics processing unit (GPU) and reached 87.9% test accuracy, a macro-precision of 0.882, a macro-recall of 0.879, and a macro-F1-score of 0.880, reducing the error by 58% relative to the best ImageNet backbone (Inception-V3, 40.4% accuracy). Eight disorders recorded true-positive rates above 95%; macular scar and central serous chorioretinopathy attained F1-scores of 0.77 and 0.89, respectively. Saliency maps consistently highlighted optic disc margins, subretinal fluid, and other hallmarks. Targeted class re-balancing, lightweight attention, and integrated explainability therefore deliver accurate, transparent, and deployable retinal screening suitable for point-of-care ophthalmic triage on resource-limited hardware.
Bangla Speech Emotion Recognition Using Deep Learning-Based Ensemble Learning and Feature Fusion
Md Shahid Ahammed Shakil, Fahmid Al Farid, Nitun Kumar Podder, S M Hasan Sazzad Iqbal, Abu Saleh Musa Miah, Md Abdur Rahim, Hezerul Abdul Karim
Journal of Imaging, 11(8). Published 2025-08-14. DOI: 10.3390/jimaging11080273. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387467/pdf/

Abstract: Emotion recognition in speech is essential for enhancing human-computer interaction (HCI) systems. Despite progress in Bangla speech emotion recognition, challenges remain, including low accuracy, speaker dependency, and poor generalization across emotional expressions. Previous approaches often rely on traditional machine learning or basic deep learning models and struggle with robustness and accuracy on noisy or varied data. In this study, we propose a novel multi-stream deep learning feature fusion approach for Bangla speech emotion recognition that addresses the limitations of existing methods. Our approach begins with various data augmentation techniques applied to the training dataset, enhancing the model's robustness and generalization. We then extract a comprehensive set of handcrafted features, including Zero-Crossing Rate (ZCR), chromagram, spectral centroid, spectral roll-off, spectral contrast, spectral flatness, Mel-Frequency Cepstral Coefficients (MFCCs), Root Mean Square (RMS) energy, and Mel-spectrogram. Although these features are used as 1D numerical vectors, some of them are computed from time-frequency representations (e.g., chromagram, Mel-spectrogram) that can themselves be depicted as images, which is conceptually close to imaging-based analysis. These features capture key characteristics of the speech signal, providing valuable insights into the emotional content. Next, we employ a multi-stream deep learning architecture to automatically learn complex, hierarchical representations of the speech signal. This architecture consists of three distinct streams: the first stream uses 1D convolutional neural networks (1D CNNs), the second integrates a 1D CNN with Long Short-Term Memory (LSTM), and the third combines a 1D CNN with bidirectional LSTM (Bi-LSTM). These models capture intricate emotional nuances that handcrafted features alone may not fully represent. For each of these models, we generate predicted scores and then employ ensemble learning with a soft voting technique to produce the final prediction. This fusion of handcrafted features, deep learning-derived features, and ensemble voting enhances the accuracy and robustness of emotion identification across multiple datasets. Our method demonstrates the effectiveness of combining various learning models to improve emotion recognition in Bangla speech, providing a more comprehensive solution than existing methods. We use three primary datasets (SUBESCO, BanglaSER, and a merged version of both) as well as two external datasets, RAVDESS and EMODB, to assess the performance of our models. Our method achieves accuracies of 92.90%, 85.20%, 90.63%, 67.71%, and 69.25% for the SUBESCO, BanglaSER, merged SUBESCO and BanglaSER, RAVDESS, and EMODB datasets, respectively. These results demonstrate the effectiveness of combining handcrafted features with deep learning-based features through ensemble learning for robust emotion recognition in Bangla speech.
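The handcrafted feature set listed above maps directly onto librosa feature extractors, and the soft-voting step is an average of the three streams' class probabilities. The sketch below illustrates both under assumed parameters (sampling rate, MFCC count); it is a sketch of the idea, not the authors' exact pipeline.

```python
# Minimal sketch of the handcrafted feature set and the soft-voting step,
# assuming librosa; frame parameters are illustrative.
import numpy as np
import librosa

def extract_features(path, sr=22050):
    y, sr = librosa.load(path, sr=sr)
    feats = [
        librosa.feature.zero_crossing_rate(y),
        librosa.feature.chroma_stft(y=y, sr=sr),
        librosa.feature.spectral_centroid(y=y, sr=sr),
        librosa.feature.spectral_rolloff(y=y, sr=sr),
        librosa.feature.spectral_contrast(y=y, sr=sr),
        librosa.feature.spectral_flatness(y=y),
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40),
        librosa.feature.rms(y=y),
        librosa.feature.melspectrogram(y=y, sr=sr),
    ]
    # Collapse each time-frequency representation to a 1D vector (mean over time).
    return np.concatenate([f.mean(axis=1) for f in feats])

def soft_vote(prob_cnn, prob_cnn_lstm, prob_cnn_bilstm):
    """Average the class-probability outputs of the three streams, then pick the argmax."""
    return np.mean([prob_cnn, prob_cnn_lstm, prob_cnn_bilstm], axis=0).argmax(axis=-1)
```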
Digital Image Processing and Convolutional Neural Network Applied to Detect Mitral Stenosis in Echocardiograms: Clinical Decision Support
Genilton de França Barros Filho, José Fernando de Morais Firmino, Israel Solha, Ewerton Freitas de Medeiros, Alex Dos Santos Felix, José Carlos de Lima Júnior, Marcelo Dantas Tavares de Melo, Marcelo Cavalcanti Rodrigues
Journal of Imaging, 11(8). Published 2025-08-14. DOI: 10.3390/jimaging11080272. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387388/pdf/

Abstract: The mitral valve is the valve most susceptible to pathological alterations such as mitral stenosis, which is characterized by failure of the valve to open completely. In this context, the objective of this study was to apply digital image processing (DIP) and develop a convolutional neural network (CNN) to provide decision support for specialists in the diagnosis of mitral stenosis based on transesophageal echocardiography examinations. The following procedures were implemented: acquisition of echocardiogram exams, application of DIP, use of augmentation techniques, and development of a CNN. The DIP classified 26.7% of cases as having no stenosis, 26.7% as mild stenosis, 13.3% as moderate stenosis, and 33.3% as severe stenosis. A CNN was initially developed to classify videos into those four categories; however, the number of acquired exams was insufficient to train the model effectively for this purpose, so the final model was trained to differentiate between videos with and without stenosis, achieving an accuracy of 92% with a loss of 0.26. The results demonstrate that both DIP and the CNN are effective in distinguishing between cases with and without stenosis. Moreover, DIP was capable of classifying varying degrees of stenosis severity (mild, moderate, and severe), highlighting its potential as a valuable tool in clinical decision support.
Extract Nutritional Information from Bilingual Food Labels Using Large Language Models
Fatmah Y Assiri, Mohammad D Alahmadi, Mohammed A Almuashi, Ayidh M Almansour
Journal of Imaging, 11(8). Published 2025-08-13. DOI: 10.3390/jimaging11080271. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387780/pdf/

Abstract: Food product labels serve as a critical source of information, providing details about nutritional content, ingredients, and health implications. These labels enable Food and Drug Authorities (FDA) to ensure compliance and take necessary health-related and logistics actions. Additionally, product labels are essential for online grocery stores to offer reliable nutrition facts and empower customers to make informed dietary decisions. Unfortunately, product labels are typically available in image formats, requiring organizations and online stores to manually transcribe them, a process that is not only time-consuming but also highly prone to human error, especially with multilingual labels that add complexity to the task. Our study investigates the challenges and effectiveness of leveraging large language models (LLMs) to extract nutritional elements and values from multilingual food product labels, with a specific focus on Arabic and English. A comprehensive empirical analysis was conducted using a manually curated dataset of 294 food product labels, comprising 588 transcribed nutritional elements and values in both languages, which served as the ground truth for evaluation. The findings reveal that while LLMs performed better in extracting English elements and values compared to Arabic, our post-processing techniques significantly enhanced their accuracy, with GPT-4o outperforming GPT-4V and Gemini.
{"title":"PU-DZMS: Point Cloud Upsampling via Dense Zoom Encoder and Multi-Scale Complementary Regression.","authors":"Shucong Li, Zhenyu Liu, Tianlei Wang, Zhiheng Zhou","doi":"10.3390/jimaging11080270","DOIUrl":"10.3390/jimaging11080270","url":null,"abstract":"<p><p>Point cloud imaging technology usually faces the problem of point cloud sparsity, which leads to a lack of important geometric detail. There are many point cloud upsampling networks that have been designed to solve this problem. However, the existing methods have limitations in local-global relation understanding, leading to contour distortion and many local sparse regions. To this end, PU-DZMS is proposed with two components. (1) the Dense Zoom Encoder (DENZE) is designed to capture local-global features by using ZOOM Blocks with a dense connection. The main module in the ZOOM Block is the Zoom Encoder, which embeds a Transformer mechanism into the down-upsampling process to enhance local-global geometric features. The geometric edge of the point cloud would be clear under the DENZE. (2) The Multi-Scale Complementary Regression (MSCR) module is designed to expand the features and regress a dense point cloud. MSCR obtains the features' geometric distribution differences across scales to ensure geometric continuity, and it regresses new points by adopting cross-scale residual learning. The local sparse regions of the point cloud would be reduced by the MSCR module. The experimental results on the PU-GAN dataset and the PU-Net dataset show that the proposed method performs well on point cloud upsampling tasks.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387154/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144972704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on the Accessibility of Different Colour Schemes for Web Resources for People with Colour Blindness.","authors":"Daiva Sajek, Olena Korotenko, Tetiana Kyrychok","doi":"10.3390/jimaging11080268","DOIUrl":"10.3390/jimaging11080268","url":null,"abstract":"<p><p>This study is devoted to the analysis of the perception of colour schemes of web resources by users with different types of colour blindness (colour vision deficiency). The purpose of this study is to develop recommendations for choosing the optimal colour scheme for web resource design that will ensure the comfortable perception of content for the broadest possible audience, including users with colour vision deficiency of various types (deuteranopia and deuteranomaly, protanopia and protanomaly, tritanopia, and tritanomaly). This article presents the results of a survey of people with different colour vision deficiencies regarding the accessibility of web resources created using different colour schemes. The colour deviation value ∆E was calculated to objectively assess changes in the perception of different colour groups by people with colour vision impairments. The conclusions of this study emphasise the importance of taking into account the needs of users with colour vision impairments when developing web resources. Specific recommendations for choosing the best colour schemes for websites are also offered, which will help increase the accessibility and effectiveness of web content for users with different types of colour blindness.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387176/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144972653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Placido Sub-Pixel Edge Detection Algorithm Based on Enhanced Mexican Hat Wavelet Transform and Improved Zernike Moments.","authors":"Yujie Wang, Jinyu Liang, Yating Xiao, Xinfeng Liu, Jiale Li, Guangyu Cui, Quan Zhang","doi":"10.3390/jimaging11080267","DOIUrl":"10.3390/jimaging11080267","url":null,"abstract":"<p><p>In order to meet the high-precision location requirements of the corneal Placido ring edge in corneal topographic reconstruction, this paper proposes a sub-pixel edge detection algorithm based on multi-scale and multi-position enhanced Mexican Hat Wavelet Transform and improved Zernike moment. Firstly, the image undergoes preliminary processing using a multi-scale and multi-position enhanced Mexican Hat Wavelet Transform function. Subsequently, the preliminary edge information extracted is relocated based on the Zernike moments of a 9 × 9 template. Finally, two improved adaptive edge threshold algorithms are employed to determine the actual sub-pixel edge points of the image, thereby realizing sub-pixel edge detection for corneal Placido ring images. Through comparison and analysis of edge extraction results from real human eye images obtained using the algorithm proposed in this paper and those from other existing algorithms, it is observed that the average sub-pixel edge error of other algorithms is 0.286 pixels, whereas the proposed algorithm achieves an average error of only 0.094 pixels. Furthermore, the proposed algorithm demonstrates strong robustness against noise.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387409/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144972655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Review on Deep Learning Methods for Glioma Segmentation, Limitations, and Future Perspectives
Cecilia Diana-Albelda, Álvaro García-Martín, Jesus Bescos
Journal of Imaging, 11(8). Published 2025-08-11. DOI: 10.3390/jimaging11080269. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387613/pdf/

Abstract: Accurate and automated segmentation of gliomas from Magnetic Resonance Imaging (MRI) is crucial for effective diagnosis, treatment planning, and patient monitoring. However, the aggressive nature and morphological complexity of these tumors pose significant challenges that call for advanced segmentation techniques. This review provides a comprehensive analysis of Deep Learning (DL) methods for glioma segmentation, with a specific focus on bridging the gap between research performance and practical clinical deployment. We evaluate over 80 state-of-the-art models published up to 2025, categorizing them into CNN-based, Pure Transformer, and Hybrid CNN-Transformer architectures. The primary objective of this paper is to critically assess these models not only on their segmentation accuracy but also on their computational efficiency and suitability for real-world medical environments by incorporating hardware resource considerations. We present a comparison of model performance on the BraTS benchmark datasets and introduce a suitability analysis for top-performing models based on their robustness, efficiency, and completeness of tumor region delineation. By identifying current trends, limitations, and key trade-offs, this review offers future research directions aimed at optimizing the balance between technical performance and clinical usability to improve diagnostic outcomes for glioma patients.
{"title":"Enhancing YOLOv5 for Autonomous Driving: Efficient Attention-Based Object Detection on Edge Devices.","authors":"Mortda A A Adam, Jules R Tapamo","doi":"10.3390/jimaging11080263","DOIUrl":"10.3390/jimaging11080263","url":null,"abstract":"<p><p>On-road vision-based systems rely on object detection to ensure vehicle safety and efficiency, making it an essential component of autonomous driving. Deep learning methods show high performance; however, they often require special hardware due to their large sizes and computational complexity, which makes real-time deployment on edge devices expensive. This study proposes lightweight object detection models based on the YOLOv5s architecture, known for its speed and accuracy. The models integrate advanced channel attention strategies, specifically the ECA module and SE attention blocks, to enhance feature selection while minimizing computational overhead. Four models were developed and trained on the KITTI dataset. The models were analyzed using key evaluation metrics to assess their effectiveness in real-time autonomous driving scenarios, including precision, recall, and mean average precision (mAP). BaseECAx2 emerged as the most efficient model for edge devices, achieving the lowest GFLOPs (13) and smallest model size (9.1 MB) without sacrificing performance. The BaseSE-ECA model demonstrated outstanding accuracy in vehicle detection, reaching a precision of 96.69% and an mAP of 98.4%, making it ideal for high-precision autonomous driving scenarios. We also assessed the models' robustness in more challenging environments by training and testing them on the BDD-100K dataset. While the models exhibited reduced performance in complex scenarios involving low-light conditions and motion blur, this evaluation highlights potential areas for improvement in challenging real-world driving conditions. This study bridges the gap between affordability and performance, presenting lightweight, cost-effective solutions for integration into real-time autonomous vehicle systems.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387144/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144972674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation of Transfer Learning Efficacy for Surgical Suture Quality Classification on Limited Datasets
Roman Ishchenko, Maksim Solopov, Andrey Popandopulo, Elizaveta Chechekhina, Viktor Turchin, Fedor Popivnenko, Aleksandr Ermak, Konstantyn Ladyk, Anton Konyashin, Kirill Golubitskiy, Aleksei Burtsev, Dmitry Filimonov
Journal of Imaging, 11(8). Published 2025-08-08. DOI: 10.3390/jimaging11080266. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387153/pdf/

Abstract: This study evaluates the effectiveness of transfer learning with pre-trained convolutional neural networks (CNNs) for the automated binary classification of surgical suture quality (high-quality/low-quality) using photographs of three suture types: interrupted open vascular sutures (IOVS), continuous over-and-over open sutures (COOS), and interrupted laparoscopic sutures (ILS). To address the challenge of limited medical data, eight state-of-the-art CNN architectures (EfficientNetB0, ResNet50V2, MobileNetV3Large, VGG16, VGG19, InceptionV3, Xception, and DenseNet121) were trained and validated on small datasets (100-190 images per type) using 5-fold cross-validation. Performance was assessed using the F1-score, AUC-ROC, and a custom weighted stability-aware score (Score_adj). The results demonstrate that transfer learning achieves robust classification (F1 > 0.90 for IOVS/ILS, 0.79 for COOS) despite data scarcity. ResNet50V2, DenseNet121, and Xception were more stable by Score_adj, with ResNet50V2 achieving the highest AUC-ROC (0.959 ± 0.008) for IOVS internal view classification. Grad-CAM visualizations confirmed model focus on clinically relevant features (e.g., stitch uniformity, tissue apposition). These findings validate transfer learning as a powerful approach for developing objective, automated surgical skill assessment tools, reducing reliance on subjective expert evaluations while maintaining accuracy in resource-constrained settings.