Journal of Imaging — Latest Articles

Multimodal Fusion Prediction of Radiation Pneumonitis via Key Pre-Radiotherapy Imaging Feature Selection Based on Dual-Layer Attention Multiple-Instance Learning.
IF 2.7
Journal of Imaging Pub Date : 2026-04-08 DOI: 10.3390/jimaging12040158
Hao Wang, Dinghui Wu, Shuguang Han, Jingli Tang, Wenlong Zhang
Abstract: Radiation pneumonitis (RP), one of the most common and severe complications in locally advanced non-small cell lung cancer (LA-NSCLC) patients following thoracic radiotherapy, presents significant prediction challenges due to the complexity of clinical risk factors, incomplete multimodal data, and unavailable slice-level annotations in pre-radiotherapy CT images. To address these challenges, we propose a multimodal fusion framework based on Dual-Layer Attention-Based Adaptive Bag Embedding Multiple-Instance Learning (DAAE-MIL) for accurate RP prediction. This study retrospectively collected data from 995 LA-NSCLC patients who received thoracic radiotherapy between November 2018 and April 2025. After screening, the subject datasets (n = 670) were allocated for training (n = 535), and the remaining samples (n = 135) were reserved as an independent test set. The proposed framework first extracts pre-radiotherapy CT image features using a fine-tuned C3D network, followed by the DAAE-MIL module, which screens critical instances and generates bag-level representations, thereby enhancing the accuracy of deep feature extraction. Subsequently, clinical data, radiomics features, and CT-derived deep features are integrated to construct a multimodal prediction model. The proposed model demonstrates promising RP prediction performance across multiple evaluation metrics, outperforming both state-of-the-art and unimodal RP prediction approaches. On the test set, it achieves an accuracy (ACC) of 0.93 and an area under the curve (AUC) of 0.97. This study validates that the proposed method effectively addresses the limitations of single-modal prediction and the unknown key features in pre-radiotherapy CT images while providing significant clinical value for RP risk assessment.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13117034/pdf/
Citations: 0
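The attention-based instance weighting at the heart of MIL frameworks like the one above can be illustrated with a minimal sketch. The paper's dual-layer adaptive bag embedding is not described in this listing, so this is a generic single-layer attention pooling in plain NumPy; the weight matrices `V` and `w` and the slice embeddings are hypothetical stand-ins, showing only how per-instance scores become a bag-level representation:

```python
import numpy as np

def attention_pool(instances, V, w):
    """Score each instance embedding with a small tanh projection,
    softmax the scores, and return the attention-weighted bag embedding."""
    scores = np.tanh(instances @ V) @ w            # (n_instances,)
    scores = scores - scores.max()                 # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # attention weights, sum to 1
    return alpha @ instances, alpha                # bag embedding (d,), weights

rng = np.random.default_rng(0)
slices = rng.normal(size=(6, 8))   # 6 hypothetical CT-slice embeddings, dim 8
V = rng.normal(size=(8, 4))        # hypothetical learned projection
w = rng.normal(size=4)             # hypothetical learned scoring vector
bag, alpha = attention_pool(slices, V, w)
# bag has shape (8,); alpha is a probability vector over the 6 slices
```

In a trained model, the `alpha` weights identify which slices the network treats as critical instances, which is what enables prediction without slice-level labels.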
Experimental Analysis of the Effects of Image Lightness and Chroma Modulation on the Reproduction of Glossiness, Transparency and Roughness.
IF 2.7
Journal of Imaging Pub Date : 2026-04-08 DOI: 10.3390/jimaging12040159
Hideyuki Ajiki, Midori Tanaka
Abstract: Even when an object's color is accurately reproduced in a colorimetrically reproduced image (CRI), the perceived material appearance does not necessarily match that of the original object. This mismatch remains a challenge for faithfully reproducing real-world appearance in digital media. In this study, we investigated how lightness and chroma modulation affect the perception of glossiness, transparency, and roughness. These three attributes were quantitatively correlated with physical surface properties and image features through a direct comparison between objects and images. Observers selected the images that best matched the material appearance of the physical samples for each attribute. Image features derived from the gray-level co-occurrence matrix (GLCM) and surface roughness parameters were analyzed to compare the selected images with the CRI. In the lightness experiment, observers consistently selected images with higher lightness than the CRI, which was accompanied by increased complexity in the luminance distribution. In the chroma experiment, images with higher chroma were preferred; however, changes in GLCM features were negligible. Notably, stimuli with small local luminance differences at the CRI required larger shifts in image features to achieve perceptual matching. These findings indicate that modulating the luminance distribution is crucial for aligning the perceived appearance between physical objects and their digital representations.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13118215/pdf/
Citations: 0
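The GLCM features analyzed in the study above can be computed from first principles. Below is a minimal sketch for a single pixel offset on a pre-quantized image, returning two common texture features; a real pipeline would use a library implementation (e.g., scikit-image) and average over several offsets:

```python
import numpy as np

def glcm_features(q, levels, dx=1, dy=0):
    """q: 2-D array of integer gray levels in [0, levels). Builds a
    normalized co-occurrence matrix for the (dy, dx) offset and returns
    (contrast, entropy)."""
    h, w = q.shape
    glcm = np.zeros((levels, levels))
    for y in range(h - dy):
        for x in range(w - dx):
            glcm[q[y, x], q[y + dy, x + dx]] += 1
    p = glcm / glcm.sum()
    i, j = np.indices((levels, levels))
    contrast = ((i - j) ** 2 * p).sum()
    nz = p[p > 0]
    entropy = -(nz * np.log2(nz)).sum()
    return contrast, entropy

flat = np.zeros((4, 4), dtype=int)             # uniform patch
check = np.indices((4, 4)).sum(axis=0) % 2     # checkerboard patch
c0, e0 = glcm_features(flat, levels=2)         # -> (0.0, 0.0)
c1, e1 = glcm_features(check, levels=2)        # -> (1.0, 1.0)
```

A uniform patch co-occurs only with itself (zero contrast and entropy), while a checkerboard splits its mass evenly between the (0,1) and (1,0) cells, which is the kind of luminance-distribution complexity the lightness experiment reports.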
A Method for Human Pose Estimation and Joint Angle Computation Through Deep Learning.
IF 2.7
Journal of Imaging Pub Date : 2026-04-06 DOI: 10.3390/jimaging12040157
Ludovica Ciardiello, Patrizia Agnello, Marta Petyx, Fabio Martinelli, Mario Cesarelli, Antonella Santone, Francesco Mercaldo
Abstract: Human pose estimation is a crucial task in computer vision with widespread applications in healthcare, rehabilitation, sports, and remote monitoring. In this paper, we propose a deep learning-based method for automatic human pose estimation and joint angle computation, tailored specifically for physiotherapy and telemedicine scenarios. Beyond pose estimation, the proposed method computes angles between joints, enabling analysis of body alignment and posture. The approach is built upon a customized skeleton with 25 anatomical keypoints and a dataset of over 150,000 annotated and augmented images derived from multiple open-source datasets. Experimental results demonstrate the effectiveness of the method, achieving a mAP@50 of 0.58 for keypoint localization and 0.98 for object detection. Moreover, we demonstrate several real-world use cases in evaluating exercise correctness and identifying postural deviations, confirming that the method is a promising approach for automated motion analysis, with potential impact on digital health, rehabilitation support, and remote patient care.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13118140/pdf/
Citations: 0
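Once keypoints are estimated, the joint angle computation described above reduces to the angle between two limb vectors meeting at a joint. A minimal sketch, assuming 2-D pixel coordinates for three keypoints (the hip/knee/ankle naming is illustrative):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by keypoints a-b-c."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Knee angle from hypothetical hip/knee/ankle pixel coordinates:
bent = joint_angle((0, 0), (0, 1), (1, 1))       # right angle -> 90.0
straight = joint_angle((0, 0), (0, 1), (0, 2))   # straight leg -> 180.0
```

The `clip` guards against floating-point values marginally outside [-1, 1], which would otherwise make `arccos` return NaN for nearly straight limbs.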
An Effective Non-Rigid Registration Approach for Ultrasound Images Based on the Improved Variational Model of Intensity, Local Phase Information and Descriptor Matching.
IF 2.7
Journal of Imaging Pub Date : 2026-04-03 DOI: 10.3390/jimaging12040156
Kun Zhang, Jinming Xing, Qingtai Xiao
Abstract: Ultrasound images suffer from limitations such as a low signal-to-noise ratio (SNR), speckle noise, low dynamic range, blurred boundaries, and shadowing; ultrasound image registration is therefore an important task for estimating tissue motion and analyzing tissue mechanical properties. In this paper, an effective non-rigid ultrasound image registration method is proposed. By integrating intensity, local phase information, and descriptor matching under a variational framework, we can find and track the non-rigid, diffeomorphic transformation of each pixel between the source and target images based on the warping technique. Experiments using simulated and in vivo ultrasound images of the human carotid artery demonstrate the advantages of the proposed algorithm, which serves as an important supplement to current ultrasound image registration methods.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13117910/pdf/
Citations: 0
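The warping step that applies an estimated non-rigid transformation can be sketched as bilinear resampling under a dense displacement field. This is a generic implementation of the warping technique, not the paper's variational solver:

```python
import numpy as np

def warp_bilinear(img, u, v):
    """Warp `img` by the displacement field (u, v): output(y, x) samples
    img(y + v[y,x], x + u[y,x]) with bilinear interpolation, clamping
    coordinates to the image border."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    xs = np.clip(xx + u, 0, w - 1.0)
    ys = np.clip(yy + v, 0, h - 1.0)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx, fy = xs - x0, ys - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bot = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bot * fy

img = np.arange(16, dtype=float).reshape(4, 4)
zero = np.zeros((4, 4))
same = warp_bilinear(img, zero, zero)            # zero field: unchanged
shifted = warp_bilinear(img, np.ones((4, 4)), zero)  # sample right neighbor
```

In a variational registration loop, the data and regularization terms are evaluated on such warped images, and the displacement field (u, v) is updated iteratively.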
DA-CycleGAN: Degradation-Adaptive Unpaired Super-Resolution for Historical Image Restoration.
IF 2.7
Journal of Imaging Pub Date : 2026-04-03 DOI: 10.3390/jimaging12040155
Lujun Zhai, Yonghui Wang, Yu Zhou, Suxia Cui
Abstract: Historical images, as the dominant record of the world and its inhabitants, can help us better understand real history. Owing to the limited camera technology of the early to mid-20th century, historical images tend to be blurry, noisy, and unclear. The goal of this paper is to super-resolve images for historical image restoration. Compared with the degradations in modern digital imagery, those in historical images have unique characteristics that are typically far more complex and less well understood. This discrepancy leads to a significant performance drop for existing super-resolution (SR) models trained on modern digital imagery. To tackle this problem, we propose a new method, DA-CycleGAN. Built on top of CycleGAN to achieve unsupervised learning, it introduces a degradation-adaptive (DA) module with strong, flexible adaptation to learn various unknown degradations from samples. Moreover, we collect a large dataset of 10,000 low-resolution images from real historical films, featuring various natural degradations. Our experimental results demonstrate the superior performance of DA-CycleGAN and the effectiveness of our dataset for accurate super-resolution enhancement of historical images.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13117639/pdf/
Citations: 0
WeatherMAR: Complementary Masking of Paired Tokens for Adverse-Weather Image Restoration.
IF 2.7
Journal of Imaging Pub Date : 2026-04-02 DOI: 10.3390/jimaging12040154
Junyuan Ma, Qunbo Lv, Zheng Tan
Abstract: Image restoration under adverse weather conditions has attracted increasing attention because of its importance for both human perception and downstream vision applications. Existing methods, however, are often designed for a single degradation type. We present WeatherMAR, a multi-weather restoration framework that formulates adverse-weather restoration as a paired-domain completion problem in a shared continuous token space. Specifically, WeatherMAR concatenates degraded and clean token sequences into a joint paired-domain sequence and performs restoration through masked autoregressive modeling, in which self-attention enables direct cross-domain interaction. To strengthen conditional learning while avoiding trivial paired correspondences, we introduce complementary bidirectional masking together with an optional reverse objective, used only during training, to encourage degradation-aware representations. WeatherMAR further employs a conditional diffusion objective for continuous token prediction and adopts a progress-to-step schedule to improve inference efficiency. Extensive experiments on standard multi-weather benchmarks, including Snow100K, Outdoor-Rain, and RainDrop, show that WeatherMAR achieves the best PSNR/SSIM on Snow100K-S (38.14/0.9684), the best SSIM on Outdoor-Rain (0.9396), and the best PSNR on Snow100K-L (32.58) and RainDrop (33.12). These results demonstrate that paired-domain token completion provides an effective solution for adverse-weather restoration.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13118218/pdf/
Citations: 0
Radon-Guided Wavelet-Domain Attention U-Net for Periodic Artifact Suppression in Brain MRI.
IF 2.7
Journal of Imaging Pub Date : 2026-04-02 DOI: 10.3390/jimaging12040153
Jesus David Rios-Perez, German Sanchez-Torres, John W Branch-Bedoya, Camilo Andres Laiton-Bonadiez
Abstract: Periodic artifacts such as ringing (Gibbs), herringbone (spike/corduroy), and zipper patterns degrade the quality of brain MRI. We present a reproducible framework that (i) synthetically generates periodic artifacts with controllable severity directly in k-space, (ii) normalizes pattern orientation through a Radon-guided alignment step, and (iii) corrects artifacts in the wavelet domain using a 2D DWT (AA/AD/DA/DD subbands) with a band-weighted loss. The evaluation used DLBS T1-weighted 3T MRI volumes with synthetically generated periodic artifacts. It combined global image-quality metrics (SSIM, PSNR) with per-band metrics to quantify how correction concentrates on high-frequency components, and included ablation studies, mixed-artifact stress tests, and structural preservation analyses. Compared with several baseline architectures, the proposed approach improves structural fidelity and reduces periodic patterns (SSIM: 0.985 ± 0.022; PSNR: 43.337 ± 5.364; reduced concentration of error in high-frequency bands) while preserving unaffected structures. These findings indicate that, within a controlled synthetic benchmark, aligning the pattern orientation prior to learning and optimizing correction in the wavelet domain enables suppression of synthetically generated periodic artifacts while limiting over-smoothing.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13117483/pdf/
Citations: 0
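The AA/AD/DA/DD subband split the correction network operates on can be illustrated with a single-level Haar DWT. This is an unnormalized Haar variant, scaled so AA equals the local 2×2 mean (the listing does not specify the paper's wavelet or normalization; a real pipeline would use a wavelet library such as PyWavelets):

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar transform, returning the four subbands:
    AA (approximation), AD/DA (detail), DD (diagonal detail)."""
    a = img[0::2, :] + img[1::2, :]          # vertical sums
    d = img[0::2, :] - img[1::2, :]          # vertical differences
    aa = (a[:, 0::2] + a[:, 1::2]) / 4.0
    ad = (a[:, 0::2] - a[:, 1::2]) / 4.0
    da = (d[:, 0::2] + d[:, 1::2]) / 4.0
    dd = (d[:, 0::2] - d[:, 1::2]) / 4.0
    return aa, ad, da, dd

# A constant image puts all energy in AA; period-2 vertical stripes put
# their oscillating energy entirely in AD.
aa, ad, da, dd = haar_dwt2(np.full((8, 8), 3.0))
stripes = np.tile([1.0, 0.0], (8, 4))        # columns alternate 1, 0
aa2, ad2, da2, dd2 = haar_dwt2(stripes)
```

This separation is what makes a band-weighted loss meaningful: after the Radon-guided step aligns the periodic pattern with an axis, its energy concentrates in specific detail subbands, which the loss can then weight more heavily.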
Semi-Automated Computational Identification of Fibrosis for Enhanced Histopathological Decision Support.
IF 2.7
Journal of Imaging Pub Date : 2026-03-31 DOI: 10.3390/jimaging12040152
Alexandru-George Berciu, Diana Rus-Gonciar, Teodora Mocan, Lucia Agoston-Coldea, Carmen Cionca, Eva-Henrietta Dulf
Abstract: Myocardial fibrosis is a critical prognostic marker involving a progressive cascade of pathological conditions. Accurate assessment of fibrosis in myocardial samples is a routine but difficult procedure for pathologists. This article presents a semi-automated system designed to ease this task while providing pixel-level accuracy that exceeds manual estimation capabilities. The proposed approach combines Gabor filters with CIELAB color space analysis to ensure efficient and interpretable computation. Testing on histopathological samples, differentiating between fibrous, healthy, and variant tissues, yielded a promising accuracy of 87.5% for images with fibrosis and 80% across all 45 images tested. The system establishes a solid foundation for automated diagnosis, providing pathologists with a reliable tool for quantitative analysis of cardiac tissue.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13117094/pdf/
Citations: 0
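The Gabor filtering step can be sketched by building the kernel directly: a sinusoid at a chosen orientation, windowed by a Gaussian. The parameter values below are illustrative, not the paper's:

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lam):
    """Real Gabor kernel: a cosine of wavelength `lam` oriented at angle
    `theta` (radians), windowed by an isotropic Gaussian of width `sigma`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    gauss = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    return gauss * np.cos(2 * np.pi * xr / lam)

# In practice the image is convolved with a bank of such kernels at
# several orientations; oriented fibrotic strands respond most strongly
# to the kernel whose theta matches their direction.
k = gabor_kernel(size=15, sigma=3.0, theta=0.0, lam=6.0)
```

Combining these texture responses with CIELAB chromaticity (collagen stains differ in color from healthy myocardium) is a plausible reading of how the two cues complement each other; the exact fusion rule is the paper's.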
Radiomic Characterization of Adrenal Incidentalomas on NECT: Retrospective Exploratory Study and Systematic Review.
IF 2.7
Journal of Imaging Pub Date : 2026-03-30 DOI: 10.3390/jimaging12040151
Pasquale Frisina, Paolo Ricci, Filippo Valentini, Daniela Messineo
Abstract: Radiomics may aid the noninvasive characterization of adrenal incidentalomas; however, reproducibility is limited by methodological heterogeneity. In this retrospective, single-center, exploratory study, we tested whether radiomic features from baseline non-enhanced computed tomography (NECT) discriminate benign from malignant/metastatic adrenal lesions, and contextualized the results with a PRISMA 2020 systematic review (PubMed/Scopus 2017-2025; PROSPERO CRD420251276627). Thirty-three patients (36 lesions: 12 lipid-rich adenomas, 9 lipid-poor adenomas, 6 pheochromocytomas, 7 malignant/metastatic lesions, 2 myelolipomas) were included; myelolipomas were excluded from primary comparisons. Two abdominal radiologists performed consensus 3D segmentation on NECT. Using LIFEx (v7.8.0) and IBSI definitions, 42 features were extracted and z-score standardized. LASSO selected four heterogeneity descriptors: first-order entropy, gray-level co-occurrence matrix (GLCM) entropy, gray-level size zone matrix (GLSZM) non-uniformity, and neighboring gray tone difference matrix (NGTDM) busyness. Heterogeneity increased from lipid-rich adenomas to pheochromocytomas and malignant/metastatic lesions (Kruskal-Wallis, all p < 0.001). Pairwise separability, measured using the Vargha-Delaney A index (VDA) as a rank-based measure, was highest for lipid-rich adenomas versus malignant/metastatic lesions (0.93), intermediate for lipid-poor adenomas versus pheochromocytomas (0.73), and lowest for lipid-rich versus lipid-poor adenomas (0.64). The review identified 18 eligible CT radiomics studies that consistently reported higher entropy/non-uniformity in pheochromocytomas and malignant lesions than in lipid-rich adenomas. Global heterogeneity metrics on NECT may complement conventional CT criteria in indeterminate lesions; external validation with robust reference standards is needed in larger, multicenter, harmonized cohorts.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13118195/pdf/
Citations: 0
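First-order entropy, the simplest of the selected heterogeneity descriptors, is the Shannon entropy of the ROI intensity histogram. A minimal sketch — the bin count and the synthetic intensity distributions are illustrative, and LIFEx/IBSI prescribe specific discretization rules that a compliant pipeline must follow:

```python
import numpy as np

def first_order_entropy(roi, bins=32):
    """Shannon entropy (bits) of the ROI intensity histogram."""
    hist, _ = np.histogram(roi, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(1)
# A tight intensity distribution (adenoma-like) scores lower than a wide,
# flat one (malignant-like), each binned over its own range.
homogeneous = 40.0 + rng.normal(0.0, 0.5, size=1000)
heterogeneous = rng.uniform(-50.0, 150.0, size=1000)
```

Higher entropy means the voxel intensities spread more evenly across bins, which is the quantitative sense in which the lesions above become "more heterogeneous" from lipid-rich adenomas toward malignant lesions.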
Spatially Time-Based Robust Tracking and Re-Identification of Kindergarten Students: A Hybrid Deep Learning Framework Combining YOLOv8n and Vision Transformer (ViT).
IF 2.7
Journal of Imaging Pub Date : 2026-03-30 DOI: 10.3390/jimaging12040150
Md Rahatul Islam, Yui Kataoka, Keisuke Teramoto, Keiichi Horio
Abstract: Detection, tracking, and re-identification (ReID) of children wearing similar uniforms in a kindergarten environment pose a complex challenge for computer vision. Traditional surveillance systems and simple convolutional neural network (CNN) models often fail to distinguish children in crowds and under occlusion. To address this challenge, this study proposes a novel hybrid framework combining YOLOv8 and a Vision Transformer (ViT). Using YOLOv8 for detection and ViT for global feature extraction, we trained the model on a custom dataset of 31,521 images, achieving an overall accuracy of 93.75%, and on the public MOT20 benchmark of 28,630 images, achieving an overall accuracy of 96.02%. Our system showed strong tracking performance, achieving 86.7% MOTA and a 99.7% IDF1 score; the high IDF1 score demonstrates that the model is highly effective at preventing identity switches. The main novelty of this study is behavioral analysis of children beyond surveillance, where we measure walking distance, trajectory, and screen time. Finally, through a cross-dataset comparison on the MOT20 public benchmark, we demonstrate that our customized model is more effective than current state-of-the-art methods at overcoming the domain gap in specific environments such as kindergartens.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13117115/pdf/
Citations: 0