IET Image Processing: Latest Publications

Enhancing Fetal Plane Classification Accuracy With Data Augmentation Using Diffusion Models
IF 2.0 | CAS Q4 | Computer Science
IET Image Processing | Pub Date: 2025-07-07 | DOI: 10.1049/ipr2.70151
Yueying Tian, Elif Ucurum, Xudong Han, Rupert Young, Chris Chatwin, Philip Birch
{"title":"Enhancing Fetal Plane Classification Accuracy With Data Augmentation Using Diffusion Models","authors":"Yueying Tian,&nbsp;Elif Ucurum,&nbsp;Xudong Han,&nbsp;Rupert Young,&nbsp;Chris Chatwin,&nbsp;Philip Birch","doi":"10.1049/ipr2.70151","DOIUrl":"https://doi.org/10.1049/ipr2.70151","url":null,"abstract":"<p>Ultrasound imaging is widely used in medical diagnosis, especially for fetal health assessment. However, the availability of high-quality annotated ultrasound images is limited, which restricts the training of machine learning models. In this paper, we investigate the use of diffusion models to generate synthetic ultrasound images to improve the performance on fetal plane classification. We train different classifiers first on synthetic images and then fine-tune them with real images. Extensive experimental results demonstrate that incorporating generated images into training pipelines leads to better classification accuracy than training with real images alone. The findings suggest that generating synthetic data using diffusion models can be a valuable tool in overcoming the challenges of data scarcity in ultrasound medical imaging.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70151","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144574127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
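The two-phase recipe this abstract describes (pretrain on diffusion-generated images, then fine-tune on real ones) can be sketched as below; the dataset folders, backbone, class count, and hyperparameters are illustrative assumptions, not the authors' setup.

```python
# Sketch of synthetic-pretrain / real-fine-tune training. All paths,
# the ResNet-18 backbone, and epoch counts are assumptions for illustration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
synthetic = DataLoader(datasets.ImageFolder("synthetic_planes/", tfm), batch_size=32, shuffle=True)
real = DataLoader(datasets.ImageFolder("real_planes/", tfm), batch_size=32, shuffle=True)

model = models.resnet18(num_classes=6)  # six fetal-plane classes, assumed
loss_fn = nn.CrossEntropyLoss()

def run_epochs(loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

run_epochs(synthetic, epochs=10, lr=1e-3)  # phase 1: pretrain on generated images
run_epochs(real, epochs=5, lr=1e-4)        # phase 2: fine-tune on real scans
```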
A Systematic Review on Cell Nucleus Instance Segmentation
IF 2.0 | CAS Q4 | Computer Science
IET Image Processing | Pub Date: 2025-07-07 | DOI: 10.1049/ipr2.70129
Yulin Chen, Qian Huang, Meng Geng, Zhijian Wang, Yi Han
{"title":"A Systematic Review on Cell Nucleus Instance Segmentation","authors":"Yulin Chen,&nbsp;Qian Huang,&nbsp;Meng Geng,&nbsp;Zhijian Wang,&nbsp;Yi Han","doi":"10.1049/ipr2.70129","DOIUrl":"https://doi.org/10.1049/ipr2.70129","url":null,"abstract":"<p>Cell nucleus instance segmentation plays a pivotal role in medical research and clinical diagnosis by providing insights into cell morphology, disease diagnosis, and treatment evaluation. Despite significant efforts from researchers in this field, there remains a lack of a comprehensive and systematic review that consolidates the latest advancements and challenges in this area. In this survey, we offer a thorough overview of existing approaches to nucleus instance segmentation, exploring both traditional and deep learning-based methods. Traditional methods include watershed, thresholding, active contour model, and clustering algorithms, while deep learning methods include one-stage methods and two-stage methods. For these methods, we examine their principles, procedural steps, strengths, and limitations, offering guidance on selecting appropriate techniques for different types of data. Furthermore, we comprehensively investigate the formidable challenges encountered in the field, including ethical implications, robustness under varying imaging conditions, computational constraints, and the scarcity of annotated data. Finally, we outline promising future directions for research, such as privacy-preserving and fair AI systems, domain generalization and adaptation, efficient and lightweight model design, learning from limited annotations, as well as advancing multimodal segmentation models.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70129","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144574128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
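As one concrete instance of the traditional methods the survey covers, a minimal watershed pipeline (Otsu thresholding, distance transform, marker-based watershed) might look like this; the input file and the 0.6 marker threshold are assumptions, not anything the survey prescribes.

```python
# Classical nucleus instance segmentation sketch: threshold, seed, watershed.
import numpy as np
from skimage import io, filters, measure, segmentation
from scipy import ndimage as ndi

img = io.imread("nuclei.png", as_gray=True)   # assumed input micrograph
mask = img > filters.threshold_otsu(img)      # foreground nuclei via Otsu
distance = ndi.distance_transform_edt(mask)
# Peaks of the distance map seed roughly one marker per nucleus.
markers = measure.label(distance > 0.6 * distance.max())
labels = segmentation.watershed(-distance, markers, mask=mask)
print(labels.max(), "nucleus instances found")
```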
Improved Image Denoising: A Combination Method Using Multiscale Contextual Fusion and Recursive Learning
IF 2.0 | CAS Q4 | Computer Science
IET Image Processing | Pub Date: 2025-06-30 | DOI: 10.1049/ipr2.70143
Sonia Rehman, Muhammad Habib, Aftab Farrukh, Aarif Alutaybi
{"title":"Improved Image Denoising: A Combination Method Using Multiscale Contextual Fusion and Recursive Learning","authors":"Sonia Rehman,&nbsp;Muhammad Habib,&nbsp;Aftab Farrukh,&nbsp;Aarif Alutaybi","doi":"10.1049/ipr2.70143","DOIUrl":"https://doi.org/10.1049/ipr2.70143","url":null,"abstract":"<p>The exponential growth of imaging technology has led to a surge in visual content creation, necessitating advanced image denoising algorithms. Conventional methods, which frequently rely on predefined rules and filters, are inadequate for managing intricate noise patterns while maintaining image features. In order to tackle the issue of real-world image denoising, we investigate and integrate a new novel technique named recursive context fusion network (RCFNet) employing a deep convolutional neural network, demonstrating superior performance compared to current state-of-the-art approaches. RCFNet consists of a coarse feature extraction module and a reconstruction unit, where the former provides a broad contextual understanding and the latter refines the denoising output by preserving spatial and contextual details. Deep CNN learns features instead of using conventional methods, allowing us to improve and refine images. Dual attention units (DUs), in conjunction with the multi-scale resizing Block (MSRB) and selective kernel feature fusion (SKFF), are incorporated into the network to ensure efficient and reliable feature extraction. To demonstrate the advantages and challenges of combining many configurations into a single pipeline, we take a more detailed look at the results. By leveraging the complementary properties of these networks and computational models, we prefer to contribute to the creation of techniques that enhance image restoration while preserving crucial information, therefore encouraging further research and applications in image processing and artificial intelligence. The RCFNet achieves a high structural similarity index (SSIM) of 0.98 and a peak signal-to-noise ratio (PSNR) of 43.4 dB, outperforming many state-of-the-art methods on two benchmark datasets (DND and SIDD) and demonstrating its superior real-world image denoising ability.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70143","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
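For reference, the PSNR and SSIM figures quoted for RCFNet are standard full-reference quality metrics; a generic sketch of how they are computed, with stand-in arrays rather than the authors' evaluation code:

```python
# Computing PSNR (in dB) and SSIM between a reference and a denoised image.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

clean = np.random.rand(256, 256)                     # stand-in ground truth
denoised = clean + 0.01 * np.random.randn(256, 256)  # stand-in network output

psnr = peak_signal_noise_ratio(clean, denoised, data_range=1.0)
ssim = structural_similarity(clean, denoised, data_range=1.0)
print(f"PSNR = {psnr:.1f} dB, SSIM = {ssim:.3f}")
```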
A Coarse-to-Fine Detection Framework for Automated Lung Tumour Detection From 3D PET/CT Images
IF 2.0 | CAS Q4 | Computer Science
IET Image Processing | Pub Date: 2025-06-30 | DOI: 10.1049/ipr2.70146
Yunlong Zhao, Qiang Lin, Junfeng Mao, Jingjun Wei, Yongchun Cao, Zhengxing Man, Caihong Liu, Jingyan Ma, Xiaodi Huang
{"title":"A Coarse-to-Fine Detection Framework for Automated Lung Tumour Detection From 3D PET/CT Images","authors":"Yunlong Zhao,&nbsp;Qiang Lin,&nbsp;Junfeng Mao,&nbsp;Jingjun Wei,&nbsp;Yongchun Cao,&nbsp;Zhengxing Man,&nbsp;Caihong Liu,&nbsp;Jingyan Ma,&nbsp;Xiaodi Huang","doi":"10.1049/ipr2.70146","DOIUrl":"https://doi.org/10.1049/ipr2.70146","url":null,"abstract":"<p>Lung cancer remains the leading cause of cancer-related mortality worldwide. Early detection is critical to improving treatment outcomes and survival rates. Positron emission tomography/computed tomography (PET/CT) is a widely used imaging modality for identifying lung tumours. However, limitations in imaging resolution and the complexity of cancer characteristics make detecting small lesions particularly challenging. To address this issue, we propose a novel coarse-to-fine detection framework to reduce missed diagnoses of small lung lesions in PET/CT images. Our method integrates a stacked detection structure with a multi-attention guidance mechanism, effectively leveraging spatial and contextual information from small lesions to enhance lesion localisation. Experimental evaluations on a PET/CT dataset of 225 patients demonstrate the effectiveness of our method, achieving remarkable results with a <i>precision</i> of 81.74%, a <i>recall</i> of 76.64%, and an <i>mAP</i> of 84.72%. The proposed framework not only improves the detection accuracy of small target lesions in the lung but also provides a more reliable solution for early diagnosis.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70146","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
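The precision and recall figures above come from IoU-based matching of predicted boxes to ground truth; a minimal sketch of that computation, assuming greedy matching at an IoU threshold of 0.5 (the paper's exact matching rule may differ):

```python
# Precision/recall from box detections via greedy IoU matching.
def iou(a, b):
    # Boxes as [x1, y1, x2, y2]; returns intersection-over-union.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def precision_recall(preds, gts, thr=0.5):
    matched, tp = set(), 0
    for p in preds:  # preds assumed sorted by descending confidence
        hits = [j for j, g in enumerate(gts) if j not in matched and iou(p, g) >= thr]
        if hits:
            matched.add(hits[0]); tp += 1
    return tp / max(len(preds), 1), tp / max(len(gts), 1)

print(precision_recall([[0, 0, 10, 10]], [[1, 1, 11, 11]]))  # (1.0, 1.0)
```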
CG-VTON: Controllable Generation of Virtual Try-On Images Based on Multimodal Conditions
IF 2.0 | CAS Q4 | Computer Science
IET Image Processing | Pub Date: 2025-06-26 | DOI: 10.1049/ipr2.70144
Haopeng Lei, Xuan Zhao, Yaqin Liang, Yuanlong Cao
{"title":"CG-VTON: Controllable Generation of Virtual Try-On Images Based on Multimodal Conditions","authors":"Haopeng Lei,&nbsp;Xuan Zhao,&nbsp;Yaqin Liang,&nbsp;Yuanlong Cao","doi":"10.1049/ipr2.70144","DOIUrl":"https://doi.org/10.1049/ipr2.70144","url":null,"abstract":"<p>Transforming fashion design sketches into realistic garments remains a challenging task due to the reliance on labor-intensive manual workflows that limit efficiency and scalability in traditional fashion pipelines. While recent advances in image generation and virtual try-on technologies have introduced partial automation, existing methods still lack controllability and struggle to maintain semantic consistency in garment pose and structure, restricting their applicability in real-world design scenarios. In this work, we present CG-VTON, a controllable virtual try-on framework designed to generate high-quality try-on images directly from clothing design sketches. The model integrates multi-modal conditional inputs, including dense human pose maps and textual garment descriptions, to guide the generation process. A novel pose constraint module is introduced to enhance garment-body alignment, while a structured diffusion-based pipeline performs progressive generation through latent denoising and global-context refinement. Extensive experiments conducted on benchmark datasets demonstrate that CG-VTON significantly outperforms existing state-of-the-art methods in terms of visual quality, pose consistency, and computational efficiency. By enabling high-fidelity and controllable try-on results from abstract sketches, CG-VTON offers a practical and robust solution for bridging the gap between conceptual design and realistic garment visualization.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70144","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144492998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
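A minimal sketch of the multimodal conditioning idea described here: a single denoising step that sees the noisy latent together with a dense pose map and a text embedding. The module shapes and the concatenation-based fusion are assumptions for illustration, not CG-VTON's actual architecture.

```python
# Toy conditional denoiser: fuse latent, pose map, and text embedding.
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    def __init__(self, latent_ch=4, pose_ch=3, text_dim=512):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, 64)
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch + pose_ch + 64, 128, 3, padding=1), nn.SiLU(),
            nn.Conv2d(128, latent_ch, 3, padding=1),  # predicts the noise
        )

    def forward(self, z_t, pose, text_emb):
        b, _, h, w = z_t.shape
        # Broadcast the projected text embedding over the spatial grid.
        txt = self.text_proj(text_emb).view(b, 64, 1, 1).expand(b, 64, h, w)
        return self.net(torch.cat([z_t, pose, txt], dim=1))

denoiser = ConditionalDenoiser()
eps = denoiser(torch.randn(1, 4, 32, 32), torch.randn(1, 3, 32, 32), torch.randn(1, 512))
print(eps.shape)  # torch.Size([1, 4, 32, 32])
```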
MCD-YOLOv10n: A Small Object Detection Algorithm for UAVs
IF 2.0 | CAS Q4 | Computer Science
IET Image Processing | Pub Date: 2025-06-26 | DOI: 10.1049/ipr2.70145
Jinshuo Shi, Xitai Na, Shiji Hai, Qingbin Sun, Zhihui Feng, Xinyang Zhu
{"title":"MCD-YOLOv10n: A Small Object Detection Algorithm for UAVs","authors":"Jinshuo Shi,&nbsp;Xitai Na,&nbsp;Shiji Hai,&nbsp;Qingbin Sun,&nbsp;Zhihui Feng,&nbsp;Xinyang Zhu","doi":"10.1049/ipr2.70145","DOIUrl":"https://doi.org/10.1049/ipr2.70145","url":null,"abstract":"<p>Deep neural networks deployed on UAVs have made significant progress in data acquisition in recent years. However, traditional algorithms and deep learning models still face challenges in small and unevenly distributed object detection tasks. To address this problem, we propose the MCD-YOLOv10n model by introducing the MEMAttention module, which combines EMAttention with multiscale convolution, uses Softmax and AdaptiveAvgPool2d to adaptively compute feature weights, dynamically adjusts the region of interest, and captures cross-scale features. In addition, the C2f_MEMAttention and C2f_DSConv modules are formed by the fusion of C2f with MEMAttention and DSConv, which enhances the model's ability of extracting and adapting to irregular target features. Experiments on three datasets, VisDrone-DET2019, Exdark and DOTA-v1.5, show that the evaluation metric mAP50 achieves the best detection accuracy of 32.9%, 52.9% and 68.2% when the number of holdout parameters is at the minimum value of 2.24M. Moreover, the mAP50-95 metrics (19.5% for VisDrone-DET2019 and 45.0% for DOTA-v1.5) are 1.1 and 1.2 percentage points ahead of the second place, respectively. In terms of Recall, the VisDrone-DET2019 and DOTA-v1.5 datasets improved by 1.0% and 0.7% over the baseline model. These results validate that MCD-YOLOv10n has strong adaptability and generalization ability for small object detection in complex scenes.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70145","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144492930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
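A sketch of the ingredients the abstract names for MEMAttention: multiscale convolution branches reweighted by softmax scores derived from AdaptiveAvgPool2d. The exact wiring in the paper may differ; this only illustrates the softmax-over-scales weighting pattern.

```python
# Multiscale branches weighted by pooled, softmax-normalized scores.
import torch
import torch.nn as nn

class MultiScaleSoftmaxWeighting(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # Parallel 1x1 / 3x3 / 5x5 branches keep spatial size via padding.
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, k, padding=k // 2) for k in (1, 3, 5)
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.score = nn.Linear(ch, 1)

    def forward(self, x):
        feats = [b(x) for b in self.branches]  # one feature map per scale
        logits = torch.cat([self.score(self.pool(f).flatten(1)) for f in feats], dim=1)
        w = torch.softmax(logits, dim=1)       # one adaptive weight per scale
        return sum(w[:, i].view(-1, 1, 1, 1) * f for i, f in enumerate(feats))

print(MultiScaleSoftmaxWeighting(16)(torch.randn(2, 16, 32, 32)).shape)
```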
Swimming Post Recognition Using Novel Method Based on Score Estimation
IF 2.0 | CAS Q4 | Computer Science
IET Image Processing | Pub Date: 2025-06-25 | DOI: 10.1049/ipr2.70140
Xie Lina, Xianfeng Huang, Luo Jie, Jian Zheng
{"title":"Swimming Post Recognition Using Novel Method Based on Score Estimation","authors":"Xie Lina,&nbsp;Xianfeng Huang,&nbsp;Luo Jie,&nbsp;Jian Zheng","doi":"10.1049/ipr2.70140","DOIUrl":"https://doi.org/10.1049/ipr2.70140","url":null,"abstract":"<p>Swimming sports are treated as modern competitive sports, and athletes need to standardize and correct their posture. Therefore, the recognition of swimming postures is considered as an important section the coaches implement training plans. Usually, the recognition of swimming postures is achieved through coach observation; however, this approach is inefficient and lacks sufficient accuracy. To address this issue, a novel recognition method is proposed. In the proposed method, different swimming postures are assigned a different score via using a two-stage scoring mechanism. The feature regions of swimming postures can be accurately identified. Following that, the assigned score is put into the Softmax layer of the proposed convolutional neural networks. Finally, 4000 images including six swimming postures are used as an experimental set. The experimental results show that the proposed method achieves 92.73% testing accuracy and 89.03% validation accuracy in the recognition of the six swimming postures, defeating against the opponents. Meanwhile, our method outperforms some competitors in terms of training efficiency. The proposed two-stage scoring mechanism can be used for image recognition in large-scale scenarios. Moreover, the two-stage scoring mechanism is independently of specific scenarios in process of assigning a score value for feature regions of images. Not only that, the two-stage scoring mechanism can replace complex network structures, so as to reduce the work of training parameters.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70140","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144482219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
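A minimal sketch of feeding region scores into the final softmax, as the abstract outlines; the additive fusion and the dimensions are assumptions, since the two-stage scoring mechanism itself is not spelled out here.

```python
# Classification head whose softmax also sees externally assigned scores.
import torch
import torch.nn as nn

class ScoreAugmentedHead(nn.Module):
    def __init__(self, feat_dim=128, n_postures=6):
        super().__init__()
        self.fc = nn.Linear(feat_dim, n_postures)

    def forward(self, features, region_scores):
        # region_scores: (B, n_postures) priors from the two scoring stages.
        return torch.softmax(self.fc(features) + region_scores, dim=1)

head = ScoreAugmentedHead()
probs = head(torch.randn(4, 128), torch.randn(4, 6))
print(probs.sum(dim=1))  # each row sums to 1
```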
Augmented Multiple Perturbation Dual Mean Teacher Model for Semi-Supervised Intracranial Haemorrhage Segmentation
IF 2.0 | CAS Q4 | Computer Science
IET Image Processing | Pub Date: 2025-06-25 | DOI: 10.1049/ipr2.70102
Yan Dong, Xiangjun Ji, Ting Wang, Chiyuan Ma, Zhenxing Li, Yanling Han, Kurosh Madani, Wenhui Wan
{"title":"Augmented Multiple Perturbation Dual Mean Teacher Model for Semi-Supervised Intracranial Haemorrhage Segmentation","authors":"Yan Dong,&nbsp;Xiangjun Ji,&nbsp;Ting Wang,&nbsp;Chiyuan Ma,&nbsp;Zhenxing Li,&nbsp;Yanling Han,&nbsp;Kurosh Madani,&nbsp;Wenhui Wan","doi":"10.1049/ipr2.70102","DOIUrl":"https://doi.org/10.1049/ipr2.70102","url":null,"abstract":"<p>Generally, there are two problems restrict the intracranial haemorrhage (ICH) segmentation task: scarcity of labelled data, and poor accuracy of ICH segmentation. To address these two issues, we propose a semi-supervised ICH segmentation model and a dedicated ICH segmentation backbone network. Our approach aims at leveraging semi-supervised modelling so as to alleviate the challenge of limited labelled data availability, while the dedicated ICH segmentation backbone network further enhances the segmentation precision. An augmented multiple perturbation dual mean teacher model is designed. Based on it, the prediction accuracy may be improved by a more stringent confidence-weighted cross-entropy (CW-CE) loss, and the feature perturbation may be increased using adversarial feature perturbation for the purpose of improving the generalization ability and efficiency of consistent learning. In the ICH segmentation backbone network, we promote the segmentation accuracy by extracting both local and global features of ICH and fusing them in depth. We also fuse the features with rich details from the upper encoder during the up-sampling process to reduce the loss of feature information. Experiments on our private dataset ICHDS, and the public dataset IN22SD demonstrate that our model outperforms current state-of-the-art ICH segmentation models, achieving a maximum improvement of over 10% in Dice and exhibiting the best overall performance.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70102","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144482250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
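Two of the named mechanisms, the mean-teacher exponential moving average and a confidence-weighted cross-entropy, can be sketched generically as follows; the weighting form is an assumption, not the paper's exact CW-CE loss.

```python
# Mean-teacher EMA update plus a confidence-weighted cross-entropy sketch.
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    # Teacher weights drift toward the student: t = alpha*t + (1-alpha)*s.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1 - alpha)

def confidence_weighted_ce(student_logits, teacher_logits):
    # Works for (B, C) classification or (B, C, H, W) per-pixel logits.
    probs = torch.softmax(teacher_logits, dim=1)
    conf, pseudo = probs.max(dim=1)                  # teacher confidence + pseudo-labels
    loss = F.cross_entropy(student_logits, pseudo, reduction="none")
    return (conf * loss).mean()                      # down-weight uncertain predictions
```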
HCTMIF: Hybrid CNN-Transformer Multi Information Fusion Network for Low Light Image Enhancement
IF 2.0 | CAS Q4 | Computer Science
IET Image Processing | Pub Date: 2025-06-24 | DOI: 10.1049/ipr2.70127
Han Wang, Hengshuai Cui, Jinjiang Li, Zhen Hua
{"title":"HCTMIF: Hybrid CNN-Transformer Multi Information Fusion Network for Low Light Image Enhancement","authors":"Han Wang,&nbsp;Hengshuai Cui,&nbsp;Jinjiang Li,&nbsp;Zhen Hua","doi":"10.1049/ipr2.70127","DOIUrl":"https://doi.org/10.1049/ipr2.70127","url":null,"abstract":"<p>Images captured with poor hardware and insufficient light sources suffer from visual degradation such as low visibility, strong noise, and color casts. Low-light image enhancement methods focus on solving the problem of brightness in dark areas while eliminating the degradation of low-light images. To solve the above problems, we proposed a hybrid CNN-transformer multi information fusion network (HCTMIF) for low-light image enhancement. In this paper, the proposed network architecture is divided into three stages to progressively improve the degraded features of low-light images using the divide-and-conquer principle. First, both the first stage and the second stage adopt the encoder–decoder architecture composed of transformer and CNN to improve the long-distance modeling and local feature extraction capabilities of the network. We add a visual enhancement module (VEM) to the encoding block to further strengthen the network's ability to learn global and local information. In addition, the multi-information fusion block (MIFB) is used to complement the feature maps corresponding to the same scale of the coding block and decoding block of each layer. Second, to improve the mobility of useful information across stages, we designed the self-supervised module (SSM) to readjust the weight parameters to enhance the characterization of local features. Finally, to retain the spatial details of the enhanced images more precisely, we design the detail supplement unit (DSU) to enrich the saturation of the enhanced images. After qualitative and quantitative analyses on multiple benchmark datasets, our method outperforms other methods in terms of visual effects and metric scores.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70127","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144473142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
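A minimal sketch of the hybrid CNN-transformer idea: one block that fuses a convolution's local features with self-attention's global features. Layer sizes are illustrative, and the paper's VEM, MIFB, SSM, and DSU modules are not reproduced here.

```python
# One hybrid block: local conv branch + global self-attention branch, fused 1x1.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, ch=64, heads=4):
        super().__init__()
        self.local = nn.Conv2d(ch, ch, 3, padding=1)
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, HW, C) for attention
        glob, _ = self.attn(tokens, tokens, tokens)  # global long-range mixing
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([self.local(x), glob], dim=1))

print(HybridBlock()(torch.randn(1, 64, 32, 32)).shape)
```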
A Two-Stage Homography Matrix Prediction Approach for Trajectory Generation in Multi-Object Tracking on Sports Fields
IF 2.0 | CAS Q4 | Computer Science
IET Image Processing | Pub Date: 2025-06-24 | DOI: 10.1049/ipr2.70136
Pan Zhang, Jiangtao Luo, Guoliang Xu, Xupeng Liang
{"title":"A Two-Stage Homography Matrix Prediction Approach for Trajectory Generation in Multi-Object Tracking on Sports Fields","authors":"Pan Zhang,&nbsp;Jiangtao Luo,&nbsp;Guoliang Xu,&nbsp;Xupeng Liang","doi":"10.1049/ipr2.70136","DOIUrl":"https://doi.org/10.1049/ipr2.70136","url":null,"abstract":"<p>Homography estimation is a fundamental topic in computer vision, especially in scenarios that require perspective changes for intelligent analysis of sports fields, where it plays a crucial role. Existing methods predict the homography matrix either indirectly by evaluating the 4-key-point coordinate deviation in paired images with the same visual content or directly by fine-tuning the 8 degrees of freedom numerical values that define the matrix. However, these approaches often fail to effectively incorporate coordinate positional information and overlook optimal application scenarios, leading to significant accuracy bottlenecks, particularly for paired images with differing visual content. To address these issues, we propose an approach that integrates both methods in a staged manner, leveraging their respective advantages. In the first stage, positional information is embedded to enhance convolutional computations, replacing serial concatenation in traditional feature fusion with parallel concatenation, while using 4-key-point coordinate deviation to predict the macroscopic homography matrix. In the second stage, positional information is further integrated into the input images to refine the direct 8 degrees of freedom numerical predictions, improving matrix fine-tuning accuracy. Comparative experiments with state-of-the-art methods demonstrate that our approach achieves superior performance, yielding a root mean square error as low as 1.25 and an average corner errror as low as 14.1 in homography transformation of competitive sports image pairs.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70136","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144473143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
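The first stage's 4-key-point parameterization maps predicted corner deviations to a full homography; a sketch using OpenCV's perspective-transform solver, with dummy offsets standing in for network predictions:

```python
# 4-point parameterization: corner offsets -> 3x3 homography (8 DOF).
import numpy as np
import cv2

src = np.float32([[0, 0], [127, 0], [127, 127], [0, 127]])  # reference corners
offsets = np.float32([[3, -2], [-1, 4], [2, 2], [-3, 1]])   # predicted deviations (dummy)
dst = src + offsets

H = cv2.getPerspectiveTransform(src, dst)  # solves the 8 degrees of freedom
print(H)
```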