{"title":"A Coarse-to-Fine Detection Framework for Automated Lung Tumour Detection From 3D PET/CT Images","authors":"Yunlong Zhao, Qiang Lin, Junfeng Mao, Jingjun Wei, Yongchun Cao, Zhengxing Man, Caihong Liu, Jingyan Ma, Xiaodi Huang","doi":"10.1049/ipr2.70146","DOIUrl":"10.1049/ipr2.70146","url":null,"abstract":"<p>Lung cancer remains the leading cause of cancer-related mortality worldwide. Early detection is critical to improving treatment outcomes and survival rates. Positron emission tomography/computed tomography (PET/CT) is a widely used imaging modality for identifying lung tumours. However, limitations in imaging resolution and the complexity of cancer characteristics make detecting small lesions particularly challenging. To address this issue, we propose a novel coarse-to-fine detection framework to reduce missed diagnoses of small lung lesions in PET/CT images. Our method integrates a stacked detection structure with a multi-attention guidance mechanism, effectively leveraging spatial and contextual information from small lesions to enhance lesion localisation. Experimental evaluations on a PET/CT dataset of 225 patients demonstrate the effectiveness of our method, achieving remarkable results with a <i>precision</i> of 81.74%, a <i>recall</i> of 76.64%, and an <i>mAP</i> of 84.72%. The proposed framework not only improves the detection accuracy of small target lesions in the lung but also provides a more reliable solution for early diagnosis.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70146","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CG-VTON: Controllable Generation of Virtual Try-On Images Based on Multimodal Conditions","authors":"Haopeng Lei, Xuan Zhao, Yaqin Liang, Yuanlong Cao","doi":"10.1049/ipr2.70144","DOIUrl":"10.1049/ipr2.70144","url":null,"abstract":"<p>Transforming fashion design sketches into realistic garments remains a challenging task due to the reliance on labor-intensive manual workflows that limit efficiency and scalability in traditional fashion pipelines. While recent advances in image generation and virtual try-on technologies have introduced partial automation, existing methods still lack controllability and struggle to maintain semantic consistency in garment pose and structure, restricting their applicability in real-world design scenarios. In this work, we present CG-VTON, a controllable virtual try-on framework designed to generate high-quality try-on images directly from clothing design sketches. The model integrates multi-modal conditional inputs, including dense human pose maps and textual garment descriptions, to guide the generation process. A novel pose constraint module is introduced to enhance garment-body alignment, while a structured diffusion-based pipeline performs progressive generation through latent denoising and global-context refinement. Extensive experiments conducted on benchmark datasets demonstrate that CG-VTON significantly outperforms existing state-of-the-art methods in terms of visual quality, pose consistency, and computational efficiency. By enabling high-fidelity and controllable try-on results from abstract sketches, CG-VTON offers a practical and robust solution for bridging the gap between conceptual design and realistic garment visualization.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70144","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144492998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MCD-YOLOv10n: A Small Object Detection Algorithm for UAVs","authors":"Jinshuo Shi, Xitai Na, Shiji Hai, Qingbin Sun, Zhihui Feng, Xinyang Zhu","doi":"10.1049/ipr2.70145","DOIUrl":"10.1049/ipr2.70145","url":null,"abstract":"<p>Deep neural networks deployed on UAVs have made significant progress in data acquisition in recent years. However, traditional algorithms and deep learning models still face challenges in small and unevenly distributed object detection tasks. To address this problem, we propose the MCD-YOLOv10n model by introducing the MEMAttention module, which combines EMAttention with multiscale convolution, uses Softmax and AdaptiveAvgPool2d to adaptively compute feature weights, dynamically adjusts the region of interest, and captures cross-scale features. In addition, the C2f_MEMAttention and C2f_DSConv modules are formed by the fusion of C2f with MEMAttention and DSConv, which enhances the model's ability of extracting and adapting to irregular target features. Experiments on three datasets, VisDrone-DET2019, Exdark and DOTA-v1.5, show that the evaluation metric mAP50 achieves the best detection accuracy of 32.9%, 52.9% and 68.2% when the number of holdout parameters is at the minimum value of 2.24M. Moreover, the mAP50-95 metrics (19.5% for VisDrone-DET2019 and 45.0% for DOTA-v1.5) are 1.1 and 1.2 percentage points ahead of the second place, respectively. In terms of Recall, the VisDrone-DET2019 and DOTA-v1.5 datasets improved by 1.0% and 0.7% over the baseline model. These results validate that MCD-YOLOv10n has strong adaptability and generalization ability for small object detection in complex scenes.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70145","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144492930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Swimming Post Recognition Using Novel Method Based on Score Estimation","authors":"Xie Lina, Xianfeng Huang, Luo Jie, Jian Zheng","doi":"10.1049/ipr2.70140","DOIUrl":"10.1049/ipr2.70140","url":null,"abstract":"<p>Swimming sports are treated as modern competitive sports, and athletes need to standardize and correct their posture. Therefore, the recognition of swimming postures is considered as an important section the coaches implement training plans. Usually, the recognition of swimming postures is achieved through coach observation; however, this approach is inefficient and lacks sufficient accuracy. To address this issue, a novel recognition method is proposed. In the proposed method, different swimming postures are assigned a different score via using a two-stage scoring mechanism. The feature regions of swimming postures can be accurately identified. Following that, the assigned score is put into the Softmax layer of the proposed convolutional neural networks. Finally, 4000 images including six swimming postures are used as an experimental set. The experimental results show that the proposed method achieves 92.73% testing accuracy and 89.03% validation accuracy in the recognition of the six swimming postures, defeating against the opponents. Meanwhile, our method outperforms some competitors in terms of training efficiency. The proposed two-stage scoring mechanism can be used for image recognition in large-scale scenarios. Moreover, the two-stage scoring mechanism is independently of specific scenarios in process of assigning a score value for feature regions of images. Not only that, the two-stage scoring mechanism can replace complex network structures, so as to reduce the work of training parameters.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70140","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144482219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Augmented Multiple Perturbation Dual Mean Teacher Model for Semi-Supervised Intracranial Haemorrhage Segmentation","authors":"Yan Dong, Xiangjun Ji, Ting Wang, Chiyuan Ma, Zhenxing Li, Yanling Han, Kurosh Madani, Wenhui Wan","doi":"10.1049/ipr2.70102","DOIUrl":"10.1049/ipr2.70102","url":null,"abstract":"<p>Generally, there are two problems restrict the intracranial haemorrhage (ICH) segmentation task: scarcity of labelled data, and poor accuracy of ICH segmentation. To address these two issues, we propose a semi-supervised ICH segmentation model and a dedicated ICH segmentation backbone network. Our approach aims at leveraging semi-supervised modelling so as to alleviate the challenge of limited labelled data availability, while the dedicated ICH segmentation backbone network further enhances the segmentation precision. An augmented multiple perturbation dual mean teacher model is designed. Based on it, the prediction accuracy may be improved by a more stringent confidence-weighted cross-entropy (CW-CE) loss, and the feature perturbation may be increased using adversarial feature perturbation for the purpose of improving the generalization ability and efficiency of consistent learning. In the ICH segmentation backbone network, we promote the segmentation accuracy by extracting both local and global features of ICH and fusing them in depth. We also fuse the features with rich details from the upper encoder during the up-sampling process to reduce the loss of feature information. Experiments on our private dataset ICHDS, and the public dataset IN22SD demonstrate that our model outperforms current state-of-the-art ICH segmentation models, achieving a maximum improvement of over 10% in Dice and exhibiting the best overall performance.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70102","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144482250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HCTMIF: Hybrid CNN-Transformer Multi Information Fusion Network for Low Light Image Enhancement","authors":"Han Wang, Hengshuai Cui, Jinjiang Li, Zhen Hua","doi":"10.1049/ipr2.70127","DOIUrl":"10.1049/ipr2.70127","url":null,"abstract":"<p>Images captured with poor hardware and insufficient light sources suffer from visual degradation such as low visibility, strong noise, and color casts. Low-light image enhancement methods focus on solving the problem of brightness in dark areas while eliminating the degradation of low-light images. To solve the above problems, we proposed a hybrid CNN-transformer multi information fusion network (HCTMIF) for low-light image enhancement. In this paper, the proposed network architecture is divided into three stages to progressively improve the degraded features of low-light images using the divide-and-conquer principle. First, both the first stage and the second stage adopt the encoder–decoder architecture composed of transformer and CNN to improve the long-distance modeling and local feature extraction capabilities of the network. We add a visual enhancement module (VEM) to the encoding block to further strengthen the network's ability to learn global and local information. In addition, the multi-information fusion block (MIFB) is used to complement the feature maps corresponding to the same scale of the coding block and decoding block of each layer. Second, to improve the mobility of useful information across stages, we designed the self-supervised module (SSM) to readjust the weight parameters to enhance the characterization of local features. Finally, to retain the spatial details of the enhanced images more precisely, we design the detail supplement unit (DSU) to enrich the saturation of the enhanced images. After qualitative and quantitative analyses on multiple benchmark datasets, our method outperforms other methods in terms of visual effects and metric scores.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70127","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144473142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adhered Buckwheat Seed Segmentation Method Based on Improved UNet","authors":"Shaozhong Lv, Shuaiming Guan, Zhengbing Xiong","doi":"10.1049/ipr2.70142","DOIUrl":"10.1049/ipr2.70142","url":null,"abstract":"<p>To address the issue of adhesion segmentation caused by the small volume, diverse morphology, large quantity, and fuzzy adherence boundaries of seeds in the output image of the buckwheat hulling machine, this paper proposes a semantic segmentation model, ResCo-UNet. The model integrates the ResNet18 network structure in the encoder of UNet, enhancing feature extraction capabilities and accelerating network training speed. To improve the recognition of adhered target boundaries, a novel parallel attention mechanism, CA<sup>2</sup>, is designed and incorporated into ResNet18, thereby enhancing the extraction of high-level semantic information. In the decoder stage, ConvNeXt modules are introduced to expand the receptive field, enhancing the model's ability to reconstruct complex boundary features. Results demonstrate that ResCo-UNet exhibits stronger generalization capabilities compared to other models, showing significant enhancements across multiple metrics: 87.81% mIoU, 92.71% recall, and 93.33% <i>F</i>1-score. Compared to the baseline model, these metrics increased by 6.71%, 5.13%, and 4.32%, respectively. Analysis of detection results across images with different distribution densities revealed an average counting accuracy of 98.89% for adherent seeds. The model effectively segments adhered seed images with varying density distributions, providing reliable parameter feedback for adaptive control of intelligent hulling equipment.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70142","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144473144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Two-Stage Homography Matrix Prediction Approach for Trajectory Generation in Multi-Object Tracking on Sports Fields","authors":"Pan Zhang, Jiangtao Luo, Guoliang Xu, Xupeng Liang","doi":"10.1049/ipr2.70136","DOIUrl":"10.1049/ipr2.70136","url":null,"abstract":"<p>Homography estimation is a fundamental topic in computer vision, especially in scenarios that require perspective changes for intelligent analysis of sports fields, where it plays a crucial role. Existing methods predict the homography matrix either indirectly by evaluating the 4-key-point coordinate deviation in paired images with the same visual content or directly by fine-tuning the 8 degrees of freedom numerical values that define the matrix. However, these approaches often fail to effectively incorporate coordinate positional information and overlook optimal application scenarios, leading to significant accuracy bottlenecks, particularly for paired images with differing visual content. To address these issues, we propose an approach that integrates both methods in a staged manner, leveraging their respective advantages. In the first stage, positional information is embedded to enhance convolutional computations, replacing serial concatenation in traditional feature fusion with parallel concatenation, while using 4-key-point coordinate deviation to predict the macroscopic homography matrix. In the second stage, positional information is further integrated into the input images to refine the direct 8 degrees of freedom numerical predictions, improving matrix fine-tuning accuracy. Comparative experiments with state-of-the-art methods demonstrate that our approach achieves superior performance, yielding a root mean square error as low as 1.25 and an average corner errror as low as 14.1 in homography transformation of competitive sports image pairs.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70136","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144473143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AEC-CapsNet: Enhanced Capsule Networks With Attention and Expansion-Contraction Mechanisms for Advanced Feature Extraction in Medical Imaging","authors":"Yasir Adil Mukhlif, Nehad T. A. Ramaha, Alaa Ali Hameed","doi":"10.1049/ipr2.70120","DOIUrl":"10.1049/ipr2.70120","url":null,"abstract":"<p>The field of medical image analysis faces challenges due to the complexity of medical data. Convolutional neural networks (CNNs), while popular, often miss critical hierarchical and spatial structures essential for accurate diagnosis. Capsule networks (CapsNets) address some of these issues but struggle with extracting information from complex datasets. We propose the Expanded Attention and Contraction Enhanced Capsule Network with Attention (AEC-CapsNet), designed specifically for medical imaging tasks. AEC-CapsNet leverages attention mechanisms for improved feature representation and expansion-contraction modules for efficient management of global and local features, enabling superior feature extraction. The model is tested on six medical datasets: Jun Cheng Brain MRI (98.14% accuracy, 99.33% AUC), Breast_BreaKHis (98.50% accuracy, 98.96% AUC), HAM10000 (98.43% accuracy, 1.00% AUC), heel X-ray (97.47% accuracy, 99.30% AUC), LC250000 colon cancer histopathology (99.80% accuracy, 99.50%AUC) and LC250000 lung cancer histopathology (99.59% accuracy, 99.20%AUC). Additionally, on the general CIFAR-10 dataset, it achieves 83.48% accuracy, demonstrating robustness and generalisability. To ensure a comprehensive complete assessment, we applied cross-validation for each experiment, which allowed us to evaluate the model's stability and performance across different training datasets. The model was trained for multiple epochs (20, 40, 60, 80, 100, 120 and 140 epochs) to examine its learning and convergence patterns. Without dataset-specific augmentation or architectural modifications, AEC-CapsNet corrects critical weaknesses of existing methods, making it efficient, accurate, and reliable for automated medical image diagnostics.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70120","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144473136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scale-Wise Interaction Fusion Network for Land Cover Classification of Urban Scene Imagery","authors":"Muhammad Shafiq, Waeal J. Obidallah, Quanrun Fan, Anas Bilal, Yousef A. Alduraywish","doi":"10.1049/ipr2.70139","DOIUrl":"10.1049/ipr2.70139","url":null,"abstract":"<p>Accurate land cover classification of urban aerial imagery presents significant challenges, particularly in recognising small objects and similar-appearing features (e.g., flat land, prepared land for cultivation, crop growing areas and built-up regions along with ground water resource areas). These challenges arise due to the irregular scaling of extracted features at various rates from complex urban scenes and the mismatch in feature information flow across channels, ultimately affecting the overall accuracy (OA) of the network. To address these issues, we propose the scale-wise interaction fusion network (SIFN) for land cover classification of urban scene imagery. The SIFN comprises four key modules: multi-scale feature extraction, scale-wise interaction, feature shuffle-fusion and adaptive mask generation. The multi-scale feature extraction module captures contextual information across different dilation rates of convolutional layers, effectively handling varying object sizes. The scale-wise interaction module enhances the learning of multi-scale contextual features, while the feature shuffle-fusion module facilitates cross-scale information exchange, improving feature representation. Lastly, adaptive mask generation ensures precise boundary delineation and reduces misclassification in transitional zones. The proposed network significantly improves boundary masking accuracy for small and similar objects, thereby enhancing the overall land cover classification performance.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70139","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144339139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}