IET Image Processing: Latest Articles

A Coarse-to-Fine Detection Framework for Automated Lung Tumour Detection From 3D PET/CT Images
IF 2.2, CAS Q4, Computer Science
IET Image Processing Pub Date: 2025-06-30 DOI: 10.1049/ipr2.70146
Yunlong Zhao, Qiang Lin, Junfeng Mao, Jingjun Wei, Yongchun Cao, Zhengxing Man, Caihong Liu, Jingyan Ma, Xiaodi Huang
{"title":"A Coarse-to-Fine Detection Framework for Automated Lung Tumour Detection From 3D PET/CT Images","authors":"Yunlong Zhao,&nbsp;Qiang Lin,&nbsp;Junfeng Mao,&nbsp;Jingjun Wei,&nbsp;Yongchun Cao,&nbsp;Zhengxing Man,&nbsp;Caihong Liu,&nbsp;Jingyan Ma,&nbsp;Xiaodi Huang","doi":"10.1049/ipr2.70146","DOIUrl":"10.1049/ipr2.70146","url":null,"abstract":"<p>Lung cancer remains the leading cause of cancer-related mortality worldwide. Early detection is critical to improving treatment outcomes and survival rates. Positron emission tomography/computed tomography (PET/CT) is a widely used imaging modality for identifying lung tumours. However, limitations in imaging resolution and the complexity of cancer characteristics make detecting small lesions particularly challenging. To address this issue, we propose a novel coarse-to-fine detection framework to reduce missed diagnoses of small lung lesions in PET/CT images. Our method integrates a stacked detection structure with a multi-attention guidance mechanism, effectively leveraging spatial and contextual information from small lesions to enhance lesion localisation. Experimental evaluations on a PET/CT dataset of 225 patients demonstrate the effectiveness of our method, achieving remarkable results with a <i>precision</i> of 81.74%, a <i>recall</i> of 76.64%, and an <i>mAP</i> of 84.72%. The proposed framework not only improves the detection accuracy of small target lesions in the lung but also provides a more reliable solution for early diagnosis.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70146","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
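The paper's code is not reproduced here, but the coarse-to-fine idea the abstract outlines can be sketched generically: a coarse detector flags candidate locations on a downsampled PET/CT volume, and a fine detector re-scores full-resolution crops around each candidate. In the PyTorch sketch below, `TinyDetector`, the two-channel PET/CT stacking, the top-k selection and the crop size are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDetector(nn.Module):
    """Stand-in 3D backbone that outputs a per-voxel objectness map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 1, 1),
        )

    def forward(self, x):
        return self.net(x)  # (B, 1, D, H, W) objectness logits

def coarse_to_fine(volume, coarse, fine, topk=8, crop=32):
    """volume: (1, 2, D, H, W) tensor with PET and CT stacked as two channels."""
    # Stage 1: coarse candidate voxels on a 2x-downsampled volume.
    small = F.interpolate(volume, scale_factor=0.5, mode="trilinear")
    heat = torch.sigmoid(coarse(small))[0, 0]           # (D/2, H/2, W/2)
    d2, h2, w2 = heat.shape
    idx = heat.flatten().topk(topk).indices
    z, y, x = idx // (h2 * w2), (idx % (h2 * w2)) // w2, idx % w2
    coords = torch.stack([z, y, x], dim=1) * 2          # map back to full resolution
    # Stage 2: re-score a fixed-size crop around each coarse candidate.
    D, H, W = volume.shape[2:]
    results = []
    for cz, cy, cx in coords.tolist():
        z0 = max(0, min(cz - crop // 2, D - crop))
        y0 = max(0, min(cy - crop // 2, H - crop))
        x0 = max(0, min(cx - crop // 2, W - crop))
        patch = volume[:, :, z0:z0 + crop, y0:y0 + crop, x0:x0 + crop]
        score = torch.sigmoid(fine(patch)).mean().item()
        results.append(((z0, y0, x0), score))
    return results

if __name__ == "__main__":
    vol = torch.randn(1, 2, 64, 64, 64)
    print(coarse_to_fine(vol, TinyDetector(), TinyDetector())[:2])
```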
CG-VTON: Controllable Generation of Virtual Try-On Images Based on Multimodal Conditions
IF 2.2, CAS Q4, Computer Science
IET Image Processing Pub Date: 2025-06-26 DOI: 10.1049/ipr2.70144
Haopeng Lei, Xuan Zhao, Yaqin Liang, Yuanlong Cao
{"title":"CG-VTON: Controllable Generation of Virtual Try-On Images Based on Multimodal Conditions","authors":"Haopeng Lei,&nbsp;Xuan Zhao,&nbsp;Yaqin Liang,&nbsp;Yuanlong Cao","doi":"10.1049/ipr2.70144","DOIUrl":"10.1049/ipr2.70144","url":null,"abstract":"<p>Transforming fashion design sketches into realistic garments remains a challenging task due to the reliance on labor-intensive manual workflows that limit efficiency and scalability in traditional fashion pipelines. While recent advances in image generation and virtual try-on technologies have introduced partial automation, existing methods still lack controllability and struggle to maintain semantic consistency in garment pose and structure, restricting their applicability in real-world design scenarios. In this work, we present CG-VTON, a controllable virtual try-on framework designed to generate high-quality try-on images directly from clothing design sketches. The model integrates multi-modal conditional inputs, including dense human pose maps and textual garment descriptions, to guide the generation process. A novel pose constraint module is introduced to enhance garment-body alignment, while a structured diffusion-based pipeline performs progressive generation through latent denoising and global-context refinement. Extensive experiments conducted on benchmark datasets demonstrate that CG-VTON significantly outperforms existing state-of-the-art methods in terms of visual quality, pose consistency, and computational efficiency. By enabling high-fidelity and controllable try-on results from abstract sketches, CG-VTON offers a practical and robust solution for bridging the gap between conceptual design and realistic garment visualization.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70144","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144492998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
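As background to the multimodal conditioning described above, here is a deliberately minimal sketch of a denoiser conditioned on a dense pose map and a text embedding, plus a crude sampling loop. The module sizes, the way conditions are concatenated, and the update rule are assumptions for illustration only and do not reflect CG-VTON's actual diffusion pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondDenoiser(nn.Module):
    """Toy noise predictor conditioned on a dense pose map and a text embedding."""
    def __init__(self, latent_ch=4, pose_ch=3, text_dim=64):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, 8 * 8)   # broadcast text over a small grid
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch + pose_ch + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_ch, 3, padding=1),
        )

    def forward(self, z, pose, text):
        t = self.text_proj(text).view(-1, 1, 8, 8)
        t = F.interpolate(t, size=z.shape[-2:], mode="bilinear")
        return self.net(torch.cat([z, pose, t], dim=1))   # predicted noise

@torch.no_grad()
def sample(model, pose, text, steps=10, shape=(1, 4, 32, 32)):
    z = torch.randn(shape)
    for _ in range(steps):
        eps = model(z, pose, text)
        z = z - eps / steps   # crude denoising update, for illustration only
    return z

if __name__ == "__main__":
    out = sample(CondDenoiser(), torch.randn(1, 3, 32, 32), torch.randn(1, 64))
    print(out.shape)   # torch.Size([1, 4, 32, 32])
```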
MCD-YOLOv10n: A Small Object Detection Algorithm for UAVs
IF 2.2, CAS Q4, Computer Science
IET Image Processing Pub Date: 2025-06-26 DOI: 10.1049/ipr2.70145
Jinshuo Shi, Xitai Na, Shiji Hai, Qingbin Sun, Zhihui Feng, Xinyang Zhu
{"title":"MCD-YOLOv10n: A Small Object Detection Algorithm for UAVs","authors":"Jinshuo Shi,&nbsp;Xitai Na,&nbsp;Shiji Hai,&nbsp;Qingbin Sun,&nbsp;Zhihui Feng,&nbsp;Xinyang Zhu","doi":"10.1049/ipr2.70145","DOIUrl":"10.1049/ipr2.70145","url":null,"abstract":"<p>Deep neural networks deployed on UAVs have made significant progress in data acquisition in recent years. However, traditional algorithms and deep learning models still face challenges in small and unevenly distributed object detection tasks. To address this problem, we propose the MCD-YOLOv10n model by introducing the MEMAttention module, which combines EMAttention with multiscale convolution, uses Softmax and AdaptiveAvgPool2d to adaptively compute feature weights, dynamically adjusts the region of interest, and captures cross-scale features. In addition, the C2f_MEMAttention and C2f_DSConv modules are formed by the fusion of C2f with MEMAttention and DSConv, which enhances the model's ability of extracting and adapting to irregular target features. Experiments on three datasets, VisDrone-DET2019, Exdark and DOTA-v1.5, show that the evaluation metric mAP50 achieves the best detection accuracy of 32.9%, 52.9% and 68.2% when the number of holdout parameters is at the minimum value of 2.24M. Moreover, the mAP50-95 metrics (19.5% for VisDrone-DET2019 and 45.0% for DOTA-v1.5) are 1.1 and 1.2 percentage points ahead of the second place, respectively. In terms of Recall, the VisDrone-DET2019 and DOTA-v1.5 datasets improved by 1.0% and 0.7% over the baseline model. These results validate that MCD-YOLOv10n has strong adaptability and generalization ability for small object detection in complex scenes.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70145","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144492930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
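The MEMAttention module itself is not published with the abstract; the sketch below shows one plausible reading of its stated ingredients, parallel multiscale convolutions whose outputs are re-weighted through AdaptiveAvgPool2d and a Softmax over scales. The branch kernel sizes, the shared scoring convolution and the residual fusion are assumptions, not the released module.

```python
import torch
import torch.nn as nn

class MultiScaleAttention(nn.Module):
    """Generic multiscale attention: branch outputs compete via a softmax over scales."""
    def __init__(self, channels, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in kernel_sizes
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.score = nn.Conv2d(channels, 1, 1)   # one scalar score per branch

    def forward(self, x):
        feats = [b(x) for b in self.branches]                                  # each (B, C, H, W)
        scores = torch.cat([self.score(self.pool(f)) for f in feats], dim=1)  # (B, S, 1, 1)
        weights = torch.softmax(scores, dim=1)                                 # compete across scales
        out = sum(w * f for w, f in zip(weights.split(1, dim=1), feats))       # weighted fusion
        return out + x                                                         # residual connection

if __name__ == "__main__":
    m = MultiScaleAttention(16)
    print(m(torch.randn(2, 16, 40, 40)).shape)   # torch.Size([2, 16, 40, 40])
```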
Swimming Post Recognition Using Novel Method Based on Score Estimation
IF 2.2, CAS Q4, Computer Science
IET Image Processing Pub Date: 2025-06-25 DOI: 10.1049/ipr2.70140
Xie Lina, Xianfeng Huang, Luo Jie, Jian Zheng
{"title":"Swimming Post Recognition Using Novel Method Based on Score Estimation","authors":"Xie Lina,&nbsp;Xianfeng Huang,&nbsp;Luo Jie,&nbsp;Jian Zheng","doi":"10.1049/ipr2.70140","DOIUrl":"10.1049/ipr2.70140","url":null,"abstract":"<p>Swimming sports are treated as modern competitive sports, and athletes need to standardize and correct their posture. Therefore, the recognition of swimming postures is considered as an important section the coaches implement training plans. Usually, the recognition of swimming postures is achieved through coach observation; however, this approach is inefficient and lacks sufficient accuracy. To address this issue, a novel recognition method is proposed. In the proposed method, different swimming postures are assigned a different score via using a two-stage scoring mechanism. The feature regions of swimming postures can be accurately identified. Following that, the assigned score is put into the Softmax layer of the proposed convolutional neural networks. Finally, 4000 images including six swimming postures are used as an experimental set. The experimental results show that the proposed method achieves 92.73% testing accuracy and 89.03% validation accuracy in the recognition of the six swimming postures, defeating against the opponents. Meanwhile, our method outperforms some competitors in terms of training efficiency. The proposed two-stage scoring mechanism can be used for image recognition in large-scale scenarios. Moreover, the two-stage scoring mechanism is independently of specific scenarios in process of assigning a score value for feature regions of images. Not only that, the two-stage scoring mechanism can replace complex network structures, so as to reduce the work of training parameters.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70140","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144482219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
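How the assigned score enters the Softmax layer is not specified in the abstract. One simple reading is to concatenate a precomputed region score with the CNN features just before the classifier head, as in the hypothetical sketch below; the backbone, the six-class head and the score range are assumptions, not the authors' network.

```python
import torch
import torch.nn as nn

class ScoreAidedClassifier(nn.Module):
    """Toy classifier that appends an externally assigned posture score to CNN features."""
    def __init__(self, num_classes=6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32 + 1, num_classes)   # +1 for the region score

    def forward(self, image, region_score):
        feats = self.backbone(image)                              # (B, 32)
        logits = self.head(torch.cat([feats, region_score], 1))   # (B, num_classes)
        return torch.softmax(logits, dim=1)

if __name__ == "__main__":
    model = ScoreAidedClassifier()
    probs = model(torch.randn(4, 3, 128, 128), torch.rand(4, 1))
    print(probs.shape)   # torch.Size([4, 6])
```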
Augmented Multiple Perturbation Dual Mean Teacher Model for Semi-Supervised Intracranial Haemorrhage Segmentation
IF 2.2, CAS Q4, Computer Science
IET Image Processing Pub Date: 2025-06-25 DOI: 10.1049/ipr2.70102
Yan Dong, Xiangjun Ji, Ting Wang, Chiyuan Ma, Zhenxing Li, Yanling Han, Kurosh Madani, Wenhui Wan
{"title":"Augmented Multiple Perturbation Dual Mean Teacher Model for Semi-Supervised Intracranial Haemorrhage Segmentation","authors":"Yan Dong,&nbsp;Xiangjun Ji,&nbsp;Ting Wang,&nbsp;Chiyuan Ma,&nbsp;Zhenxing Li,&nbsp;Yanling Han,&nbsp;Kurosh Madani,&nbsp;Wenhui Wan","doi":"10.1049/ipr2.70102","DOIUrl":"10.1049/ipr2.70102","url":null,"abstract":"<p>Generally, there are two problems restrict the intracranial haemorrhage (ICH) segmentation task: scarcity of labelled data, and poor accuracy of ICH segmentation. To address these two issues, we propose a semi-supervised ICH segmentation model and a dedicated ICH segmentation backbone network. Our approach aims at leveraging semi-supervised modelling so as to alleviate the challenge of limited labelled data availability, while the dedicated ICH segmentation backbone network further enhances the segmentation precision. An augmented multiple perturbation dual mean teacher model is designed. Based on it, the prediction accuracy may be improved by a more stringent confidence-weighted cross-entropy (CW-CE) loss, and the feature perturbation may be increased using adversarial feature perturbation for the purpose of improving the generalization ability and efficiency of consistent learning. In the ICH segmentation backbone network, we promote the segmentation accuracy by extracting both local and global features of ICH and fusing them in depth. We also fuse the features with rich details from the upper encoder during the up-sampling process to reduce the loss of feature information. Experiments on our private dataset ICHDS, and the public dataset IN22SD demonstrate that our model outperforms current state-of-the-art ICH segmentation models, achieving a maximum improvement of over 10% in Dice and exhibiting the best overall performance.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70102","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144482250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
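The exact form of the confidence-weighted cross-entropy (CW-CE) loss is not given in the abstract. A common formulation in mean-teacher pipelines weights the per-pixel cross-entropy against teacher pseudo-labels by teacher confidence and masks out low-confidence pixels, as sketched below; the threshold and the weighting scheme are assumptions and may differ from the paper's definition.

```python
import torch
import torch.nn.functional as F

def cw_ce_loss(student_logits, teacher_logits, threshold=0.8):
    """Confidence-weighted CE between student predictions and teacher pseudo-labels.

    student_logits, teacher_logits: (B, C, H, W) tensors.
    """
    with torch.no_grad():
        probs = torch.softmax(teacher_logits, dim=1)
        conf, pseudo = probs.max(dim=1)                    # (B, H, W) confidence and labels
        weight = conf * (conf > threshold).float()         # ignore low-confidence pixels
    ce = F.cross_entropy(student_logits, pseudo, reduction="none")  # per-pixel CE
    return (weight * ce).sum() / weight.sum().clamp_min(1e-6)

if __name__ == "__main__":
    s, t = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
    print(float(cw_ce_loss(s, t)))
```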
HCTMIF: Hybrid CNN-Transformer Multi Information Fusion Network for Low Light Image Enhancement
IF 2.2, CAS Q4, Computer Science
IET Image Processing Pub Date: 2025-06-24 DOI: 10.1049/ipr2.70127
Han Wang, Hengshuai Cui, Jinjiang Li, Zhen Hua
{"title":"HCTMIF: Hybrid CNN-Transformer Multi Information Fusion Network for Low Light Image Enhancement","authors":"Han Wang,&nbsp;Hengshuai Cui,&nbsp;Jinjiang Li,&nbsp;Zhen Hua","doi":"10.1049/ipr2.70127","DOIUrl":"10.1049/ipr2.70127","url":null,"abstract":"<p>Images captured with poor hardware and insufficient light sources suffer from visual degradation such as low visibility, strong noise, and color casts. Low-light image enhancement methods focus on solving the problem of brightness in dark areas while eliminating the degradation of low-light images. To solve the above problems, we proposed a hybrid CNN-transformer multi information fusion network (HCTMIF) for low-light image enhancement. In this paper, the proposed network architecture is divided into three stages to progressively improve the degraded features of low-light images using the divide-and-conquer principle. First, both the first stage and the second stage adopt the encoder–decoder architecture composed of transformer and CNN to improve the long-distance modeling and local feature extraction capabilities of the network. We add a visual enhancement module (VEM) to the encoding block to further strengthen the network's ability to learn global and local information. In addition, the multi-information fusion block (MIFB) is used to complement the feature maps corresponding to the same scale of the coding block and decoding block of each layer. Second, to improve the mobility of useful information across stages, we designed the self-supervised module (SSM) to readjust the weight parameters to enhance the characterization of local features. Finally, to retain the spatial details of the enhanced images more precisely, we design the detail supplement unit (DSU) to enrich the saturation of the enhanced images. After qualitative and quantitative analyses on multiple benchmark datasets, our method outperforms other methods in terms of visual effects and metric scores.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70127","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144473142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
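A hybrid CNN-transformer block of the kind the abstract describes can be sketched as a local convolutional branch plus a global self-attention branch whose outputs are fused. The layer widths, head count, concatenation-based fusion and residual connection below are assumptions, not the published HCTMIF design.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Local conv branch + global self-attention branch, fused by a 1x1 conv."""
    def __init__(self, channels=32, heads=4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))    # (B, HW, C) token sequence
        glob, _ = self.attn(tokens, tokens, tokens)          # global context via self-attention
        glob = glob.transpose(1, 2).view(b, c, h, w)
        return self.fuse(torch.cat([local, glob], dim=1)) + x

if __name__ == "__main__":
    print(HybridBlock()(torch.randn(1, 32, 24, 24)).shape)   # torch.Size([1, 32, 24, 24])
```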
Adhered Buckwheat Seed Segmentation Method Based on Improved UNet
IF 2.2, CAS Q4, Computer Science
IET Image Processing Pub Date: 2025-06-24 DOI: 10.1049/ipr2.70142
Shaozhong Lv, Shuaiming Guan, Zhengbing Xiong
{"title":"Adhered Buckwheat Seed Segmentation Method Based on Improved UNet","authors":"Shaozhong Lv,&nbsp;Shuaiming Guan,&nbsp;Zhengbing Xiong","doi":"10.1049/ipr2.70142","DOIUrl":"10.1049/ipr2.70142","url":null,"abstract":"<p>To address the issue of adhesion segmentation caused by the small volume, diverse morphology, large quantity, and fuzzy adherence boundaries of seeds in the output image of the buckwheat hulling machine, this paper proposes a semantic segmentation model, ResCo-UNet. The model integrates the ResNet18 network structure in the encoder of UNet, enhancing feature extraction capabilities and accelerating network training speed. To improve the recognition of adhered target boundaries, a novel parallel attention mechanism, CA<sup>2</sup>, is designed and incorporated into ResNet18, thereby enhancing the extraction of high-level semantic information. In the decoder stage, ConvNeXt modules are introduced to expand the receptive field, enhancing the model's ability to reconstruct complex boundary features. Results demonstrate that ResCo-UNet exhibits stronger generalization capabilities compared to other models, showing significant enhancements across multiple metrics: 87.81% mIoU, 92.71% recall, and 93.33% <i>F</i>1-score. Compared to the baseline model, these metrics increased by 6.71%, 5.13%, and 4.32%, respectively. Analysis of detection results across images with different distribution densities revealed an average counting accuracy of 98.89% for adherent seeds. The model effectively segments adhered seed images with varying density distributions, providing reliable parameter feedback for adaptive control of intelligent hulling equipment.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70142","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144473144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
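The abstract states that ResNet18 serves as the UNet encoder. The sketch below shows that skeleton with a plain upsample-and-concatenate decoder; it omits the paper's CA² attention and ConvNeXt modules, and assumes a recent torchvision for `resnet18(weights=None)`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18  # assumes torchvision is installed

def up2(x):
    return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

class ResNet18UNet(nn.Module):
    """UNet-style segmenter with a ResNet18 encoder and a minimal decoder."""
    def __init__(self, num_classes=2):
        super().__init__()
        r = resnet18(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu)   # 1/2 resolution, 64 ch
        self.pool = r.maxpool                                # 1/4
        self.enc1, self.enc2 = r.layer1, r.layer2            # 64, 128 ch
        self.enc3, self.enc4 = r.layer3, r.layer4            # 256, 512 ch
        self.dec3 = nn.Conv2d(512 + 256, 256, 3, padding=1)
        self.dec2 = nn.Conv2d(256 + 128, 128, 3, padding=1)
        self.dec1 = nn.Conv2d(128 + 64, 64, 3, padding=1)
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        e0 = self.stem(x)                 # 1/2
        e1 = self.enc1(self.pool(e0))     # 1/4
        e2 = self.enc2(e1)                # 1/8
        e3 = self.enc3(e2)                # 1/16
        e4 = self.enc4(e3)                # 1/32
        d3 = torch.relu(self.dec3(torch.cat([up2(e4), e3], dim=1)))
        d2 = torch.relu(self.dec2(torch.cat([up2(d3), e2], dim=1)))
        d1 = torch.relu(self.dec1(torch.cat([up2(d2), e1], dim=1)))
        return self.head(F.interpolate(d1, scale_factor=4, mode="bilinear",
                                       align_corners=False))

if __name__ == "__main__":
    print(ResNet18UNet()(torch.randn(1, 3, 256, 256)).shape)  # (1, 2, 256, 256)
```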
A Two-Stage Homography Matrix Prediction Approach for Trajectory Generation in Multi-Object Tracking on Sports Fields
IF 2.2, CAS Q4, Computer Science
IET Image Processing Pub Date: 2025-06-24 DOI: 10.1049/ipr2.70136
Pan Zhang, Jiangtao Luo, Guoliang Xu, Xupeng Liang
{"title":"A Two-Stage Homography Matrix Prediction Approach for Trajectory Generation in Multi-Object Tracking on Sports Fields","authors":"Pan Zhang,&nbsp;Jiangtao Luo,&nbsp;Guoliang Xu,&nbsp;Xupeng Liang","doi":"10.1049/ipr2.70136","DOIUrl":"10.1049/ipr2.70136","url":null,"abstract":"<p>Homography estimation is a fundamental topic in computer vision, especially in scenarios that require perspective changes for intelligent analysis of sports fields, where it plays a crucial role. Existing methods predict the homography matrix either indirectly by evaluating the 4-key-point coordinate deviation in paired images with the same visual content or directly by fine-tuning the 8 degrees of freedom numerical values that define the matrix. However, these approaches often fail to effectively incorporate coordinate positional information and overlook optimal application scenarios, leading to significant accuracy bottlenecks, particularly for paired images with differing visual content. To address these issues, we propose an approach that integrates both methods in a staged manner, leveraging their respective advantages. In the first stage, positional information is embedded to enhance convolutional computations, replacing serial concatenation in traditional feature fusion with parallel concatenation, while using 4-key-point coordinate deviation to predict the macroscopic homography matrix. In the second stage, positional information is further integrated into the input images to refine the direct 8 degrees of freedom numerical predictions, improving matrix fine-tuning accuracy. Comparative experiments with state-of-the-art methods demonstrate that our approach achieves superior performance, yielding a root mean square error as low as 1.25 and an average corner errror as low as 14.1 in homography transformation of competitive sports image pairs.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70136","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144473143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
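The first stage relies on the standard 4-point homography parameterisation: predicted corner offsets define four point correspondences, from which the full 3x3 matrix with 8 degrees of freedom is recovered. The OpenCV snippet below illustrates that conversion; the corner coordinates and offsets are made-up example values, not the paper's data.

```python
import numpy as np
import cv2

def four_point_to_homography(corners_src, offsets):
    """corners_src: (4, 2) reference corners; offsets: (4, 2) predicted corner shifts."""
    corners_dst = corners_src + offsets
    # Solve the perspective transform that maps the source corners to the shifted corners.
    H = cv2.getPerspectiveTransform(
        corners_src.astype(np.float32), corners_dst.astype(np.float32)
    )
    return H  # 3x3 matrix, 8 degrees of freedom (H[2, 2] is fixed to 1)

if __name__ == "__main__":
    src = np.array([[0, 0], [639, 0], [639, 359], [0, 359]], dtype=np.float32)
    offs = np.array([[5, -3], [-2, 4], [1, 1], [0, -6]], dtype=np.float32)
    print(four_point_to_homography(src, offs))
```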
AEC-CapsNet: Enhanced Capsule Networks With Attention and Expansion-Contraction Mechanisms for Advanced Feature Extraction in Medical Imaging
IF 2.2, CAS Q4, Computer Science
IET Image Processing Pub Date: 2025-06-24 DOI: 10.1049/ipr2.70120
Yasir Adil Mukhlif, Nehad T. A. Ramaha, Alaa Ali Hameed
{"title":"AEC-CapsNet: Enhanced Capsule Networks With Attention and Expansion-Contraction Mechanisms for Advanced Feature Extraction in Medical Imaging","authors":"Yasir Adil Mukhlif,&nbsp;Nehad T. A. Ramaha,&nbsp;Alaa Ali Hameed","doi":"10.1049/ipr2.70120","DOIUrl":"10.1049/ipr2.70120","url":null,"abstract":"<p>The field of medical image analysis faces challenges due to the complexity of medical data. Convolutional neural networks (CNNs), while popular, often miss critical hierarchical and spatial structures essential for accurate diagnosis. Capsule networks (CapsNets) address some of these issues but struggle with extracting information from complex datasets. We propose the Expanded Attention and Contraction Enhanced Capsule Network with Attention (AEC-CapsNet), designed specifically for medical imaging tasks. AEC-CapsNet leverages attention mechanisms for improved feature representation and expansion-contraction modules for efficient management of global and local features, enabling superior feature extraction. The model is tested on six medical datasets: Jun Cheng Brain MRI (98.14% accuracy, 99.33% AUC), Breast_BreaKHis (98.50% accuracy, 98.96% AUC), HAM10000 (98.43% accuracy, 1.00% AUC), heel X-ray (97.47% accuracy, 99.30% AUC), LC250000 colon cancer histopathology (99.80% accuracy, 99.50%AUC) and LC250000 lung cancer histopathology (99.59% accuracy, 99.20%AUC). Additionally, on the general CIFAR-10 dataset, it achieves 83.48% accuracy, demonstrating robustness and generalisability. To ensure a comprehensive complete assessment, we applied cross-validation for each experiment, which allowed us to evaluate the model's stability and performance across different training datasets. The model was trained for multiple epochs (20, 40, 60, 80, 100, 120 and 140 epochs) to examine its learning and convergence patterns. Without dataset-specific augmentation or architectural modifications, AEC-CapsNet corrects critical weaknesses of existing methods, making it efficient, accurate, and reliable for automated medical image diagnostics.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70120","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144473136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
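For readers unfamiliar with capsule networks, the sketch below shows the textbook squash non-linearity and one dynamic-routing step that CapsNet variants build on; it is background only and does not include the attention or expansion-contraction mechanisms that AEC-CapsNet adds.

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    """Shrink vector length into (0, 1) while preserving direction."""
    norm2 = (s * s).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

def routing_step(u_hat, b):
    """One dynamic-routing iteration.

    u_hat: (B, out_caps, in_caps, dim) prediction vectors; b: (B, out_caps, in_caps) logits.
    """
    c = torch.softmax(b, dim=1)                        # coupling coefficients over output capsules
    v = squash((c.unsqueeze(-1) * u_hat).sum(dim=2))   # (B, out_caps, dim) output capsules
    b = b + (u_hat * v.unsqueeze(2)).sum(-1)           # update logits by prediction-output agreement
    return v, b

if __name__ == "__main__":
    u_hat = torch.randn(2, 10, 32, 16)
    b = torch.zeros(2, 10, 32)
    v, b = routing_step(u_hat, b)
    print(v.shape)   # torch.Size([2, 10, 16])
```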
Scale-Wise Interaction Fusion Network for Land Cover Classification of Urban Scene Imagery
IF 2.2, CAS Q4, Computer Science
IET Image Processing Pub Date: 2025-06-22 DOI: 10.1049/ipr2.70139
Muhammad Shafiq, Waeal J. Obidallah, Quanrun Fan, Anas Bilal, Yousef A. Alduraywish
{"title":"Scale-Wise Interaction Fusion Network for Land Cover Classification of Urban Scene Imagery","authors":"Muhammad Shafiq,&nbsp;Waeal J. Obidallah,&nbsp;Quanrun Fan,&nbsp;Anas Bilal,&nbsp;Yousef A. Alduraywish","doi":"10.1049/ipr2.70139","DOIUrl":"10.1049/ipr2.70139","url":null,"abstract":"<p>Accurate land cover classification of urban aerial imagery presents significant challenges, particularly in recognising small objects and similar-appearing features (e.g., flat land, prepared land for cultivation, crop growing areas and built-up regions along with ground water resource areas). These challenges arise due to the irregular scaling of extracted features at various rates from complex urban scenes and the mismatch in feature information flow across channels, ultimately affecting the overall accuracy (OA) of the network. To address these issues, we propose the scale-wise interaction fusion network (SIFN) for land cover classification of urban scene imagery. The SIFN comprises four key modules: multi-scale feature extraction, scale-wise interaction, feature shuffle-fusion and adaptive mask generation. The multi-scale feature extraction module captures contextual information across different dilation rates of convolutional layers, effectively handling varying object sizes. The scale-wise interaction module enhances the learning of multi-scale contextual features, while the feature shuffle-fusion module facilitates cross-scale information exchange, improving feature representation. Lastly, adaptive mask generation ensures precise boundary delineation and reduces misclassification in transitional zones. The proposed network significantly improves boundary masking accuracy for small and similar objects, thereby enhancing the overall land cover classification performance.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70139","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144339139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
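The multi-scale feature extraction module is described as capturing context at different dilation rates. A minimal version of that idea is a bank of parallel dilated convolutions fused by a 1x1 convolution, as sketched below; the dilation rates and channel counts are assumptions, not the SIFN configuration.

```python
import torch
import torch.nn as nn

class DilatedMultiScale(nn.Module):
    """Parallel dilated convolutions fused into a single multi-scale feature map."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # Each branch sees a different receptive field; concatenation keeps all scales.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

if __name__ == "__main__":
    m = DilatedMultiScale(3, 16)
    print(m(torch.randn(1, 3, 128, 128)).shape)   # torch.Size([1, 16, 128, 128])
```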