{"title":"Multi-Scale Feature Guided Transformer for Image Inpainting","authors":"Zeji Huang, Huanda Lu, Xin Yu, Hui Xiao","doi":"10.1049/ipr2.70105","DOIUrl":"https://doi.org/10.1049/ipr2.70105","url":null,"abstract":"<p>In recent years, image restoration has witnessed remarkable advancements. However, reconstructing visually plausible textures while preserving global structural coherence remains a persistent challenge. Existing convolutional neural network (CNN)-based approaches are inherently limited by their local receptive fields, often struggling to capture global structure. Previously proposed methods mostly focus on structural priors to address the limitation of CNN's receptive field, but we believe that texture priors are also critical factors that influence the quality of image inpainting. To tackle semantic inconsistency and texture blurriness in current methods, we introduce a novel multi-stage restoration framework. Specifically, our architecture incorporates a dual-stream U-Net with attention mechanisms to extract multi-scale features. The mixed attention-gated feature fusion module exchanges and combines structure and texture features to generate multi-scale fused feature maps, which are progressively merged into the decoder to guide the Transformer to generate more realistic images. Additionally, we propose a feature selection feedforward network to replace traditional MLPs in Transformer blocks for adaptive feature refinement. Extensive experiments on CelebA-HQ and Paris StreetView datasets demonstrate superior performance both qualitatively and quantitatively compared to state-of-the-art methods.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70105","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144108959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study on Region-Aware Fashion Design Based on Probabilistic Diffusion Model","authors":"Jie Sun, Kejun Cen, Xiaojun Ding, Sarah Haidar, Fengyuan Zou","doi":"10.1049/ipr2.70103","DOIUrl":"https://doi.org/10.1049/ipr2.70103","url":null,"abstract":"<p>To address the issue of image distortion caused by substantial structural discrepancies between original garment images and reference images in the application of image transfer technology for fashion design, this study proposes a region-aware fashion design method based on a probabilistic diffusion model. During the image feature extraction and output stage, the method integrates vision transformer (ViT) with a mask-guided mechanism, enabling the Diffusion model to precisely focus on the transferable regions of the original and reference images, thereby preserving the structural integrity and semantic consistency of the source images effectively. In the image colour and pattern style transfer stage, this study introduces an asymmetric gradient guidance (AGG) strategy to optimise the reverse sampling process of the diffusion model, substantially improving the quality and visual fidelity of the generated images. Experimental results indicate that this method achieves a Fréchet inception distance (FID) score of 103.4, surpassing existing fashion synthesis models. This facilitates the generation of more stable and realistic images for garment design tasks.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70103","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Classification and Recognition of Qinghai Embroidery Images Based on the SE-ResNet152V2 Model","authors":"Yajuan Zhao, Zhe Fan, Hehua Yao, Tong Zhang, Bingfeng Seng","doi":"10.1049/ipr2.70108","DOIUrl":"https://doi.org/10.1049/ipr2.70108","url":null,"abstract":"<p>As an essential part of traditional Chinese handicrafts, Qinghai embroidery embodies rich cultural connotations and unique artistic value. However, with the development of modern society, traditional handicrafts face severe challenges in inheritance and protection. To effectively address these challenges and promote Qinghai embroidery's digital safety and inheritance, this study realizes the automatic classification and identification of Qinghai embroidery images based on the SE-ResNet152V2 model. First, we constructed an image dataset containing five kinds of Qinghai embroidery patterns, including Tu nationality Pan embroidery, Huangzhong Dui embroidery, Hehuang embroidery, Mongolian embroidery, and Tibetan Guinan embroidery. The regions that contribute the most when the model judges the image categories are revealed by the GRAD-CAM technique, and the data are preprocessed by the image enhancement technique to enhance the data diversity and improve the model's generalization ability. For the complexity and detailed features of Qinghai embroidery patterns, this paper introduces the squeeze-and-excitation (SE) attention module to enhance the model's ability to capture key features. By systematically comparing the effects of multiple optimizers and attention mechanisms combination models, the optimal combination of the Nadam optimizer and SE attention mechanism is finally selected. The experimental results show that the accuracy of the optimized SE-ResNet152V2 model on the self-built Qinghai embroidery image dataset is 91.73%, which is 11.43% higher than that of the original ResNet152V2 model. Further experiments show that the SE-ResNet152V2 model is better than other popular neural network models such as MobileNetV1, EfficientNetB2, vision transformer (VIT), and swin transformer, regarding classification accuracy. It has proven its effectiveness and superiority in processing Qinghai embroidery pattern recognition tasks and provided strong technical support for digital protection and inheritance of traditional crafts.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70108","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144108994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AW-YOLO: A Multi-Object Detection Network for Autonomous Driving Under all Weather Conditions","authors":"Dingping Chen, Lu Dai, Xiangdi Yue, Qian Gu, Siming Huang, Jiaji Pan, Yihuan Zhang, Miaolei He","doi":"10.1049/ipr2.70111","DOIUrl":"https://doi.org/10.1049/ipr2.70111","url":null,"abstract":"<p>Over the years, object detection technology based on deep learning has attracted extensive research in autonomous driving. Achieving a robust object detection network under all weather conditions (e.g., sunny, fog, nighttime, rain and snow) is highly significant for autonomous driving systems, which ensure safety by recognising pedestrians, vehicles, traffic lights, etc. This paper proposes a robust multi-object detection network named All Weather-You Only Look Once (AW-YOLO) based on YOLOv8, with a trade-off between precision and lightweightness. Considering the blurring or absence of the salient object features of the image under all weather conditions, we propose a developed dilation-wise residual (D-DWR) module. Specifically, it combines the dilatation-wise residual module with the dilated re-param block using a large kernel convolution to see wide without going deep, greatly improving the feature extraction ability. Moreover, we introduce an efficient dynamic upsampler (DySample) that formulates upsampling from the viewpoint of point sampling and avoids dynamic convolution, which can improve the network's ability to feature fusion. Lightweight is an essential requirement for autonomous driving. To this end, we adopt a multi-scale shared detection head (MSSD-Head) to achieve lightweight deployment in autonomous vehicles. Experimental results show that the mAP50-95 values of AW-YOLO on the KITTI and Adverse Conditions Dataset with Correspondences (ACDC) datasets exceed the baseline model YOLOv8 by 1.7% and 1.5%, respectively. Meanwhile, the parameters and model size of AW-YOLO have decreased by 21.4% and 20.4%, respectively.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70111","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144108995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Iterative Data Distillation and Augmentation for Enhancing Deep Learning Performance With Limited Image Training Data","authors":"Avinash Singh, Namasivayam Ambalavanan, Nikolay M. Sirakov, Arie Nakhmani","doi":"10.1049/ipr2.70107","DOIUrl":"https://doi.org/10.1049/ipr2.70107","url":null,"abstract":"<p>Deep learning models require large training datasets. Incorporating additional data into small training datasets can enhance the model's performance. However, acquiring additional data may sometimes be challenging or beyond one's control. In such situations, data augmentation becomes essential to overcome the limited supply of labeled data by generating new data that preserves the essential properties of the original dataset. The primary objective of our research is to develop an iterative data distillation and augmentation (IDDA) method that enlarges the size of a limited image training dataset while preserving its properties. At every iteration, our method distills a set of images from the training set of the previous iteration utilizing the kernel inducing point (KIP) method, and the union of the training and distilled sets creates the new training set. However, our experiments show that IDDA is computationally expensive, increasing processing time by approximately 17%–27<span></span><math>\u0000 <semantics>\u0000 <mo>%</mo>\u0000 <annotation>$%$</annotation>\u0000 </semantics></math> for MNIST and Fashion-MNIST, 31%–39<span></span><math>\u0000 <semantics>\u0000 <mo>%</mo>\u0000 <annotation>$%$</annotation>\u0000 </semantics></math> for CIFAR-10, and up to 48%–49<span></span><math>\u0000 <semantics>\u0000 <mo>%</mo>\u0000 <annotation>$%$</annotation>\u0000 </semantics></math> for CIFAR-100 compared to state-of-the-art augmentation methods, due to the additional step of applying KIP for image distillation. We have experimentally determined that for a few iterations the classification accuracy increases and then drops afterward. We validate the IDDA capabilities by comparing it with conventional augmenting methods and MixUp on the following publicly available image datasets: MNIST digit, Fashion-MNIST, CIFAR-10, and CIFAR-100. Our approach proves highly effective for very limited datasets, addressing the challenge of database expansion for improved performance of deep learning models.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70107","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144091744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AMFE-YOLO: A Small Object Detection Model for Drone Images","authors":"Qi Wang, Chengxin Yu","doi":"10.1049/ipr2.70110","DOIUrl":"https://doi.org/10.1049/ipr2.70110","url":null,"abstract":"<p>Drones, due to their high efficiency and flexibility, have been widely applied. However, small objects captured by drones are easily affected by various conditions, resulting in suboptimal surveying performance. While the YOLO series has achieved significant success in detecting large targets, it still faces challenges in small target detection. To address this, we propose an innovative model, AMFE-YOLO, aimed at overcoming the bottlenecks in small target detection. Firstly, we introduce the AMFE module to focus on occluded targets, thereby improving detection capabilities in complex environments. Secondly, we design the SFSM module to merge shallow spatial information from the input features with deep semantic information obtained from the neck, enhancing the representation ability of small target features and reducing noise. Additionally, we implement a novel detection strategy that introduces an auxiliary detection head to identify very small targets. Finally, we reconfigured the detection head, effectively addressing the issue of false positives in small-object detection and improving the precision of small object detection. AMFE-YOLO outperforms methods like YOLOv10 and YOLOv11 in terms of mAP on the VisDrone2019 public dataset. Compared to the original YOLOv8s, the average precision improved by 5.5%, while the model parameter size was reduced by 0.7 M.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70110","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144100717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Craniopharyngioma Detection and Segmentation in MRI Images","authors":"Mohamed Nasor, Walid Obaid","doi":"10.1049/ipr2.70070","DOIUrl":"https://doi.org/10.1049/ipr2.70070","url":null,"abstract":"<p>A tumour is an abnormal growth of human body tissues. Tumours are classified as benign or malignant. Malignant tumours cause serious health complications that may threaten a patient's life. The diagnosis of such tumours requires experienced and trained medical specialists. Alternatively, computerised tumour detection and localisation can help physicians to reach accurate, fast and reliable diagnosis. Craniopharyngioma (CP) is a brain tumour located in the sellar and parasellar regions of the central nervous system. It causes various symptoms such as headaches, visual and neurological disturbances, growth retardation and delayed puberty. In addition to histological examinations, multiple tissue characteristics are evaluated for accurate diagnosis of CP tumours. Patients with craniopharyngiomas are treated by total excision and post-operative radiotherapy in cases that have no hypothalamic invasion or sub-total resection. Early detection and diagnosis of the tumour can minimise the complications associated with surgical and radiotherapy treatments. In this article, an image processing technique for the segmentation and detection of brain tumours in general and craniopharyngioma in particular using MRI brain images, is presented. The technique is based on K-means clustering, multiple thresholding and iterative morphological operations. It was tested on 104 MRI images and the quantitative analysis of its effectiveness showed performance values of 98%, 93%, 100%, 95% and 100% for precision, recall, specificity, Dice score eoefficient and accuracy, respectively.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70070","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144100718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MD-YOLOv8: A Multi-Object Detection Algorithm for Remote Sensing Satellite Images","authors":"Pengfei Zhang, Jian Liu, Jianqiang Zhang, Yiping Liu, Xingda Li","doi":"10.1049/ipr2.70106","DOIUrl":"https://doi.org/10.1049/ipr2.70106","url":null,"abstract":"<p>The technology for target recognition in remote sensing satellite images is widely applied in daily life, and research on detecting and recognizing targets in remote sensing images holds significant academic and practical importance. To address the challenges of extreme scale variations, dense target distributions, and low-resolution artefacts in remote sensing images, this paper proposes a new multi-object detection network based on the YOLOv8 architecture—MD-YOLOv8. The main contributions of this paper are threefold: (1) the design of the multi-frequency attention downsampling module, which integrates the ADown module with Haar wavelet transforms and pixel attention; (2) the proposal of the adaptive attention network (DMAA) module, an enhanced multiscale feature extractor based on the multiscale feature extraction attention mechanism; (3) the integration of both modules into the YOLOv8 backbone to achieve superior performance in remote sensing image detection. Based on the DOTA-1.0 dataset for training, experimental results show that the MD-YOLOv8 network achieves improvements in precision, recall rate, and [email protected], reaching 82.69%, 78.28%, and 82.05%, respectively; these represent increases of 3.76%, 3.43%, and 4.37% compared to the original model. In practical image detection, MD-YOLOv8 demonstrates higher recognition quality and can flexibly respond to various target types. The MD-YOLOv8 network effectively meets the accuracy requirements for target detection in remote sensing satellite images.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70106","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144091742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PartConverter: A Part-Oriented Transformation Framework for Point Clouds","authors":"Sheng-Yun Zeng, Tyng-Yeu Liang","doi":"10.1049/ipr2.70104","DOIUrl":"https://doi.org/10.1049/ipr2.70104","url":null,"abstract":"<p>With generative AI technologies advancing rapidly, the capabilities for 3D model generation and transformation are expanding across industries like manufacturing, healthcare, and virtual reality. However, existing methods based on generative adversarial networks (GANs), autoencoders, or transformers still have notable limitations. They primarily generate entire objects without providing flexibility for independent part transformation or precise control over model components. These constraints pose challenges for applications requiring complex object manipulation and fine-grained adjustments. To overcome these limitations, we propose PartConverter, a novel part-oriented point cloud transformation framework emphasizing flexibility and precision in 3D model transformations. PartConverter leverages attention mechanisms and autoencoders to capture crucial details within each part while modeling the relationships between components, thereby enabling highly customizable, part-wise transformations that maintain overall consistency. Additionally, our part assembler ensures that transformed parts align coherently, resulting in a consistent and realistic final 3D shape. This framework significantly enhances control over detailed part modeling, increasing the flexibility and efficiency of 3D model transformation workflows.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70104","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144100716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on Early Warning and Monitoring Systems for Contact Between Live Equipment and Foreign Objects","authors":"Fang Yuan, BoYuan Chen, YunFei Tan, KaiYang Liao, WeiMin Xia","doi":"10.1049/ipr2.70101","DOIUrl":"https://doi.org/10.1049/ipr2.70101","url":null,"abstract":"<p>Addressing the safety hazards arising from inadequate clearance between tree growth and power facilities, this paper innovatively proposes an online monitoring and early warning system for distribution networks based on advanced image processing technology. The system integrates three core functions: automatic tree species identification, precise detection of safety distances between tree crowns and live equipment, and real-time foreign object contact warnings. By deploying maintenance-free online monitoring terminals, the system can monitor tree growth around the distribution network around the clock, continuously. It utilizes efficient backend algorithms to intelligently analyse the collected image data, promptly detecting and warning of potential tree-obstacle hazards. Compared to traditional manual inspections and UAV inspections, this system not only significantly improves monitoring accuracy and real-time performance but also overcomes the limitations of UAV inspections, such as high costs and poor real-time performance. The solution proposed in this paper is expected to fundamentally enhance the safe operation level of distribution networks and effectively reduce power accidents such as tripping and short circuits caused by tree obstacles. Future plans include further optimizing system functions to promote the wider application and promotion of the technology.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70101","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143944920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}