{"title":"Parameter efficient face frontalization in image sequences via GAN inversion","authors":"Mohammadhossein Ahmadi, Nima Kambarani, Mohammad Reza Mohammadi","doi":"10.1049/ipr2.70003","DOIUrl":"https://doi.org/10.1049/ipr2.70003","url":null,"abstract":"<p>Processing facial images with varying poses is a significant challenge. Most existing face frontalization methods rely on heavy architectures that struggle with small datasets and produce low-quality images. Additionally, although video frames provide richer information, these methods typically use single images due to the lack of suitable multi-image datasets. To address these issues, a parameter-efficient framework for high-quality face frontalization in both single and multi-frame scenarios is proposed. First, a high-quality, diverse dataset is created for single and multi-image face frontalization tasks. Second, a novel single-image face frontalization method is introduced by combining GAN inversion with transfer learning. This approach reduces the number of trainable parameters by over 91% compared to existing GAN inversion methods while achieving far more photorealistic results than GAN-based methods. Finally, this method is extended to sequences of images, using attention mechanisms to merge information from multiple frames. This multi-frame approach reduces artefacts like eye blinks and improves reconstruction quality. Experiments demonstrate that this single-image method outperforms pSp, a state-of-the-art GAN inversion method, with a 0.15 LPIPS improvement and a 0.10 increase in ID similarity. This multi-frame approach further improves identity preservation to 0.87, showcasing its effectiveness for high-quality frontal-view reconstructions.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70003","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143248411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Swin2-MoSE: A new single image supersolution model for remote sensing","authors":"Leonardo Rossi, Vittorio Bernuzzi, Tomaso Fontanini, Massimo Bertozzi, Andrea Prati","doi":"10.1049/ipr2.13303","DOIUrl":"https://doi.org/10.1049/ipr2.13303","url":null,"abstract":"<p>Due to the limitations of current optical and sensor technologies and the high cost of updating them, the spectral and spatial resolution of satellites may not always meet desired requirements. For these reasons, Remote-Sensing Single-Image Super-Resolution (RS-SISR) techniques have gained significant interest. In this paper, Swin2-MoSE model is proposed, an enhanced version of Swin2SR. The model introduces MoE-SM, an enhanced Mixture-of-Experts (MoE) to replace the Feed-Forward inside all Transformer block. MoE-SM is designed with Smart-Merger, and new layer for merging the output of individual experts, and with a new way to split the work between experts, defining a new per-example strategy instead of the commonly used per-token one. Furthermore, it is analyzed how positional encodings interact with each other, demonstrating that per-channel bias and per-head bias can positively cooperate. Finally, the authors propose to use a combination of Normalized-Cross-Correlation (NCC) and Structural Similarity Index Measure (SSIM) losses, to avoid typical MSE loss limitations. Experimental results demonstrate that Swin2-MoSE outperforms any Swin derived models by up to 0.377–0.958 dB (PSNR) on task of <span></span><math>\u0000 <semantics>\u0000 <mrow>\u0000 <mn>2</mn>\u0000 <mo>×</mo>\u0000 </mrow>\u0000 <annotation>$2times$</annotation>\u0000 </semantics></math>, <span></span><math>\u0000 <semantics>\u0000 <mrow>\u0000 <mn>3</mn>\u0000 <mo>×</mo>\u0000 </mrow>\u0000 <annotation>$3times$</annotation>\u0000 </semantics></math> and <span></span><math>\u0000 <semantics>\u0000 <mrow>\u0000 <mn>4</mn>\u0000 <mo>×</mo>\u0000 </mrow>\u0000 <annotation>$4times$</annotation>\u0000 </semantics></math> resolution-upscaling (<span></span><math>\u0000 <semantics>\u0000 <mrow>\u0000 <mtext>Sen2Ven</mtext>\u0000 <mi>μ</mi>\u0000 <mi>s</mi>\u0000 </mrow>\u0000 <annotation>$text{Sen2Ven}mu text{s}$</annotation>\u0000 </semantics></math> and OLI2MSI datasets). It also outperforms SOTA models by a good margin, proving to be competitive and with excellent potential, especially for complex tasks. Additionally, an analysis of computational costs is also performed. Finally, the efficacy of Swin2-MoSE is shown, applying it to a semantic segmentation task (SeasoNet dataset). Code and pretrained are available on https://github.com/IMPLabUniPr/swin2-mose/tree/official_code</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.13303","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143111620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LGS-Net: A lightweight convolutional neural network based on global feature capture for spatial image steganalysis","authors":"Yuanyuan Ma, Jian Wang, Xinyu Zhang, Guifang Wang, Xianwei Xin, Qianqian Zhang","doi":"10.1049/ipr2.70005","DOIUrl":"https://doi.org/10.1049/ipr2.70005","url":null,"abstract":"<p>The purpose of image steganalysis is to detect whether the transmitted images in network communication contain secret messages. Current image steganalysis networks still have some problems such as inappropriate feature selection and easy overfitting. Therefore, this paper proposed a new spatial image steganalysis method based on convolutional neural networks. To extract richer features while reducing useless\u0000parameters in the network, this paper introduced the Im SRM filtering kernel into the image preprocessing module. To extract effective steganography noise from images, this paper combined depthwise separable convolution and residual networks for the first time and introduces them into the steganography noise extraction module. In addition, to focus network attention on the image regions where steganography information exists, this paper integrated the coordinate attention mechanism. This module will make the network pay attention to the overall structure and local details of the image during network training, improving the network's recognition ability for steganography information. Finally, the extracted steganography features are classified through a classification module. This paper conducted a series of experiments on the BOSSBase 1.01 and BOWS2 datasets. The improvement in detection accuracy is between 1.2% and 18.2% compared to classic and recent steganalysis networks.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70005","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143111621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Moving target detection based on improved Gaussian mixture model in dynamic and complex environments","authors":"Jiaxin Li, Fajie Duan, Xiao Fu, Guangyue Niu, Rui Wang, Hao Zheng","doi":"10.1049/ipr2.70001","DOIUrl":"https://doi.org/10.1049/ipr2.70001","url":null,"abstract":"<p>Recently, background modeling has garnered significant attention for motion target detection in vision and image applications. However, most methods do not achieve satisfactory results because of the influence of background dynamics and other factors. The Gaussian mixture model (GMM) background modeling method is a popular and powerful motion background modeling technology owing to its ability to balance robustness and real-time constraints in various practical environments. However, when the background is complex and the target moves slowly, the traditional GMM cannot accurately detect the target and is prone to misjudging the moving background as a moving target. To address the interference from complex backgrounds, this study proposes a target detection method that combines an adaptive GMM with an improved three-frame difference method, along with an algorithm that combines grayscale statistics with an improved Phong illumination model for illumination compensation and shadow removal. The experimental results demonstrate that the improved method has better robustness, improves target detection accuracy, and reduces noise and background interference.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143111616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From pixels to prognosis: Attention-CNN model for COVID-19 diagnosis using chest CT images","authors":"Suba Suseela, Nita Parekh","doi":"10.1049/ipr2.13249","DOIUrl":"https://doi.org/10.1049/ipr2.13249","url":null,"abstract":"<p>Deep learning assisted diagnosis for assessing the severity of various respiratory infections using chest computed tomography (CT) scan images has gained much attention after the COVID-19 pandemic. Major tasks while building such models require an understanding of the characteristic features associated with the disease, patient-to-patient variations and changes associated with disease severity. In this work, an attention-based convolutional neural network (CNN) model with customized bottleneck residual module (Attn-CNN) is proposed for classifying CT images into three classes: COVID-19, normal, and other pneumonia. The efficacy of the model is evaluated by carrying out various experiments, such as effect of class imbalance, impact of attention module, generalizability of the model and providing visualization of model's prediction for the interpretability of results. Comparative performance evaluation with five state-of-the-art deep architectures such as MobileNet, EfficientNet-B7, Inceptionv3, ResNet-50 and VGG-16, and with published models such as COVIDNet-CT, COVNet, COVID-Net CT2, etc. is discussed.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.13249","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143121183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive survey of crowd density estimation and counting","authors":"Mingtao Wang, Xin Zhou, Yuanyuan Chen","doi":"10.1049/ipr2.13328","DOIUrl":"https://doi.org/10.1049/ipr2.13328","url":null,"abstract":"<p>Crowd counting is one of the important and challenging research topics in computer vision. In recent years, with the rapid development of deep learning, the model architectures, learning paradigms, and counting accuracy have undergone significant changes. To help researchers quickly understand the research progress in this area, this paper presents a comprehensive survey of crowd density estimation and counting approaches. Initially, the technical challenges and commonly used datasets are intoroduced for crowd counting. Crowd counting approaches is them categorized into two groups based on the feature extraction methods employed: traditional approaches and deep learning-based approaches. A systematic and focused analysis of deep learning-based approaches is proposed. Subsequently, some training and evaluation details are introduced, including labels generation, loss functions, supervised training methods, and evaluation metrics. The accuracy and robustness of selected classical models are further compared. Finally, future prospects, strategies, and challenges are discussed for crowd counting. This review is comprehensive and timely, stemming from the selection of prominent and unique works.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.13328","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143119778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Face de-morphing based on identity feature transfer","authors":"Le-Bing Zhang, Song Chen, Min Long, Juan Cai","doi":"10.1049/ipr2.13324","DOIUrl":"https://doi.org/10.1049/ipr2.13324","url":null,"abstract":"<p>Face morphing attacks have emerged as a significant security threat, compromising the reliability of facial recognition systems. Despite extensive research on morphing detection, limited attention has been given to restoring accomplice face images, which is critical for forensic applications. This study aims to address this gap by proposing a novel face de-morphing (FD) method based on identity feature transfer for restoring accomplice face images. The method encodes facial attribute and identity features separately and employs cross-attention mechanisms to extract identity features from morphed faces relative to reference images. This process isolates and enhances the accomplice's identity features. Additionally, inverse linear interpolation is applied to transfer identity features to attribute features, further refining the restoration process. The enhanced identity features are then integrated with the StyleGAN generator to reconstruct high-quality accomplice facial images. Experimental evaluations on two morphed face datasets demonstrate the effectiveness of the proposed approach, improving the average restoration accuracy by at least 5% compared with other methods. These findings highlight the potential of this approach for advancing forensic and security applications.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.13324","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143119702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse representation for restoring images by exploiting topological structure of graph of patches","authors":"Yaxian Gao, Zhaoyuan Cai, Xianghua Xie, Jingjing Deng, Zengfa Dou, Xiaoke Ma","doi":"10.1049/ipr2.70004","DOIUrl":"https://doi.org/10.1049/ipr2.70004","url":null,"abstract":"<p>Image restoration poses a significant challenge, aiming to accurately recover damaged images by delving into their inherent characteristics. Various models and algorithms have been explored by researchers to address different types of image distortions, including sparse representation, grouped sparse representation, and low-rank self-representation. The grouped sparse representation algorithm leverages the prior knowledge of non-local self-similarity and imposes sparsity constraints to maintain texture information within images. To further exploit the intrinsic properties of images, this study proposes a novel low-rank representation-guided grouped sparse representation image restoration algorithm. This algorithm integrates self-representation models and trace optimization techniques to effectively preserve the original image structure, thereby enhancing image restoration performance while retaining the original texture and structural information. The proposed method was evaluated on image denoising and deblocking tasks across several datasets, demonstrating promising results.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70004","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143118949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An integrative survey on Indian sign language recognition and translation","authors":"Rina Damdoo, Praveen Kumar","doi":"10.1049/ipr2.70000","DOIUrl":"https://doi.org/10.1049/ipr2.70000","url":null,"abstract":"<p>Hard of hearing (HoH) people commonly use sign languages (SLs) to communicate. They face major impediments in communicating with hearing individuals, mostly because hearing people are unaware of SLs. Therefore, it is important to promote tools that enable communication between users of sign language and users of spoken languages. The study of sign language recognition and translation (SLRT) is a step forward in this direction, as it tries to create a spoken-language translation of a sign-language video or vice versa. This study aims to survey the Indian sign language (ISL) interpretation literature and gives pertinent information about ISL recognition and translation (ISLRT). It provides an overview of recent advances in ISLRT, including the use of machine learning based, deep learning based, and gesture-based techniques. This work also summarizes the development of ISL datasets and dictionaries. It highlights the gaps in the literature and provides recommendations for future research opportunities for ISLRT development.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70000","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143118950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bootstrapping vision–language transformer for monocular 3D visual grounding","authors":"Qi Lei, Shijie Sun, Xiangyu Song, Huansheng Song, Mingtao Feng, Chengzhong Wu","doi":"10.1049/ipr2.13315","DOIUrl":"https://doi.org/10.1049/ipr2.13315","url":null,"abstract":"<p>In the task of 3D visual grounding using monocular RGB images, it is a challenging problem to perceive visual features and accurately predict the localization of 3D objects based on given geometric and appearances descriptions. Traditional text-guided attention-based methods have achieved better results than baselines, but it is argued that there is still potential for improvement in the area of multi-modal fusion. Thus, Mono3DVG-TRv2, an end-to-end transformer-based architecture that employs a visual-text multi-modal encoder for the alignment and fusion of multi-modal features, incorporating an enhanced transformer module proven in 2D detection, is introduced. The depth features predicted by the multi-modal features and the visual-text features are associated with the learnable queries in the decoder, facilitating more efficient and effective acquisition of geometric information in intricate scenes. Following a comprehensive comparison and ablation study on the Mono3DRefer dataset, this method achieves state-of-the-art performance, markedly surpassing the prior approach. The code will be released at https://github.com/Jade-Ray/Mono3DVGv2.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.13315","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143118514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}