Signal Processing-Image Communication: Latest Publications

Approximation-based energy-efficient cyber-secured image classification framework
IF 3.4, CAS Region 3 (Engineering & Technology)
Signal Processing-Image Communication, Volume 133, Article 117261. Pub Date: 2025-01-16. DOI: 10.1016/j.image.2025.117261
Authors: M.A. Rahman, Salma Sultana Tunny, A.S.M. Kayes, Peng Cheng, Aminul Huq, M.S. Rana, Md. Rashidul Islam, Animesh Sarkar Tusher
Abstract: In this work, an energy-efficient cyber-secured framework for deep learning-based image classification is proposed. It simultaneously addresses two major concerns of relevant applications that are typically handled separately in existing works. An image approximation-based data storage scheme is discussed that improves memory-usage efficiency while reducing energy consumption at both the source and user ends. The proposed framework also mitigates the impact of two different adversarial attacks while notably retaining performance. The experimental analysis underlines the academic and industrial relevance of this work: it demonstrates a 62.5% reduction in memory-access energy consumption for image classification, and an equal reduction in the effective memory sizes at both ends. In improving memory efficiency, the multi-scale structural similarity index measure (MS-SSIM) is found to be the best image quality assessment method among several similarity-based metrics for classification with approximated images, and an average image quality of 0.9449 in terms of MS-SSIM is maintained. A comparative analysis of three classifiers of different depths indicates that the proposed scheme retains up to 90.17% of the original classification accuracy under normal and cyber-attack scenarios, effectively defending against untargeted and targeted white-box adversarial attacks with varying parameters.
Citations: 0
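The listing carries no code, but the quality-assessment step named in the abstract is easy to illustrate. The sketch below is a simplified multi-scale SSIM in the spirit of MS-SSIM (per-scale SSIM scores combined with the standard five-scale weights, 2x2 mean pooling between scales), applied to a crude stand-in approximation (decimation plus pixel repetition). Both the approximation and the simplification are assumptions for illustration, not the authors' scheme.

```python
import numpy as np
from skimage.metrics import structural_similarity

# Standard five-scale MS-SSIM weights (Wang et al., 2003).
WEIGHTS = [0.0448, 0.2856, 0.3001, 0.2363, 0.1333]

def mean_pool(img):
    """Downsample by 2 with 2x2 average pooling."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def ms_ssim(a, b):
    """Simplified MS-SSIM: weighted product of per-scale SSIM scores."""
    score = 1.0
    for weight in WEIGHTS:
        s = structural_similarity(a, b, data_range=1.0)
        score *= max(s, 1e-6) ** weight          # clamp guards against negative SSIM
        a, b = mean_pool(a), mean_pool(b)
    return score

# Toy stand-in for an approximated image: drop every other pixel, then repeat.
rng = np.random.default_rng(0)
original = rng.random((256, 256))
approx = np.repeat(np.repeat(original[::2, ::2], 2, axis=0), 2, axis=1)
print(f"MS-SSIM(original, approximated) = {ms_ssim(original, approx):.4f}")
```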
Spiking two-stream methods with unsupervised STDP-based learning for action recognition
IF 3.4, CAS Region 3 (Engineering & Technology)
Signal Processing-Image Communication, Volume 134, Article 117263. Pub Date: 2025-01-16. DOI: 10.1016/j.image.2025.117263
Authors: Mireille El-Assal, Pierre Tirilly, Ioan Marius Bilasco
Abstract: Video analysis is a computer vision task that is useful for many applications such as surveillance, human-machine interaction, and autonomous vehicles. Deep learning methods are currently the state of the art for video analysis. In particular, two-stream methods, which leverage both spatial and temporal information, have proven valuable in Human Action Recognition (HAR). However, they have high computational costs and need a large amount of labeled data for training. To address these challenges, this paper adopts a more efficient approach by leveraging Convolutional Spiking Neural Networks (CSNNs) trained with the unsupervised Spike Timing-Dependent Plasticity (STDP) learning rule for action classification. These networks represent information using asynchronous low-energy spikes, which allows the network to be more energy-efficient when implemented on neuromorphic hardware. Furthermore, learning visual features with unsupervised learning reduces the need for labeled data during training, making the approach doubly advantageous. We therefore explore transposing two-stream convolutional neural networks into the spiking domain, where each stream is trained with the unsupervised STDP learning rule. We investigate the performance of these networks in video analysis by employing five distinct configurations for the temporal stream, and evaluate them across four benchmark HAR datasets. In this work, we show that two-stream CSNNs can successfully extract spatio-temporal information from videos despite using limited training data, and that the spiking spatial and temporal streams are complementary. We also show that replacing a dedicated temporal stream with a spatio-temporal one within a spiking two-stream architecture leads to information redundancy that hinders performance.
Citations: 0
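As a toy view of the learning rule the abstract names, the sketch below applies the common pair-based STDP update: a synapse is strengthened when the presynaptic spike precedes the postsynaptic one and weakened otherwise, with exponentially decaying magnitude. All constants are assumed values, and this is not the paper's training pipeline.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau=20.0, w_min=0.0, w_max=1.0):
    """Pair-based STDP: potentiate if pre fires before post, else depress."""
    dt = t_post - t_pre
    if dt >= 0:                      # pre before post -> strengthen
        w += a_plus * np.exp(-dt / tau)
    else:                            # post before pre -> weaken
        w -= a_minus * np.exp(dt / tau)
    return np.clip(w, w_min, w_max)

# One synapse observing a few pre/post spike pairs (times in ms).
w = 0.5
for t_pre, t_post in [(10.0, 14.0), (30.0, 28.0), (50.0, 51.0)]:
    w = stdp_update(w, t_pre, t_post)
    print(f"pre={t_pre:5.1f}  post={t_post:5.1f}  w={w:.4f}")
```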
Conditional Laplacian pyramid networks for exposure correction
IF 3.4, CAS Region 3 (Engineering & Technology)
Signal Processing-Image Communication, Volume 134, Article 117276. Pub Date: 2025-01-16. DOI: 10.1016/j.image.2025.117276
Authors: Mengyuan Huang, Kan Chang, Qingpao Qin, Yahui Tang, Guiqing Li
Abstract: Improper exposure greatly degrades the visual quality of images. Correcting various exposure errors in a unified framework is challenging, as it requires simultaneously handling global attributes and local details under different exposure conditions. In this paper, we propose a conditional Laplacian pyramid network (CLPN) for correcting different exposure errors in the same framework. It applies the Laplacian pyramid to decompose an improperly exposed image into a low-frequency (LF) component and several high-frequency (HF) components, and then enhances the decomposed components in a coarse-to-fine manner. To consistently correct a wide range of exposure errors, a conditional feature extractor is designed to extract a conditional feature from the given image. The conditional feature then guides the refinement of the LF features, so that a precise correction of illumination, contrast, and color tone can be obtained. As different frequency components exhibit pixel-wise correlations, the frequency components in lower pyramid layers are used to support the reconstruction of the HF components in higher layers. By doing so, fine details can be effectively restored while noise is well suppressed. Extensive experiments show that our method is more effective than state-of-the-art methods at correcting exposure conditions ranging from severe underexposure to intense overexposure.
Citations: 0
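The Laplacian pyramid decomposition that CLPN builds on is standard and can be sketched in a few lines of NumPy (Gaussian blur, 2x decimation, bilinear upsampling for reconstruction). The enhancement networks themselves are not sketched; the filter width and level count below are arbitrary choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def build_laplacian_pyramid(img, levels=3, sigma=1.0):
    """Split an image into `levels` high-frequency bands plus a low-frequency residual."""
    bands, current = [], img.astype(np.float64)
    for _ in range(levels):
        blurred = gaussian_filter(current, sigma)
        down = blurred[::2, ::2]
        up = zoom(down, 2, order=1)[:current.shape[0], :current.shape[1]]
        bands.append(current - up)      # high-frequency detail at this scale
        current = down
    return bands, current               # (HF bands, LF component)

def reconstruct(bands, low):
    """Invert the pyramid: upsample and add back each detail band, coarse to fine."""
    current = low
    for band in reversed(bands):
        up = zoom(current, 2, order=1)[:band.shape[0], :band.shape[1]]
        current = up + band
    return current

img = np.random.default_rng(1).random((64, 64))
bands, low = build_laplacian_pyramid(img)
err = np.abs(reconstruct(bands, low) - img).max()
print(f"max reconstruction error: {err:.2e}")   # exact up to float rounding
```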
Graph-based image captioning with semantic and spatial features
IF 3.4, CAS Region 3 (Engineering & Technology)
Signal Processing-Image Communication, Volume 133, Article 117273. Pub Date: 2025-01-15. DOI: 10.1016/j.image.2025.117273
Authors: Mohammad Javad Parseh, Saeed Ghadiri
Abstract: Image captioning is a challenging image processing task that aims to generate descriptive and accurate textual descriptions for images. In this paper, we propose a novel image captioning framework that leverages the spatial and semantic relationships between objects in an image in addition to traditional visual features. Our approach integrates a pre-trained model, RelTR, as a backbone for extracting object bounding boxes and subject-predicate-object relationship pairs. We use these extracted relationships to construct spatial and semantic graphs, which are processed by separate Graph Convolutional Networks (GCNs) to obtain high-level contextualized features. At the same time, a CNN model is employed to extract visual features from the input image. To merge the feature vectors seamlessly, a multi-modal attention mechanism is applied separately to the image feature maps, the semantic graph nodes, and the spatial graph nodes at each time step of the LSTM-based decoder. The model concatenates the attended features with the word embedding of the respective time step and feeds the result into the LSTM cell. Our experiments demonstrate the effectiveness of the proposed approach, which competes closely with state-of-the-art image captioning techniques by capturing richer contextual information and generating accurate and semantically meaningful captions.
Citations: 0
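A single graph-convolution step of the kind used to contextualize the relationship graphs can be written compactly. The sketch below implements the common symmetric-normalized propagation H' = ReLU(D^(-1/2) (A+I) D^(-1/2) H W) in NumPy; the tiny subject-predicate-object graph is invented for illustration and is not from the paper.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One GCN layer: symmetric-normalized propagation followed by ReLU."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)

# Toy semantic graph: nodes {man, horse, hat}, with edges from the pairs
# (man, riding, horse) and (man, wearing, hat); undirected for simplicity.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=np.float64)
rng = np.random.default_rng(2)
feats = rng.random((3, 8))            # 8-dim node features
weight = rng.random((8, 4))           # project to 4 dims
print(gcn_layer(adj, feats, weight).shape)   # (3, 4)
```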
ATM-DEN: Image Inpainting via attention transfer module and Decoder-Encoder network
IF 3.4, CAS Region 3 (Engineering & Technology)
Signal Processing-Image Communication, Volume 133, Article 117268. Pub Date: 2025-01-14. DOI: 10.1016/j.image.2025.117268
Authors: Siwei Zhang, Yuantao Chen
Abstract: Prevailing image restoration techniques predominantly employ encoder-decoder networks, aiming to reconstruct the original image in the decoding phase from the compressed representation captured during encoding. However, such networks inherently lose information during compression, making fine-grained restoration from the compressed representation alone difficult; this typically manifests as blurred imagery and distinct edge artifacts around the restored regions. To mitigate this under-utilization of image information, we introduce a Multi-Stage Decoding Network (MSDN). The network leverages multiple decoders to decode and integrate the features of each encoder layer, thereby better exploiting encoder features across scales. A feature mapping is thus derived that more accurately captures the content of the impaired region. Comparative experiments on widely used datasets demonstrate that MSDN achieves a notable enhancement in the visual quality of restored images.
Citations: 0
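A minimal caricature of the multi-stage decoding idea, assuming PyTorch: one small decoder per encoder stage, with the per-stage reconstructions integrated by averaging. Layer sizes and the fusion rule are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MiniMSDN(nn.Module):
    """Toy multi-stage decoding: one decoder per encoder stage, outputs averaged."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        # One decoder per stage, each mapping its feature map back to image space.
        self.dec1 = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(16, 3, 3, padding=1))
        self.dec2 = nn.Sequential(nn.Upsample(scale_factor=4), nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, x):
        f1 = self.enc1(x)
        f2 = self.enc2(f1)
        # Integrate per-stage reconstructions instead of decoding only the bottleneck.
        return (self.dec1(f1) + self.dec2(f2)) / 2

net = MiniMSDN()
out = net(torch.randn(1, 3, 64, 64))
print(out.shape)   # torch.Size([1, 3, 64, 64])
```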
FLQ: Design and implementation of hybrid multi-base full logarithmic quantization neural network acceleration architecture based on FPGA
IF 3.4, CAS Region 3 (Engineering & Technology)
Signal Processing-Image Communication, Volume 134, Article 117270. Pub Date: 2025-01-13. DOI: 10.1016/j.image.2025.117270
Authors: Longlong Zhang, Xiang Hu, Xuan Liao, Tong Zhou, Yuanxi Peng
Abstract: As deep neural network (DNN) models become more accurate, problems such as large parameter counts and high computational complexity have become increasingly prominent, creating a bottleneck for deployment on resource-limited embedded platforms. In recent years, logarithm-based quantization techniques have shown great potential for reducing the inference cost of neural networks. However, current single-model log quantization has reached an upper limit of classification performance, and little work has investigated hardware implementations of neural network quantization. In this paper, we propose a full logarithmic quantization (FLQ) mechanism that quantizes both weights and activation values into the logarithmic domain, compressing the parameters of the AlexNet and VGG16 models by more than 6.4x while keeping the accuracy loss within 2.5% of the benchmark. Furthermore, we propose two optimization schemes for FLQ: activation segmented full logarithmic quantization (ASFLQ) and multi-ratio activation segmented full logarithmic quantization (Multi-ASFLQ), which better balance the numerical representation range and the quantization step. With 5-bit weight quantization and 4-bit activation quantization, the proposed optimizations improve the top-1 accuracy of the VGG16 model by 1% and 1.6%, respectively. We then propose a computing-unit implementation for the optimized FLQ mechanism, which not only converts multiplication operations into shift operations but also integrates functions such as different-ratio logarithmic bases and sparsity processing for activations, minimizing resource consumption and avoiding unnecessary computation. Finally, we experiment with the VGG19, ResNet50, and DenseNet169 models, showing that the proposed method achieves good performance under lower-bit quantization.
Citations: 0
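The core trick of logarithmic quantization, representing weights as signed powers of two so that multiplications collapse into bit shifts, fits in a few lines. The sketch below is plain base-2 log quantization with an assumed exponent range; the paper's hybrid multi-base scheme and FPGA datapath are not reproduced.

```python
import numpy as np

def log2_quantize(w, exp_min=-7, exp_max=0):
    """Quantize weights to signed powers of two: w ~ sign(w) * 2**e."""
    sign = np.sign(w)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), exp_min, exp_max)
    exp = np.where(w == 0, exp_min, exp)     # map exact zeros to the smallest magnitude
    return sign, exp.astype(int)

rng = np.random.default_rng(3)
w = rng.normal(scale=0.2, size=5)
sign, exp = log2_quantize(w)
w_q = sign * np.exp2(exp)
print("original :", np.round(w, 4))
print("quantized:", np.round(w_q, 4))

# A multiply w_q * x becomes a right shift of an integer activation x,
# since every exponent here is <= 0.
x = 12
print("shift result:", [int(s) * (x >> -e) for s, e in zip(sign, exp)])
```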
Image super-resolution based on multifractals in transfer domain
IF 3.4, CAS Region 3 (Engineering & Technology)
Signal Processing-Image Communication, Volume 133, Article 117221. Pub Date: 2025-01-03. DOI: 10.1016/j.image.2024.117221
Authors: Xunxiang Yao, Qiang Wu, Peng Zhang, Fangxun Bao
Abstract: The goal of image super-resolution is to reconstruct a high-resolution image with fine texture details from its low-resolution version. In the Fourier domain, such fine details correspond mainly to information in the high-frequency spectrum. Most existing methods have no dedicated module to handle such high-frequency information adaptively, and thus produce edge blur or texture disorder. To tackle these problems, this work performs super-resolution on multiple sub-bands of the image, generated by the Non-Subsampled Contourlet Transform (NSCT). Different sub-bands hold information at different frequencies, which relates to the level of detail of the given low-resolution image. In this work, this level of detail is formulated as image roughness. Moreover, fractal analysis is applied to each sub-band image. Since fractals can mathematically represent image roughness, they can also represent the level of detail (i.e., the various frequencies of image information). Overall, a multifractal formulation is established over the sub-band images, with a different fractal representation created adaptively on each sub-band. In this way, the super-resolution process is transformed into a multifractal optimization problem. Experimental results demonstrate the effectiveness of the proposed method in recovering high-frequency details.
Citations: 0
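Fractal analysis of image roughness is commonly grounded in a box-counting estimate of fractal dimension; the NumPy sketch below computes one for a binary mask. It illustrates the roughness measure in spirit only: the paper's per-sub-band multifractal formulation over NSCT coefficients is more involved.

```python
import numpy as np

def box_counting_dimension(mask):
    """Estimate the fractal (box-counting) dimension of a square binary mask."""
    size = mask.shape[0]
    scales, counts = [], []
    box = size // 2
    while box >= 1:
        k = size // box
        # Count boxes of side `box` that contain at least one set pixel.
        blocks = mask[:k * box, :k * box].reshape(k, box, k, box)
        occupied = blocks.any(axis=(1, 3)).sum()
        scales.append(box)
        counts.append(max(occupied, 1))
        box //= 2
    # Slope of log N(s) versus log(1/s) gives the dimension estimate.
    slope, _ = np.polyfit(np.log(1.0 / np.array(scales)), np.log(counts), 1)
    return slope

# Rough texture (random dots) versus a smooth straight edge.
rng = np.random.default_rng(4)
noisy = rng.random((128, 128)) > 0.5
line = np.zeros((128, 128), dtype=bool)
line[64, :] = True
print(f"noisy texture: {box_counting_dimension(noisy):.2f}")   # near 2.0
print(f"straight line: {box_counting_dimension(line):.2f}")    # near 1.0
```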
Middle-output deep image prior for blind hyperspectral and multispectral image fusion
IF 3.4, CAS Region 3 (Engineering & Technology)
Signal Processing-Image Communication, Volume 132, Article 117247. Pub Date: 2024-12-26. DOI: 10.1016/j.image.2024.117247
Authors: Jorge Bacca, Christian Arcos, Juan Marcos Ramírez, Henry Arguello
Abstract: Obtaining a low-spatial-resolution hyperspectral (HS) image or a low-spectral-resolution multispectral (MS) image from a high-resolution (HR) spectral image is straightforward given the acquisition models. However, the reverse process, from HS and MS to HR, is an ill-posed problem known as spectral image fusion. Although recent fusion techniques based on supervised deep learning have shown promising results, these methods require large training datasets involving expensive acquisition costs and long training times. In contrast, unsupervised HS and MS image fusion methods have emerged as an alternative to the data-demand issue; however, they rely on knowledge of the linear degradation models for optimal performance. To overcome these challenges, we propose the Middle-Output Deep Image Prior (MODIP) for unsupervised blind HS and MS image fusion. MODIP is adjusted to the HS and MS images, and the HR fused image is estimated at an intermediate layer within the network. The architecture comprises two convolutional networks that reconstruct the HR spectral image from the HS and MS inputs, along with two networks that appropriately downscale the estimated HR image to match the available MS and HS images, learning the non-linear degradation models. The network parameters of MODIP are jointly and iteratively adjusted by minimizing a proposed loss function. This approach can handle scenarios where the degradation operators are unknown or only partially estimated. To evaluate the performance of MODIP, we test the fusion approach on three simulated spectral image datasets (Pavia University, Salinas Valley, and CAVE) and a real dataset acquired with a testbed implementation in an optical lab. Extensive simulations demonstrate that MODIP outperforms other unsupervised model-based image fusion methods by up to 6 dB in PSNR.
Citations: 0
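The deep-image-prior idea behind MODIP can be caricatured in PyTorch: an untrained ConvNet maps a fixed noise input to a candidate HR cube, and the loss only compares spatially and spectrally degraded versions of that cube against the observed HS and MS images. The degradation operators here (average pooling and a random spectral response matrix) are stand-in assumptions, not the learned non-linear degradations of the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

bands, ms_bands, hr, scale = 16, 3, 32, 4
net = nn.Sequential(                          # untrained prior network
    nn.Conv2d(bands, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, bands, 3, padding=1))
srf = torch.rand(ms_bands, bands)             # stand-in spectral response
srf = srf / srf.sum(dim=1, keepdim=True)

z = torch.randn(1, bands, hr, hr)                     # fixed noise input
hs = torch.rand(1, bands, hr // scale, hr // scale)   # observed low-res HS
ms = torch.rand(1, ms_bands, hr, hr)                  # observed MS

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(200):
    x = net(z)                                        # candidate HR cube
    hs_hat = F.avg_pool2d(x, scale)                   # spatial degradation
    ms_hat = torch.einsum('mb,nbhw->nmhw', srf, x)    # spectral degradation
    loss = F.mse_loss(hs_hat, hs) + F.mse_loss(ms_hat, ms)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.4f}")
```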
AggNet: Learning to aggregate faces for group membership verification
IF 3.4, CAS Region 3 (Engineering & Technology)
Signal Processing-Image Communication, Volume 132, Article 117237. Pub Date: 2024-12-02. DOI: 10.1016/j.image.2024.117237
Authors: Marzieh Gheisari, Javad Amirian, Teddy Furon, Laurent Amsaleg
Abstract: In certain applications of face recognition, our goal is to verify whether an individual belongs to a particular group while keeping their identity undisclosed. Existing methods have suggested quantizing pre-computed face descriptors into discrete embeddings and aggregating them into a single representation for the group. However, this mechanism is only optimized for a given closed set of individuals and requires relearning the group representations from scratch whenever the groups change. In this paper, we introduce a deep architecture that simultaneously learns face descriptors and the aggregation mechanism to enhance overall performance. Our system can be utilized for new groups comprising individuals who have never been encountered before, and it easily handles new memberships or the termination of existing memberships. Through experiments conducted on multiple extensive, real-world face datasets, we demonstrate that our proposed method achieves superior verification performance compared to other baseline approaches.
Citations: 0
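A classical baseline for this task, and a useful mental model for what AggNet learns to improve on, is sign-aggregation of member embeddings into one binary group code, with membership scored by similarity to that code. The sketch below implements that baseline; it is an illustration, not AggNet's learned aggregation.

```python
import numpy as np

def aggregate_group(embeddings):
    """Collapse member embeddings (rows) into one binary group representation."""
    return np.sign(embeddings.sum(axis=0))

def membership_score(query, group_code):
    """Cosine similarity between a query embedding and the group code."""
    return query @ group_code / (np.linalg.norm(query) * np.linalg.norm(group_code))

rng = np.random.default_rng(5)
members = rng.normal(size=(4, 64))            # 4 enrolled faces, 64-dim descriptors
group_code = aggregate_group(members)

probe_in = members[0] + 0.3 * rng.normal(size=64)    # noisy view of a member
probe_out = rng.normal(size=64)                      # stranger
print(f"member score:     {membership_score(probe_in, group_code):.3f}")
print(f"non-member score: {membership_score(probe_out, group_code):.3f}")
```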
Multi-granular inter-frame relation exploration and global residual embedding for video-based person re-identification
IF 3.4, CAS Region 3 (Engineering & Technology)
Signal Processing-Image Communication, Volume 132, Article 117240. Pub Date: 2024-11-30. DOI: 10.1016/j.image.2024.117240
Authors: Zhiqin Zhu, Sixin Chen, Guanqiu Qi, Huafeng Li, Xinbo Gao
Abstract: In recent years, the field of video-based person re-identification (re-ID) has studied in depth how to effectively exploit spatiotemporal cues, which hold promise for providing comprehensive multi-view representations of pedestrians. However, although the discriminability and correlation of spatiotemporal features are often studied, the complex relationships between these features have been relatively neglected. Especially when dealing with multi-granularity features, depicting the different spatial representations of the same person under different perspectives becomes a challenge. To address this challenge, this paper proposes a multi-granular inter-frame relation exploration and global residual embedding network. The method extracts more comprehensive and discriminative feature representations by deeply exploring the interactions and global differences between multi-granularity features. Specifically, by modeling the dynamic relationships of different granularity features in long video sequences and using a structured perceptual adjacency matrix to synthesize spatiotemporal information, cross-granularity information is effectively integrated into individual features. In addition, by introducing a residual learning mechanism, the method guides the diversified development of global features and reduces the negative impact of factors such as occlusion. Experimental results verify the effectiveness of this method on three mainstream benchmark datasets, significantly surpassing state-of-the-art solutions and showing that it accurately identifies and exploits the complex relationships between multi-granularity spatiotemporal features in video-based person re-ID.
Citations: 0
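The two mechanisms named in the abstract, relation propagation over frame features through a normalized adjacency and a residual correction of the pooled global embedding, can be mimicked in a small NumPy sketch. The frame features, the similarity-based adjacency, and the residual projection are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
T, d = 8, 32                                  # 8 frames, 32-dim frame features
frames = rng.normal(size=(T, d))

# Similarity-based adjacency between frames, row-normalized with a softmax.
sim = frames @ frames.T / np.sqrt(d)
adj = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)

# Propagate inter-frame relations, then pool into a global clip feature.
context = adj @ frames
global_feat = context.mean(axis=0)

# Residual embedding: a learned projection would refine the pooled feature;
# here the projection matrix is random for illustration.
W = rng.normal(size=(d, d)) * 0.1
global_feat = global_feat + W @ global_feat
print(global_feat.shape)                      # (32,)
```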