IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society) — Latest Articles

Pruning Sparse Tensor Neural Networks Enables Deep Learning for 3D Ultrasound Localization Microscopy.
Brice Rauby, Paul Xing, Jonathan Poree, Maxime Gasse, Jean Provost
Abstract: Ultrasound Localization Microscopy (ULM) is a non-invasive technique that allows for the imaging of micro-vessels in vivo, at depth and with a resolution on the order of ten microns. ULM is based on the sub-resolution localization of individual microbubbles injected into the bloodstream. Mapping the whole angioarchitecture requires the accumulation of microbubble trajectories from thousands of frames, typically acquired over a few minutes. ULM acquisition times can be reduced by increasing the microbubble concentration, but this requires more advanced algorithms to detect the microbubbles individually. Several deep learning approaches have been proposed for this task, but they remain limited to 2D imaging, in part due to the associated large memory requirements. Herein, we propose the use of sparse tensor neural networks to enable deep learning-based 3D ULM by improving memory scalability with increased dimensionality. We study several approaches to efficiently convert ultrasound data into a sparse format and study the impact of the associated loss of information. When applied in 2D, the sparse formulation reduces the memory requirements by a factor of 2 at the cost of a small reduction in performance when compared against dense networks. In 3D, the proposed approach reduces memory requirements by two orders of magnitude while largely outperforming conventional ULM in high-concentration settings. We show that sparse tensor neural networks in 3D ULM allow for the same benefits as dense deep learning-based methods in 2D ULM, i.e., the use of higher concentrations in silico and reduced acquisition time.
DOI: 10.1109/TIP.2025.3552198 · IEEE Transactions on Image Processing, vol. PP · Published 2025-03-24
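The sparse-format conversion the abstract describes can be sketched with a simple magnitude threshold that keeps only the strongest voxels as coordinate/value pairs. This is a minimal illustration under assumed names (`to_sparse_coo`, `keep_ratio`); the paper studies several conversion strategies and relies on dedicated sparse tensor libraries, not this toy:

```python
import numpy as np

def to_sparse_coo(volume: np.ndarray, keep_ratio: float = 0.05):
    """Keep only the largest-magnitude voxels; return COO coordinates and values.

    `keep_ratio` is a hypothetical knob for this sketch: simple magnitude
    thresholding, one of many possible sparsification strategies.
    """
    flat = np.abs(volume).ravel()
    k = max(1, int(keep_ratio * flat.size))
    threshold = np.partition(flat, -k)[-k]     # k-th largest magnitude
    mask = np.abs(volume) >= threshold
    coords = np.argwhere(mask)                 # (n_nonzero, ndim) integer coordinates
    values = volume[mask]                      # matching voxel values
    return coords, values

def to_dense(coords, values, shape):
    """Scatter the sparse entries back into a dense array (for checking)."""
    out = np.zeros(shape, dtype=values.dtype)
    out[tuple(coords.T)] = values
    return out
```

The memory saving comes from storing only `n_nonzero * (ndim + 1)` numbers instead of the full grid, which is why the gain grows with dimensionality.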
Citations: 0
Diffusion-Based Facial Aesthetics Enhancement With 3D Structure Guidance
Lisha Li;Jingwen Hou;Weide Liu;Yuming Fang;Jiebin Yan
Abstract: Facial Aesthetics Enhancement (FAE) aims to improve facial attractiveness by adjusting the structure and appearance of a facial image while preserving its identity as much as possible. Most existing methods adopt deep feature-based or score-based guidance for generative models to conduct FAE. Although these methods achieve promising results, they can produce excessively beautified results with lower identity consistency, or insufficiently improved facial attractiveness. To enhance facial aesthetics with less loss of identity, we propose Nearest Neighbor Structure Guidance based on Diffusion (NNSG-Diffusion), a diffusion-based FAE method that beautifies a 2D facial image with 3D structure guidance. Specifically, we propose to extract FAE guidance from a nearest-neighbor reference face. To allow for less change to facial structures in the FAE process, a 3D face model is recovered by referring to both the matched 2D reference face and the 2D input face, so that depth and contour guidance can be extracted from the 3D face model. The depth and contour clues then provide effective guidance to Stable Diffusion with ControlNet for FAE. Extensive experiments demonstrate that our method is superior to previous relevant methods in enhancing facial aesthetics while preserving facial identity.
DOI: 10.1109/TIP.2025.3551077 · IEEE Transactions on Image Processing, vol. 34, pp. 1879-1894 · Published 2025-03-21
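The first step of the pipeline above, matching a nearest-neighbor reference face, reduces to a similarity search over face embeddings. A minimal sketch, assuming embeddings from any off-the-shelf face encoder (the function name and cosine-similarity choice are illustrative, not the paper's API):

```python
import numpy as np

def nearest_reference(query: np.ndarray, gallery: np.ndarray) -> int:
    """Index of the gallery embedding closest to `query` by cosine similarity.

    `query` is a 1-D embedding; `gallery` stacks one embedding per row.
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return int(np.argmax(g @ q))
```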
Citations: 0
DGC-Net: Dynamic Graph Contrastive Network for Video Object Detection
Qiang Qi;Hanzi Wang;Yan Yan;Xuelong Li
Abstract: Video object detection is a challenging task in computer vision, since it must handle the object appearance degradation problem that seldom occurs in the image domain. Off-the-shelf video object detection methods typically aggregate multi-frame features at one stroke to alleviate appearance degradation. However, these existing methods do not take supervision knowledge into consideration and thus still suffer from insufficient feature aggregation, resulting in false detections. In this paper, we take a different perspective on feature aggregation and propose a dynamic graph contrastive network (DGC-Net) for video object detection, with three improvements over existing methods. First, we design a frame-level graph contrastive module to aggregate frame features, enabling DGC-Net to fully exploit discriminative contextual feature representations. Second, we develop a proposal-level graph contrastive module to aggregate proposal features, so that DGC-Net learns discriminative semantic feature representations. Third, we present a graph transformer that dynamically adjusts the graph structure by pruning useless nodes and edges, which improves accuracy and efficiency by eliminating geometric-semantic ambiguity and reducing the graph scale. Furthermore, inheriting the framework of DGC-Net, we develop DGC-Net Lite for real-time video object detection with a much faster inference speed. Extensive experiments on the ImageNet VID dataset demonstrate that DGC-Net outperforms current state-of-the-art methods. Notably, DGC-Net obtains 86.3%/87.3% mAP with ResNet-101/ResNeXt-101.
DOI: 10.1109/TIP.2025.3551158 · IEEE Transactions on Image Processing, vol. 34, pp. 2269-2284 · Published 2025-03-19
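The node-pruning idea in the graph transformer above can be illustrated with a score-based top-k selection over an adjacency matrix. A sketch under assumed names (`prune_graph`, `node_scores`); the paper's pruning is learned, not a fixed top-k:

```python
import numpy as np

def prune_graph(adj: np.ndarray, node_scores: np.ndarray, keep: int):
    """Keep the `keep` highest-scoring nodes and the sub-adjacency between them.

    A stand-in for learned graph pruning: discarding low-score nodes shrinks
    the graph, which is where the accuracy/efficiency gain would come from.
    """
    order = np.argsort(node_scores)[::-1][:keep]
    order = np.sort(order)                     # preserve original node ordering
    return adj[np.ix_(order, order)], order
```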
Citations: 0
Per-Pixel Calibration Based on Multi-View 3D Reconstruction Errors Beyond the Depth of Field
Rong Dai;Wenpan Li;Yun-Hui Liu
Abstract: In 3D microscopic imaging, the extremely shallow depth of field makes accurate 3D reconstruction challenging under significant defocus. Traditional calibration methods rely on the spatial extraction of feature points to establish spatial 3D information as the optimization objective. However, their extraction accuracy drops under defocus conditions, which degrades calibration performance. To extend the calibration volume without compromising accuracy in defocused scenarios, we propose a per-pixel calibration based on multi-view 3D reconstruction errors, which uses the 3D reconstruction errors among different binocular setups as the optimization objective. We first analyze multi-view 3D reconstruction error distributions under a poor-accuracy optical model, using a multi-view microscopic 3D measurement system with telecentric lenses. We then propose the 3D proportion model for implementing our error-based per-pixel calibration, derived as a spatial linear expression directly correlated with the 3D reconstruction error distribution. The experimental results confirm the robust convergence of our method with multiple binocular setups. Near the focus volume, the multi-view 3D reconstruction error remains approximately 8 µm (less than 0.5 camera pixel pitch), with absolute accuracy maintained within 0.5% of the measurement range. Beyond tenfold depth of field, the multi-view 3D reconstruction error increases to around 30 µm (still less than 2 camera pixel pitches), while absolute accuracy remains within 1% of the measurement range. These high-precision measurement results validate the feasibility and accuracy of the proposed calibration.
DOI: 10.1109/TIP.2025.3551165 · IEEE Transactions on Image Processing, vol. 34, pp. 2124-2132 · Published 2025-03-19
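A spatial linear expression like the 3D proportion model above is, per pixel, a small least-squares fit. The toy below fits a hypothetical per-pixel correction z_ref ≈ a·z_meas + b; the names and the exact parameterization are assumptions for illustration, not the paper's formulation:

```python
import numpy as np

def fit_linear_model(z_measured: np.ndarray, z_reference: np.ndarray):
    """Fit z_ref ≈ a * z_meas + b by least squares for one pixel.

    In a real calibration, `z_reference` would come from the multi-view
    reconstruction used as the optimization objective.
    """
    A = np.stack([z_measured, np.ones_like(z_measured)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, z_reference, rcond=None)
    return a, b
```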
Citations: 0
Geodesic-Aligned Gradient Projection for Continual Task Learning
Benliu Qiu;Heqian Qiu;Haitao Wen;Lanxiao Wang;Yu Dai;Fanman Meng;Qingbo Wu;Hongliang Li
Abstract: Deep networks notoriously suffer from performance deterioration on previous tasks when learning from sequential tasks, i.e., catastrophic forgetting. Recent gradient-projection methods show that forgetting results from gradient interference on old tasks, and accordingly propose to update the network in a direction orthogonal to the task space. However, these methods assume the task space is invariant and neglect the gradual change between tasks, resulting in sub-optimal gradient projection and a compromised continual learning capacity. To tackle this problem, we propose to embed each task subspace into a non-Euclidean manifold, which naturally captures the change between tasks, since the manifold is intrinsically non-static compared to Euclidean space. We then analytically derive the accumulated projection between any two subspaces on the manifold along the geodesic path by integrating an infinite number of intermediate subspaces. Building upon this derivation, we propose a novel geodesic-aligned gradient projection (GAGP) method that harnesses the accumulated projection to mitigate catastrophic forgetting. The proposed method exploits the geometric structure of the task manifold by capturing the gradual change between new and old tasks. Empirical studies on image classification demonstrate that the proposed method alleviates catastrophic forgetting and achieves on-par or better performance compared to state-of-the-art approaches.
DOI: 10.1109/TIP.2025.3551139 · IEEE Transactions on Image Processing, vol. 34, pp. 1995-2007 · Published 2025-03-19
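The baseline this work refines, updating in a direction orthogonal to the old-task subspace, is a one-line projection once an orthonormal basis of that subspace is stored. A sketch of that classical step only (the geodesic accumulation is the paper's contribution and is not reproduced here):

```python
import numpy as np

def project_orthogonal(grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove from `grad` its component inside the old-task subspace.

    `basis` holds orthonormal columns spanning the old-task space; the
    returned update no longer interferes with directions in that space.
    """
    return grad - basis @ (basis.T @ grad)
```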
Citations: 0
Visual Quality Assessment of Composite Images: A Compression-Oriented Database and Measurement
Miaohui Wang;Zhuowei Xu;Xiaofang Zhang;Yuming Fang;Weisi Lin
Abstract: Composite images (CIs) have experienced unprecedented growth, especially with the prosperity of generative AI technologies. They are usually created by combining multiple visual elements from different sources into a single cohesive composition, and they have an increasing impact on a variety of vision applications. However, transmission can degrade their visual quality, especially under the lossy compression used to reduce bandwidth and storage. To facilitate the development of objective measurements for CIs and to investigate the influence of compression distortions on their perception, we establish a compression-oriented image quality assessment (CIQA) database for CIs (called ciCIQA) with 30 typical encoding distortions. Using six representative codecs, we carried out a large-scale subjective experiment that delivered 3,000 encoded CIs with labeled quality scores, making ciCIQA one of the earliest CI databases and the one with the most compression types. ciCIQA enables us to explore the effects of encoding on visual quality through the first five just noticeable difference (JND) points, offering insights for perceptual CI compression and related tasks. Moreover, we propose a new multi-masked no-reference CIQA method (called mmCIQA), comprising a multi-masked quality representation module, a self-supervised quality alignment module, and a multi-masked attentive fusion module. Experimental results demonstrate the outstanding performance of mmCIQA in assessing the quality of CIs, outperforming 17 competitive approaches. The proposed method and database, as well as the collected objective metrics, are publicly available at https://charwill.github.io/mmciqa.html.
DOI: 10.1109/TIP.2025.3550005 · IEEE Transactions on Image Processing, vol. 34, pp. 1849-1863 · Published 2025-03-18
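A JND point, as used above, is the first compression level at which the quality drop becomes perceptible. A toy detector with a fixed score-drop threshold (the threshold and function name are assumptions; the database derives JND points from subjective data, not a fixed rule):

```python
def first_jnd(scores, threshold=0.05):
    """Index of the first compression level whose quality drop from the
    pristine score (scores[0]) exceeds `threshold`, or None if no level
    in the list is noticeably worse.
    """
    base = scores[0]
    for i, s in enumerate(scores[1:], start=1):
        if base - s > threshold:
            return i
    return None
```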
Citations: 0
HSLabeling: Toward Efficient Labeling for Large-Scale Remote Sensing Image Segmentation With Hybrid Sparse Labeling
Jiaxing Lin;Zhen Yang;Qiang Liu;Yinglong Yan;Pedram Ghamisi;Weiying Xie;Leyuan Fang
Abstract: Dense pixel-wise labeling of large-scale remote sensing images (RSI) is very time-consuming, while sparse labels (i.e., points, scribbles, or blocks) can be an efficient way to reduce labeling costs. Most existing sparse-label-based methods adopt only one type of label for image segmentation, which cannot reflect the complex land covers in RSI when training the model, leading to inferior segmentation performance. We observe that land covers with different shapes and complexity are optimally represented by different sparse labels. Inspired by this observation, we propose a novel sparse labeling framework, termed Hybrid Sparse Labeling (HSLabeling), for large-scale RSI segmentation. HSLabeling adaptively selects the optimal hybrid sparse labels for different land covers, according to the labeling cost and segmentation contribution of each sparse label type. Specifically, we first propose a label segmentation contribution estimation module that estimates the information of different sparse labels according to the diversity and shape of land covers. We then propose an Optimal Hybrid Labeling Strategy (OHLS) to assign optimal label types to different land covers. In the OHLS, label assignment is formulated as an optimization problem that trades off label segmentation contribution against labeling cost. We employ a greedy algorithm to efficiently solve the optimization problem and adaptively assign labels for varied land covers. Extensive experiments on three large-scale RSI datasets demonstrate that HSLabeling achieves almost fully supervised performance with extremely low labeling costs. In addition, compared with a single type of sparse label, HSLabeling reaches the same performance at a much lower labeling cost. The source code is available at https://github.com/linjiaxing99/HSLabeling.
DOI: 10.1109/TIP.2025.3550039 · IEEE Transactions on Image Processing, vol. 34, pp. 1864-1878 · Published 2025-03-18
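The greedy trade-off between contribution and cost described above can be sketched as a ratio-ordered selection under a budget. All names (`assign_labels`, the `regions` dictionary shape) are illustrative, not the paper's API:

```python
def assign_labels(regions, budget):
    """Greedily pick, per region, the label type with the best
    contribution-per-cost ratio until the budget runs out.

    `regions` maps a region id to {label_type: (contribution, cost)}.
    Returns the chosen label type per region and the total cost spent.
    """
    candidates = []
    for rid, options in regions.items():
        for ltype, (gain, cost) in options.items():
            candidates.append((gain / cost, rid, ltype, cost))
    candidates.sort(reverse=True)              # best ratio first
    chosen, spent = {}, 0.0
    for ratio, rid, ltype, cost in candidates:
        if rid in chosen or spent + cost > budget:
            continue                           # one label per region, stay in budget
        chosen[rid] = ltype
        spent += cost
    return chosen, spent
```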
Citations: 0
Zerotree Coding of Subdivision Wavelet Coefficients in Dynamic Time-Varying Meshes
Maja Krivokuća;Tomás M. Borges;Ricardo L. de Queiroz
Abstract: We propose a complete system to enable progressive coding with quality scalability of mesh geometry in MPEG's state-of-the-art Video-based Dynamic Mesh Coding (V-DMC) framework. In particular, we propose an alternative method for encoding the subdivision wavelet coefficients in V-DMC, using a zerotree coding approach that works directly in the native 3D mesh space. This allows us to identify parent-child relationships among the wavelet coefficients across subdivision levels, which can be used to achieve an efficient and versatile coding mechanism. We demonstrate that, given a starting base mesh, a target subdivision surface, and a desired maximum number of zerotree passes, our system produces an elegant and visually attractive lossy-to-lossless mesh geometry reconstruction with no further user intervention. Moreover, lossless coefficient encoding with our approach requires nearly the same bitrate as the default displacement coding methods in V-DMC, yet it provides several quality resolution levels embedded in the same bitstream, while the current V-DMC solutions encode a single quality level only. To the best of our knowledge, this is the first time that a zerotree-based method has been proposed and demonstrated to work for the compression of dynamic time-varying meshes, and the first time that an embedded quality-scalable approach has been used in the V-DMC framework.
DOI: 10.1109/TIP.2025.3549998 · IEEE Transactions on Image Processing, vol. 34, pp. 1810-1819 · Published 2025-03-17
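The core zerotree test that makes such coders efficient is the significance check: a coefficient whose entire descendant tree is below the current threshold can be signaled with a single symbol. A minimal EZW-style sketch on a generic coefficient tree (the dictionary layout is an assumption; it is not V-DMC syntax):

```python
def is_zerotree_root(tree, node, threshold):
    """True if the coefficient at `node` and all its descendants are
    insignificant with respect to `threshold`.

    `tree` maps a node id to (coefficient, [child ids]). A zerotree root
    lets the encoder skip the whole subtree in the current pass.
    """
    coeff, children = tree[node]
    if abs(coeff) >= threshold:
        return False
    return all(is_zerotree_root(tree, c, threshold) for c in children)
```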
Citations: 0
Equivariant Local Reference Frames With Optimization for Robust Non-Rigid Point Cloud Correspondence
Ling Wang;Runfa Chen;Fuchun Sun;Xinzhou Wang;Kai Sun;Chengliang Zhong;Guangyuan Fu;Yikai Wang
Abstract: Unsupervised non-rigid point cloud shape correspondence underpins a multitude of 3D vision tasks, yet is itself non-trivial given the exponential complexity stemming from inter-point degrees of freedom, i.e., pose transformations. Under the assumption of local rigidity, one way to reduce complexity is to decompose the overall shape into independent local regions using Local Reference Frames (LRFs) that are equivariant to SE(3) transformations. However, focusing solely on local structure neglects global geometric context, resulting in less distinctive LRFs that lack the semantic information necessary for effective matching. Furthermore, such complexity introduces out-of-distribution geometric contexts during inference, complicating generalization. To this end, we introduce 1) EquiShape, a novel structure tailored to learn pair-wise LRFs with global structural cues for both spatial and semantic consistency, and 2) LRF-Refine, an optimization strategy generally applicable to LRF-based methods, aimed at addressing the generalization challenge. Specifically, for EquiShape, we employ cross-talk within separate equivariant graph neural networks (Cross-GVP) to build long-range dependencies that compensate for the lack of semantic information in local structure modeling, deducing pair-wise independent SE(3)-equivariant LRF vectors for each point. For LRF-Refine, the optimization adjusts LRFs within specific contexts and knowledge, enhancing the geometric and semantic generalizability of point features. Our overall framework surpasses state-of-the-art methods by a large margin on three benchmarks. Code is available at https://github.com/2019EPWL/EquiShape.
DOI: 10.1109/TIP.2025.3550006 · IEEE Transactions on Image Processing, vol. 34, pp. 1980-1994 · Published 2025-03-17
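What "equivariant LRF" means can be shown with a classical, non-learned construction: build a frame at a point from two neighbors via Gram-Schmidt, and rotating the inputs rotates the frame identically. This is far simpler than the learned Cross-GVP frames above and is offered only to illustrate the equivariance property:

```python
import numpy as np

def lrf_from_points(p, n1, n2):
    """Build a rotation-equivariant local frame at point `p` from two
    neighbors via Gram-Schmidt; columns of the result are the frame axes.
    """
    e1 = n1 - p
    e1 = e1 / np.linalg.norm(e1)
    v = n2 - p
    e2 = v - (v @ e1) * e1          # remove component along e1
    e2 = e2 / np.linalg.norm(e2)
    e3 = np.cross(e1, e2)           # right-handed third axis
    return np.stack([e1, e2, e3], axis=1)
```

Because dot and cross products commute with proper rotations, `lrf(Rp, Rn1, Rn2) == R @ lrf(p, n1, n2)`, which is exactly the SE(3)-equivariance the abstract requires of its learned frames.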
Citations: 0
Event-Based Video Reconstruction With Deep Spatial-Frequency Unfolding Network
Chengjie Ge;Xueyang Fu;Kunyu Wang;Zheng-Jun Zha
Abstract: Current event-based video reconstruction methods, limited to the spatial domain, face two challenges: decoupling brightness and structural information, which leads to exposure distortion, and efficiently acquiring non-local information without relying on computationally expensive Transformer models. To address these issues, we propose the Deep Spatial-Frequency Unfolding Reconstruction Network (DSFURNet), which explores and exploits frequency-domain knowledge for event-based video reconstruction. Specifically, we construct a variational model and propose three regularization terms: a brightness term approximated by Fourier amplitudes, a structural term approximated by Fourier phases, and an initialization term that converts event representations into initial video frames. We then design corresponding spatial-frequency domain approximation operators for each regularization term. Benefiting from the global nature of computations in the frequency domain, the designed operators integrate local spatial and global frequency information at a lower computational cost. Furthermore, we combine the learned knowledge of the three regularization terms and unfold the optimization algorithm into an iterative deep network. Through this approach, the pixel-level initialization constraint and the frequency-domain brightness and structural constraints continue to act during testing, gradually improving the quality of the reconstructed video frames. Compared to existing methods, our network significantly reduces the number of network parameters while improving evaluation metrics.
DOI: 10.1109/TIP.2025.3550008 · IEEE Transactions on Image Processing, vol. 34, pp. 1779-1794 · Published 2025-03-17
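The amplitude/phase split that the brightness and structural regularizers above are built on is a standard Fourier decomposition, and it is lossless: amplitude and phase together reconstruct the image exactly. A minimal sketch of just that decomposition (not the paper's operators):

```python
import numpy as np

def split_amplitude_phase(img: np.ndarray):
    """Decompose an image into its Fourier amplitude (brightness-like)
    and phase (structure-like) components.
    """
    spec = np.fft.fft2(img)
    return np.abs(spec), np.angle(spec)

def recombine(amplitude: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Inverse transform of amplitude * exp(i * phase); for a real input
    image the imaginary residue is numerical noise.
    """
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))
```

Regularizing amplitude and phase separately is what lets the network treat exposure (brightness) and edges (structure) as decoupled quantities.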
Citations: 0