International Journal of Computer Vision: Latest Articles

DI-Retinex: Digital-Imaging Retinex Model for Low-Light Image Enhancement
IF 19.5, CAS Tier 2, Computer Science
International Journal of Computer Vision, Pub Date: 2025-09-06, DOI: 10.1007/s11263-025-02542-z
Shangquan Sun, Wenqi Ren, Jingyang Peng, Fenglong Song, Xiaochun Cao
Abstract: Many existing methods for low-light image enhancement (LLIE) based on the Retinex model ignore important factors that affect the validity of this model in digital imaging, such as noise, quantization error, non-linearity, and dynamic range overflow. In this paper, we propose a new expression called the Digital-Imaging Retinex model (DI-Retinex) through theoretical and experimental analysis of the Retinex model in digital imaging. Our new expression includes an offset term in the enhancement model, which allows for pixel-wise brightness and contrast adjustment with a non-linear mapping function. In addition, to solve the low-light enhancement problem in an unsupervised manner, we propose an image-adaptive masked degradation loss in Gamma space. We also design a variance suppression loss for regulating the additional offset term. Extensive experiments show that our proposed method outperforms all existing unsupervised methods in terms of visual quality, model size, and speed. Our algorithm can also assist downstream face detectors in low light, as it shows the largest performance gain after low-light enhancement compared to other methods. We have released our code and model weights at https://github.com/sunshangquan/Di-Retinex.
Citations: 0
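The abstract above describes an enhancement model built from a non-linear mapping plus an additive offset term, with dynamic-range overflow as an explicit concern. The minimal Python sketch below illustrates that general form on a toy image; the function name, parameterization, and gamma value are illustrative assumptions, not the released DI-Retinex code.

```python
import numpy as np

def enhance_with_offset(img, gain, offset, gamma=2.2):
    """Illustrative pixel-wise enhancement with a gain and an additive offset.

    `img` is an HxWx3 float array in [0, 1]; `gain` and `offset` may be scalars
    or per-pixel maps. The parameterization is a stand-in for the general form
    "non-linear mapping + offset term", with clipping standing in for the
    dynamic-range overflow that the DI-Retinex analysis accounts for.
    """
    linear = np.power(np.clip(img, 0.0, 1.0), gamma)  # decode the Gamma-encoded input
    enhanced = gain * linear + offset                 # multiplicative gain plus offset term
    enhanced = np.clip(enhanced, 0.0, 1.0)            # dynamic-range overflow -> hard clipping
    return np.power(enhanced, 1.0 / gamma)            # re-encode for display

# Toy usage on a synthetic under-exposed image.
dark = np.random.rand(64, 64, 3) * 0.2
bright = enhance_with_offset(dark, gain=6.0, offset=0.02)
print(bright.min(), bright.max())
```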
Parameterized Low-Rank Regularizer for High-dimensional Visual Data
IF 19.5, CAS Tier 2, Computer Science
International Journal of Computer Vision, Pub Date: 2025-09-04, DOI: 10.1007/s11263-025-02569-2
Shuang Xu, Zixiang Zhao, Xiangyong Cao, Jiangjun Peng, Xi-Le Zhao, Deyu Meng, Yulun Zhang, Radu Timofte, Luc Van Gool
Abstract: Factorization models and nuclear norms, two prominent methods for characterizing the low-rank prior, encounter challenges in accurately retrieving low-rank data under severe degradation and lack generalization capabilities. To mitigate these limitations, we propose a Parameterized Low-Rank Regularizer (PLRR), which models low-rank visual data through matrix factorization by utilizing neural networks to parameterize the factor matrices, whose feasible domains are essentially constrained. This approach can be interpreted as imposing an automatically learned penalty on the factor matrices. More significantly, the knowledge encoded in the network parameters enhances generalization. As a versatile low-rank modeling tool, PLRR exhibits superior performance in various inverse problems, including video foreground extraction, hyperspectral image (HSI) denoising, HSI inpainting, multi-temporal multispectral image (MSI) declouding, and MSI-guided blind HSI super-resolution. Moreover, PLRR demonstrates robust generalization capabilities for images with diverse degradations, temporal variations, and scene contexts.
Citations: 0
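The core idea above is to represent the factor matrices of a low-rank factorization with neural networks and fit them to degraded observations. The toy PyTorch sketch below applies that idea to random-mask matrix completion; the network architecture, latent codes, and training schedule are illustrative assumptions rather than the PLRR design.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m, n, r = 60, 50, 4

# Ground-truth low-rank matrix and a random observation mask (toy inpainting).
X_true = torch.randn(m, r) @ torch.randn(r, n)
mask = (torch.rand(m, n) < 0.4).float()

class FactorNet(nn.Module):
    """Tiny MLP mapping a learnable code to one factor matrix (rows x rank)."""
    def __init__(self, rows, rank, hidden=64):
        super().__init__()
        self.code = nn.Parameter(torch.randn(rows, hidden) * 0.1)
        self.net = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, rank))
    def forward(self):
        return self.net(self.code)

U_net, V_net = FactorNet(m, r), FactorNet(n, r)
opt = torch.optim.Adam(list(U_net.parameters()) + list(V_net.parameters()), lr=1e-2)

for step in range(2000):
    X_hat = U_net() @ V_net().T                            # neural-parameterized low-rank reconstruction
    loss = ((X_hat - X_true) ** 2 * mask).sum() / mask.sum()  # fit only the observed entries
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    err = torch.norm(U_net() @ V_net().T - X_true) / torch.norm(X_true)
print(f"relative recovery error: {err:.3f}")
```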
EdgeSAM: Prompt-In-the-Loop Distillation for SAM
IF 19.5, CAS Tier 2, Computer Science
International Journal of Computer Vision, Pub Date: 2025-09-02, DOI: 10.1007/s11263-025-02562-9
Chong Zhou, Xiangtai Li, Chen Change Loy, Bo Dai
Abstract: This paper presents EdgeSAM, an accelerated variant of the Segment Anything Model (SAM), optimized for efficient execution on edge devices with minimal compromise in performance. Our approach involves distilling the original ViT-based SAM image encoder into a purely CNN-based architecture, better suited for edge devices. We carefully benchmark various distillation strategies and demonstrate that task-agnostic encoder distillation fails to capture the full knowledge embodied in SAM. To overcome this bottleneck, we include both the prompt encoder and mask decoder in the distillation process, with box and point prompts in the loop, so that the distilled model can accurately capture the intricate dynamics between user input and mask generation. To mitigate dataset bias issues stemming from point prompt distillation, we incorporate a lightweight module within the encoder. As a result, EdgeSAM achieves a 37-fold speed increase compared to the original SAM, and it also outperforms MobileSAM/EfficientSAM, being over 7 times as fast when deployed on edge devices while enhancing the mIoUs on COCO and LVIS by 2.3/1.5 and 3.1/1.6, respectively. It is also the first SAM variant that can run at over 30 FPS on an iPhone 14. Code and demo are available at https://mmlab-ntu.github.io/project/edgesam/.
Citations: 0
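The key contrast drawn above is between task-agnostic encoder distillation and distillation that keeps the prompt encoder and mask decoder in the loop. The sketch below shows that contrast with tiny stand-in modules: a feature-matching term plus a prompt-conditioned mask term. All module definitions, the prompt rasterization, and the loss weights are hypothetical simplifications, not SAM's or EdgeSAM's actual components.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-ins for illustration only: the real teacher is SAM's ViT encoder
# and the student is a CNN encoder; "decoder" plays the role of SAM's prompt
# encoder + mask decoder, shared by teacher and student and kept frozen here.
teacher_enc = nn.Conv2d(3, 16, 3, padding=1)
student_enc = nn.Conv2d(3, 16, 3, padding=1)
decoder = nn.Conv2d(16 + 1, 1, 3, padding=1)
for p in list(teacher_enc.parameters()) + list(decoder.parameters()):
    p.requires_grad_(False)

def prompt_to_map(points, hw):
    """Rasterize point prompts into a one-channel map (hypothetical encoding)."""
    m = torch.zeros(points.shape[0], 1, *hw)
    for b, (x, y) in enumerate(points.long()):
        m[b, 0, y, x] = 1.0
    return m

def prompt_in_the_loop_loss(img, points, w_feat=1.0, w_mask=1.0):
    """Distill encoder features *and* prompt-conditioned mask predictions."""
    pmap = prompt_to_map(points, img.shape[-2:])
    with torch.no_grad():
        f_t = teacher_enc(img)
        m_t = decoder(torch.cat([f_t, pmap], dim=1))
    f_s = student_enc(img)
    m_s = decoder(torch.cat([f_s, pmap], dim=1))
    feat_loss = F.mse_loss(f_s, f_t)                                    # task-agnostic part
    mask_loss = F.binary_cross_entropy_with_logits(m_s, m_t.sigmoid())  # prompt-in-the-loop part
    return w_feat * feat_loss + w_mask * mask_loss

img = torch.rand(2, 3, 64, 64)
points = torch.tensor([[10.0, 20.0], [30.0, 40.0]])  # one (x, y) point prompt per image
print(prompt_in_the_loop_loss(img, points).item())
```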
Curvature Learning for Generalization of Hyperbolic Neural Networks
IF 19.5, CAS Tier 2, Computer Science
International Journal of Computer Vision, Pub Date: 2025-09-01, DOI: 10.1007/s11263-025-02567-4
Xiaomeng Fan, Yuwei Wu, Zhi Gao, Mehrtash Harandi, Yunde Jia
Abstract: Hyperbolic neural networks (HNNs) have demonstrated notable efficacy in representing real-world data with hierarchical structures by exploiting the geometric properties of hyperbolic spaces characterized by negative curvatures. Curvature plays a crucial role in optimizing HNNs. Inappropriate curvatures may cause HNNs to converge to suboptimal parameters, degrading overall performance. So far, the theoretical foundation of the effect of curvatures on HNNs has not been developed. In this paper, we derive a PAC-Bayesian generalization bound of HNNs, highlighting the role of curvatures in the generalization of HNNs via their effect on the smoothness of the loss landscape. Driven by the derived bound, we propose a sharpness-aware curvature learning method to smooth the loss landscape, thereby improving the generalization of HNNs. In our method, we design a scope sharpness measure for curvatures, which is minimized through a bi-level optimization process. Then, we introduce an implicit differentiation algorithm that efficiently solves the bi-level optimization by approximating gradients of curvatures. We present the approximation error and convergence analyses of the proposed method, showing that the approximation error is upper-bounded and that the proposed method converges by bounding the gradients of HNNs. Experiments on four settings (classification, learning from long-tailed data, learning from noisy data, and few-shot learning) show that our method improves the performance of HNNs.
Citations: 0
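To make the "sharpness-aware curvature learning" idea concrete, the toy sketch below applies a SAM-style two-step update to a single learnable curvature of a Poincaré ball, using the standard Möbius-addition distance. The objective, perturbation rule, and parameterization are illustrative assumptions; the paper's scope sharpness measure, bi-level formulation, and implicit differentiation are not reproduced.

```python
import torch

def mobius_add(x, y, c):
    """Möbius addition on the Poincaré ball with curvature -c (c > 0)."""
    xy = (x * y).sum(-1, keepdim=True)
    x2 = (x * x).sum(-1, keepdim=True)
    y2 = (y * y).sum(-1, keepdim=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den.clamp_min(1e-9)

def poincare_dist(x, y, c):
    """Geodesic distance on the Poincaré ball with curvature -c."""
    arg = (c.sqrt() * mobius_add(-x, y, c).norm(dim=-1)).clamp(max=1 - 1e-5)
    return (2.0 / c.sqrt()) * torch.atanh(arg)

torch.manual_seed(0)
emb_a, emb_b = torch.rand(16, 2) * 0.3, torch.rand(16, 2) * 0.3  # fixed toy embeddings in the ball
log_c = torch.zeros((), requires_grad=True)                       # curvature c = exp(log_c)
opt = torch.optim.SGD([log_c], lr=1e-2)
rho = 0.05                                                        # sharpness-perturbation radius

def objective(log_curv):
    # Toy objective: mean pairwise distance between the two embedding sets.
    return poincare_dist(emb_a, emb_b, log_curv.exp()).mean()

for _ in range(200):
    g = torch.autograd.grad(objective(log_c), log_c)[0]  # gradient w.r.t. the curvature parameter
    eps = rho * g.sign()                                  # ascend to a nearby "sharp" curvature
    loss = objective(log_c + eps.detach())                # evaluate loss at the perturbed curvature
    opt.zero_grad(); loss.backward(); opt.step()          # descend using the perturbed gradient

print("learned curvature c =", log_c.exp().item())
```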
Predictive Display for Teleoperation Based on Vector Fields Using Lidar-Camera Fusion
IF 19.5, CAS Tier 2, Computer Science
International Journal of Computer Vision, Pub Date: 2025-08-31, DOI: 10.1007/s11263-025-02550-z
Gaurav Sharma, Jeff Calder, Rajesh Rajamani
Abstract: Teleoperation can enable human intervention to help handle instances of failure in autonomy, thus allowing for much safer deployment of autonomous vehicle technology. Successful teleoperation requires recreating the environment around the remote vehicle using camera data received over wireless communication channels. This paper develops a new predictive display system to tackle the significant time delays encountered in receiving camera data over wireless networks. First, a new high-gain observer is developed for estimating the position and orientation of the ego vehicle. The novel observer is shown to perform accurate state estimation using only GNSS and gyroscope sensor readings. A vector field method that fuses the delayed camera and Lidar data is then presented. This method uses sparse 3D points obtained from Lidar and transforms them using the state estimates from the high-gain observer to generate a sparse vector field for the camera image. Polynomial-based interpolation is then performed to obtain the vector field for the complete image, which is then remapped to synthesize images for accurate predictive display. The method is evaluated on real-world experimental data from the nuScenes and KITTI datasets. The performance of the high-gain observer is also evaluated and compared with that of the EKF. The synthesized images produced by the vector-field-based predictive display are compared with ground-truth images using various image metrics and offer vastly improved performance compared to delayed images.
Citations: 0
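The pipeline above goes from sparse displacement vectors (projected Lidar points under the delayed versus the predicted pose) to a dense vector field and then to a remapped image. The sketch below illustrates that final densify-and-remap step with SciPy linear interpolation and OpenCV remapping on synthetic data; the paper instead uses polynomial-based interpolation, so the interpolation choice and all names here are assumptions.

```python
import numpy as np
import cv2
from scipy.interpolate import griddata

def predictive_warp(delayed_img, pts_uv_old, pts_uv_new):
    """Warp a delayed camera image using a dense vector field interpolated from
    sparse correspondences (e.g., Lidar points projected with the delayed pose
    versus the pose predicted by the state observer)."""
    h, w = delayed_img.shape[:2]
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    disp = pts_uv_old - pts_uv_new                       # sparse displacements at the projected points
    dense_dx = griddata(pts_uv_new, disp[:, 0], (grid_x, grid_y), method="linear", fill_value=0.0)
    dense_dy = griddata(pts_uv_new, disp[:, 1], (grid_x, grid_y), method="linear", fill_value=0.0)
    map_x = (grid_x + dense_dx).astype(np.float32)       # where each new pixel samples the delayed image
    map_y = (grid_y + dense_dy).astype(np.float32)
    return cv2.remap(delayed_img, map_x, map_y, cv2.INTER_LINEAR)

# Toy usage with a synthetic image and a global 5-pixel shift.
img = (np.random.rand(120, 160, 3) * 255).astype(np.uint8)
pts_new = np.random.rand(200, 2) * [160, 120]            # (x, y) locations in the predicted view
pts_old = pts_new + np.array([5.0, 0.0])                 # same points in the delayed view
pred = predictive_warp(img, pts_old, pts_new)
print(pred.shape)
```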
SMPL-IKS: A Mixed Analytical-Neural Inverse Kinematics Solver for 3D Human Mesh Recovery
IF 19.5, CAS Tier 2, Computer Science
International Journal of Computer Vision, Pub Date: 2025-08-30, DOI: 10.1007/s11263-025-02574-5
Zijian Zhang, Muqing Wu, Honghao Qi, Tianyi Ma, Min Zhao
Abstract: We present SMPL-IKS, a mixed analytical-neural inverse kinematics solver that operates on the well-known Skinned Multi-Person Linear model (SMPL) to recover a human mesh from a 3D skeleton. The key challenges in the task are threefold: (1) shape mismatching, (2) error accumulation, and (3) rotation ambiguity. Unlike previous methods that rely on costly vertex up-sampling or iterative optimization, SMPL-IKS directly regresses the SMPL parameters (i.e., shape and pose parameters) in a clean and efficient way. Specifically, we propose to infer skeleton-to-mesh via three explicit mappings, viz. Shape Inverse (SI), Inverse Kinematics (IK), and Pose Refinement (PR). SI maps bone lengths to shape parameters, IK maps bone directions to pose parameters, and PR addresses errors accumulated along the kinematic tree. SMPL-IKS is general and thus extensible to the MANO or SMPL-H models. Extensive experiments are conducted on various benchmarks for body-only, hand-only, and body-hand scenarios. Our model surpasses state-of-the-art methods by a large margin while being much more efficient. Data and code are available at https://github.com/Z-Z-J/SMPL-IKS.
Citations: 0
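The SI and IK mappings above consume per-bone lengths and unit directions computed from the input 3D skeleton. The short NumPy sketch below extracts exactly those quantities from a joint array and a parent table; the joint layout and parent indices are toy assumptions, and the learned/analytical mappings themselves are not reproduced.

```python
import numpy as np

def bone_lengths_and_directions(joints, parents):
    """Per-bone length and unit direction from a 3D skeleton.

    `joints` is (J, 3) and `parents[j]` is the parent index of joint j (the root
    has parent -1). These are the inputs the abstract feeds to the Shape-Inverse
    (bone length -> shape) and Inverse-Kinematics (bone direction -> pose) maps.
    """
    lengths, directions = [], []
    for j, p in enumerate(parents):
        if p < 0:
            continue                          # skip the root joint (no bone)
        v = joints[j] - joints[p]
        l = np.linalg.norm(v)
        lengths.append(l)
        directions.append(v / max(l, 1e-8))   # unit bone direction
    return np.array(lengths), np.array(directions)

# Toy 4-joint chain: root -> spine -> neck -> head.
joints = np.array([[0, 0, 0], [0, 0.3, 0], [0, 0.55, 0], [0, 0.7, 0]], dtype=float)
parents = [-1, 0, 1, 2]
lens, dirs = bone_lengths_and_directions(joints, parents)
print(lens, dirs.shape)
```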
Flexible Camera Calibration using a Collimator System
IF 19.5, CAS Tier 2, Computer Science
International Journal of Computer Vision, Pub Date: 2025-08-29, DOI: 10.1007/s11263-025-02576-3
Shunkun Liang, Banglei Guan, Zhenbao Yu, Dongcai Tan, Pengju Sun, Zibin Liu, Qifeng Yu, Yang Shang
Abstract: Camera calibration is a crucial step in photogrammetry and 3D vision applications. This paper introduces a novel camera calibration method using a designed collimator system. Our collimator system provides a reliable and controllable calibration environment for the camera. Exploiting the unique optical geometry of our collimator system, we introduce an angle invariance constraint and further prove that the relative motion between the calibration target and the camera conforms to a spherical motion model. This constraint reduces the original 6-DOF relative motion between target and camera to a 3-DOF pure rotation. Using the spherical motion constraint, a closed-form linear solver for multiple images and a minimal solver for two images are proposed for camera calibration. Furthermore, we propose a single-collimator-image calibration algorithm based on the angle invariance constraint. This algorithm eliminates the requirement for camera motion, providing a novel solution for flexible and fast calibration. The performance of our method is evaluated in both synthetic and real-world experiments, which verify the feasibility of calibration using the collimator system and demonstrate that our method is superior to existing baseline methods. Demo code is available at https://github.com/LiangSK98/CollimatorCalibration.
Citations: 0
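The spherical motion model above means the target-to-camera motion has no translational component to estimate. The NumPy sketch below recovers such a pure rotation from 3D correspondences with orthogonal Procrustes (the Kabsch algorithm); it is a generic toy illustration of the 3-DOF constraint, not the paper's closed-form linear or minimal solver.

```python
import numpy as np

def fit_pure_rotation(P, Q):
    """Best rotation R with Q ≈ R @ P (no translation), via orthogonal Procrustes.

    P and Q are 3xN point sets. Solving for rotation only mirrors the 3-DOF
    spherical-motion constraint: under the collimator setup the relative motion
    is a pure rotation, so no translation needs to be estimated.
    """
    H = P @ Q.T
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    return Vt.T @ D @ U.T

# Toy check: a random rotation recovered from noiseless correspondences.
rng = np.random.default_rng(0)
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1            # ensure a proper rotation (det = +1)
P = rng.normal(size=(3, 20))
Q = R_true @ P
print(np.allclose(fit_pure_rotation(P, Q), R_true, atol=1e-8))
```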
Depth from Coupled Optical Differentiation
IF 19.5, CAS Tier 2, Computer Science
International Journal of Computer Vision, Pub Date: 2025-08-29, DOI: 10.1007/s11263-025-02534-z
Junjie Luo, Yuxuan Liu, Emma Alexander, Qi Guo
Abstract: We propose depth from coupled optical differentiation, a low-computation, passive-lighting 3D sensing mechanism. It is based on our discovery that per-pixel object distance can be rigorously determined from a coupled pair of optical derivatives of a defocused image using a simple, closed-form relationship. Unlike previous depth-from-defocus (DfD) methods that leverage higher-order spatial derivatives of the image to estimate scene depths, the proposed mechanism's use of only first-order optical derivatives makes it significantly more robust to noise. Furthermore, unlike many previous DfD algorithms with requirements on the aperture code, this relationship is proved to be universal to a broad range of aperture codes. We build the first 3D sensor based on depth from coupled optical differentiation. Its optical assembly includes a deformable lens and a motorized iris, which enables dynamic adjustments to the optical power and aperture radius. The sensor captures two pairs of images: one pair with a differential change of optical power and the other with a differential change of aperture scale. From the four images, a depth and confidence map can be generated with only 36 floating-point operations per output pixel (FLOPOP), more than ten times lower than the previous lowest passive-lighting depth sensing solution known to us. Additionally, the depth map generated by the proposed sensor demonstrates more than twice the working range of previous DfD methods while using significantly lower computation.
Citations: 0
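The sensor above turns four captures (two differential pairs) into two per-pixel optical derivatives, from which depth follows in closed form. The sketch below shows only the finite-difference step that estimates those two derivatives, plus a naive magnitude-based confidence; the paper's closed-form derivative-to-depth relationship is not reproduced, and all names and the confidence heuristic are assumptions.

```python
import numpy as np

def coupled_optical_derivatives(I_rho_minus, I_rho_plus, I_A_minus, I_A_plus,
                                d_rho, d_A):
    """Finite-difference estimates of the two optical derivatives used by the sensor.

    dI/d(optical power) comes from the pair captured at rho -/+ d_rho/2 and
    dI/d(aperture radius) from the pair captured at A -/+ d_A/2. The per-pixel
    closed-form mapping from this coupled pair to depth is the paper's
    contribution and is not reproduced here.
    """
    dI_drho = (I_rho_plus - I_rho_minus) / d_rho
    dI_dA = (I_A_plus - I_A_minus) / d_A
    # Naive per-pixel confidence: larger derivative magnitudes are more reliable.
    confidence = np.hypot(dI_drho, dI_dA)
    return dI_drho, dI_dA, confidence

# Toy usage with random images standing in for the four captures.
imgs = [np.random.rand(48, 64) for _ in range(4)]
d_rho_img, d_A_img, conf = coupled_optical_derivatives(*imgs, d_rho=0.1, d_A=0.5)
print(d_rho_img.shape, conf.mean())
```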
Diffusion Models for Image Restoration and Enhancement: A Comprehensive Survey
IF 19.5, CAS Tier 2, Computer Science
International Journal of Computer Vision, Pub Date: 2025-08-28, DOI: 10.1007/s11263-025-02570-9
Xin Li, Yulin Ren, Xin Jin, Cuiling Lan, Xingrui Wang, Wenjun Zeng, Xinchao Wang, Zhibo Chen
Abstract: Image restoration (IR) has been an indispensable and challenging task in the low-level vision field, which strives to improve the subjective quality of images distorted by various forms of degradation. Recently, the diffusion model has achieved significant advancements in the visual generation of AIGC, thereby raising an intuitive question: can the diffusion model boost image restoration? To answer this, some pioneering studies have attempted to integrate diffusion models into the image restoration task, resulting in superior performance over previous GAN-based methods. Despite that, a comprehensive and enlightening survey on diffusion model-based image restoration remains scarce. In this paper, we are the first to present a comprehensive review of recent diffusion model-based methods for image restoration, encompassing the learning paradigm, conditional strategy, framework design, modeling strategy, and evaluation. Concretely, we first briefly introduce the background of the diffusion model and then present two prevalent workflows that exploit diffusion models in image restoration. Subsequently, we classify and emphasize the innovative designs using diffusion models for both IR and blind/real-world IR, intending to inspire future development. To evaluate existing methods thoroughly, we summarize the commonly used datasets, implementation details, and evaluation metrics. Additionally, we present an objective comparison of open-sourced methods across three tasks, including image super-resolution, deblurring, and inpainting. Ultimately, informed by the limitations of existing works, we propose nine potential and challenging directions for future research on diffusion model-based IR, including sampling efficiency, model compression, distortion simulation and estimation, distortion-invariant learning, and framework design. The repository is released at https://github.com/lixinustc/Awesome-diffusion-model-for-image-processing/.
Citations: 0
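One of the prevalent workflows surveyed above conditions a diffusion model directly on the degraded observation and samples the restored image. The toy sketch below shows that workflow as a plain DDPM ancestral sampling loop with concatenation-based conditioning; the stub noise predictor, the noise schedule, and the variance choice are assumptions standing in for a trained restoration model.

```python
import torch
import torch.nn as nn

T = 50
betas = torch.linspace(1e-4, 0.02, T)          # linear noise schedule (illustrative)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

# Stub noise predictor: real diffusion-IR models use a U-Net; conditioning here
# is the common "concatenate the degraded image with x_t" strategy.
eps_model = nn.Conv2d(6, 3, 3, padding=1)

@torch.no_grad()
def restore(degraded):
    """Ancestral DDPM sampling conditioned on a degraded observation."""
    x = torch.randn_like(degraded)
    for t in reversed(range(T)):
        eps = eps_model(torch.cat([x, degraded], dim=1))      # predicted noise at step t
        coef = (1 - alphas[t]) / (1 - alpha_bars[t]).sqrt()
        mean = (x - coef * eps) / alphas[t].sqrt()            # DDPM posterior mean
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise                    # sigma_t^2 = beta_t variance choice
    return x

out = restore(torch.rand(1, 3, 32, 32))
print(out.shape)
```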
Compressing Vision Transformer from the View of Model Property in Frequency Domain
IF 19.5, CAS Tier 2, Computer Science
International Journal of Computer Vision, Pub Date: 2025-08-28, DOI: 10.1007/s11263-025-02561-w
Zhenyu Wang, Xuemei Xie, Hao Luo, Tao Huang, Weisheng Dong, Kai Xiong, Yongxu Liu, Xuyang Li, Fan Wang, Guangming Shi
Abstract: Vision Transformers (ViTs) have recently demonstrated significant potential in computer vision, but their high computational costs remain a challenge. To address this limitation, various methods have been proposed to compress ViTs. Most approaches utilize spatial-domain information and adapt pruning techniques from convolutional neural networks (CNNs) to reduce channels or tokens. However, differences between ViTs and CNNs in the frequency domain make these methods vulnerable to noise in the spatial domain, potentially resulting in erroneous channel or token removal and substantial performance drops. Recent studies suggest that high-frequency signals carry limited information for ViTs and that the self-attention mechanism functions similarly to a low-pass filter. Inspired by these insights, this paper proposes a joint compression method that leverages properties of ViTs in the frequency domain. Specifically, a metric called Low-Frequency Sensitivity (LFS) is used to accurately identify and compress redundant channels, while a token-merging approach, assisted by Low-Frequency Energy (LFE), is introduced to reduce tokens. Through joint channel and token compression, the proposed method reduces the FLOPs of ViTs by over 50% with less than a 1% performance drop on ImageNet-1K and achieves approximately a 40% reduction in FLOPs for dense prediction tasks, including object detection and semantic segmentation.
Citations: 0
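The frequency-domain intuition above is that most useful ViT information sits in low frequencies, so channels and tokens can be ranked by how much low-frequency content they carry. The sketch below computes a toy per-channel low-frequency-energy score with a 2D FFT and keeps the top-ranked channels; the box radius and keep rate are assumptions, and the score only mimics the spirit of the paper's LFS/LFE metrics.

```python
import torch

def low_frequency_energy(feat, radius=2):
    """Toy low-frequency-energy score per channel.

    `feat` is (C, H, W): one map per channel over the token grid. For each
    channel we take a 2D FFT and measure the fraction of spectral energy inside
    a small box of `radius` bins around the DC component (the four corners of
    the unshifted spectrum).
    """
    power = torch.fft.fft2(feat).abs() ** 2
    low = (power[:, :radius, :radius].sum(dim=(-2, -1))
           + power[:, :radius, -radius:].sum(dim=(-2, -1))
           + power[:, -radius:, :radius].sum(dim=(-2, -1))
           + power[:, -radius:, -radius:].sum(dim=(-2, -1)))
    return low / power.sum(dim=(-2, -1)).clamp_min(1e-12)

# Toy usage: rank 8 channels of a 14x14 token grid and keep the top half.
feat = torch.randn(8, 14, 14)
scores = low_frequency_energy(feat)
keep = scores.argsort(descending=True)[:4]
print(scores, keep)
```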