Pattern Recognition: Latest Publications

AdvCloak: Customized adversarial cloak for privacy protection
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition, Vol. 158, Article 111050 · Pub Date: 2024-10-10 · DOI: 10.1016/j.patcog.2024.111050
Xuannan Liu, Yaoyao Zhong, Xing Cui, Yuhang Zhang, Peipei Li, Weihong Deng
Abstract: With extensive face images being shared on social media, there has been a notable escalation in privacy concerns. In this paper, we propose AdvCloak, an innovative framework for privacy protection using generative models. AdvCloak is designed to automatically customize class-wise adversarial masks that maintain superior image-level naturalness while providing enhanced feature-level generalization ability. Specifically, AdvCloak sequentially optimizes the generative adversarial networks with a two-stage training strategy: it first adapts the masks to the unique faces of each individual, and then enhances their feature-level generalization to that individual's diverse facial variations. To fully utilize the limited training data, we combine AdvCloak with several general geometric modeling methods to better describe the feature subspace of source identities. Extensive quantitative and qualitative evaluations on both common and celebrity datasets demonstrate that AdvCloak outperforms existing state-of-the-art methods in terms of efficiency and effectiveness. The code is available at https://github.com/liuxuannan/AdvCloak.
Citations: 0
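A minimal sketch of the core mechanism (not the authors' implementation): a generator produces an L-infinity-bounded, person-specific mask trained to suppress identity similarity under a frozen face encoder. The tiny `MaskGenerator` and stand-in `face_encoder` below are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskGenerator(nn.Module):
    def __init__(self, eps=8 / 255):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())  # output in [-1, 1]

    def forward(self, x):
        return x + self.eps * self.net(x)  # L-infinity-bounded perturbation

face_encoder = nn.Sequential(  # stand-in for a frozen, pretrained face model
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 128))
face_encoder.requires_grad_(False)

gen = MaskGenerator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
faces = torch.rand(4, 3, 112, 112)             # one identity's photos
with torch.no_grad():
    src = F.normalize(face_encoder(faces).mean(0), dim=0)

for _ in range(10):                            # stage 1: fit this identity
    feat = F.normalize(face_encoder(gen(faces).clamp(0, 1)), dim=1)
    loss = (feat @ src).mean()                 # minimize identity similarity
    opt.zero_grad(); loss.backward(); opt.step()
# Stage 2 in the paper further trains the mask on diverse variations of the
# same person, so that one cloak generalizes across their photos.
```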
CHA: Conditional Hyper-Adapter method for detecting human–object interaction
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition, Vol. 159, Article 111075 · Pub Date: 2024-10-10 · DOI: 10.1016/j.patcog.2024.111075
Mengyang Sun, Wei Suo, Ji Wang, Peng Wang, Yanning Zhang
Abstract: Human–object interaction (HOI) detection aims at capturing human–object pairs in images and predicting their actions. It is an essential step for many visual reasoning tasks, such as VQA, image retrieval, and surveillance event detection. The challenge of this task is to tackle the compositional learning problem, especially in a few-shot setting. A straightforward approach is to design a group of dedicated models for each specific pair, but maintaining such independent models is unrealistic due to combinatorial explosion. To address these problems, we propose a new Conditional Hyper-Adapter (CHA) method based on meta-learning. Different from previous works, our approach regards each ⟨verb, object⟩ pair as an independent sub-task. Meanwhile, we design two kinds of Hyper-Adapter structures to guide the model to learn "how to address HOI detection". By combining different conditions with a hypernetwork, CHA can adaptively generate partial parameters and improve the representation and generalization ability of the model. Finally, our proposed method can be viewed as a plug-and-play module to boost existing HOI detection models on the widely used HOI benchmarks.
Citations: 0
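A minimal sketch of the hyper-adapter idea under stated assumptions: a hypernetwork maps a condition embedding (e.g., an embedded ⟨verb, object⟩ description) to the weights of a bottleneck adapter, so each sub-task receives its own generated parameters. `ConditionalHyperAdapter` and all dimensions are hypothetical, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ConditionalHyperAdapter(nn.Module):
    def __init__(self, feat_dim=256, cond_dim=64, bottleneck=16):
        super().__init__()
        self.d, self.b = feat_dim, bottleneck
        # hypernetwork: condition -> flattened down/up projection weights
        self.hyper = nn.Linear(cond_dim, feat_dim * bottleneck * 2)

    def forward(self, x, cond):
        # x: (B, N, feat_dim) tokens; cond: (B, cond_dim), one per pair
        w = self.hyper(cond)                          # (B, 2*d*b)
        w_down = w[:, :self.d * self.b].view(-1, self.d, self.b)
        w_up = w[:, self.d * self.b:].view(-1, self.b, self.d)
        h = torch.relu(torch.bmm(x, w_down))          # (B, N, b)
        return x + torch.bmm(h, w_up)                 # residual adapter output

adapter = ConditionalHyperAdapter()
tokens = torch.randn(2, 10, 256)   # detector features for 2 human-object pairs
cond = torch.randn(2, 64)          # condition embedding per sub-task
out = adapter(tokens, cond)        # (2, 10, 256)
```

Because the adapter is residual and generated per condition, it can be dropped into an existing HOI detector without retraining the backbone, which matches the plug-and-play claim.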
Semantic-aware frame-event fusion based pattern recognition via large vision–language models
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition, Vol. 158, Article 111080 · Pub Date: 2024-10-10 · DOI: 10.1016/j.patcog.2024.111080
Dong Li, Jiandong Jin, Yuhao Zhang, Yanlin Zhong, Yaoyang Wu, Lan Chen, Xiao Wang, Bin Luo
Abstract: Pattern recognition through the fusion of RGB frames and event streams has emerged as a novel research area in recent years. Current methods typically employ backbone networks to individually extract the features of RGB frames and event streams, and subsequently fuse these features for pattern recognition. However, we posit that these methods suffer from two key issues: (1) they attempt to directly learn a mapping from the input vision modality to the semantic labels, which often leads to sub-optimal results due to the disparity between the input and the labels; (2) they utilize small-scale backbone networks for feature extraction, and thus fail to harness the recent performance advancements of large-scale vision–language models. In this study, we introduce a novel pattern recognition framework that consolidates semantic labels, RGB frames, and event streams, leveraging pre-trained large-scale vision–language models. Specifically, given the input RGB frames, event streams, and all the predefined semantic labels, we employ a pre-trained large-scale vision model (the CLIP vision encoder) to extract the RGB and event features. To handle the semantic labels, we first convert them into language descriptions through prompt engineering and polish them using ChatGPT, then obtain the semantic features using the pre-trained large-scale language model (the CLIP text encoder). Subsequently, we integrate the RGB/event features and semantic features using multimodal Transformer networks. The resulting frame and event tokens are further amplified using self-attention layers, and we enhance the interactions between text tokens and RGB/event tokens via cross-attention. Finally, we consolidate all three modalities using self-attention and feed-forward layers for recognition. Comprehensive experiments on the HARDVS and PokerEvent datasets fully substantiate the efficacy of our proposed SAFE model. The source code has been released at https://github.com/Event-AHU/SAFE_LargeVLM.
Citations: 0
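A minimal sketch of the fusion stage as the abstract describes it (self-attention amplifying frame/event tokens, cross-attention letting text tokens attend to vision tokens); CLIP feature extraction is abstracted away as random tensors, and `FusionBlock` is an illustrative module, not the released SAFE code.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                 nn.Linear(dim * 4, dim))

    def forward(self, vis_tokens, text_tokens):
        v, _ = self.self_attn(vis_tokens, vis_tokens, vis_tokens)
        vis_tokens = vis_tokens + v                  # amplify frame/event tokens
        t, _ = self.cross_attn(text_tokens, vis_tokens, vis_tokens)
        text_tokens = text_tokens + t                # text attends to vision
        return vis_tokens + self.ffn(vis_tokens), text_tokens

rgb = torch.randn(2, 16, 512)    # stand-in CLIP features of RGB frames
event = torch.randn(2, 16, 512)  # stand-in CLIP features of event frames
text = torch.randn(2, 8, 512)    # stand-in CLIP features of label descriptions

block = FusionBlock()
vis, txt = block(torch.cat([rgb, event], dim=1), text)
print(vis.shape, txt.shape)      # torch.Size([2, 32, 512]) torch.Size([2, 8, 512])
```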
Perspective-assisted prototype-based learning for semi-supervised crowd counting
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition, Vol. 158, Article 111073 · Pub Date: 2024-10-10 · DOI: 10.1016/j.patcog.2024.111073
Yifei Qian, Liangfei Zhang, Zhongliang Guo, Xiaopeng Hong, Ognjen Arandjelović, Carl R. Donovan
Abstract: To alleviate the burden of labeling data for training crowd counting models, we propose a prototype-based learning approach for semi-supervised crowd counting with an embedded understanding of perspective. Our key idea is that image patches with the same density of people are likely to exhibit coherent appearance changes under similar perspective distortion, but differ significantly under varying distortions. Motivated by this observation, we construct multiple prototypes for each density level to capture variations in perspective. For labeled data, prototype-based learning assists the regression task by regularizing the feature space and modeling the relationships within and across density levels. For unlabeled data, the learned perspective-embedded prototypes enhance differentiation between samples of the same density level, allowing a more nuanced assessment of the predictions. By incorporating regression results, we categorize unlabeled samples as reliable or unreliable, applying tailored consistency learning strategies to enhance model accuracy and generalization. Since perspective information is often unavailable, we propose a novel pseudo-label assigner based on perspective self-organization, which requires no additional annotations and assigns image regions to distinct spatial density groups that mainly reflect differences in average density among regions. Extensive experiments on four crowd counting benchmarks demonstrate the effectiveness of our approach.
Citations: 0
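A minimal sketch of prototype-based regularization, assuming a simple cross-entropy formulation that is not necessarily the paper's loss: each density level owns several prototypes to absorb perspective variation, and a patch feature is scored against its nearest prototype per level.

```python
import torch
import torch.nn.functional as F

levels, protos_per_level, dim = 4, 3, 128
# in a real model these would be learnable nn.Parameter prototypes
prototypes = F.normalize(torch.randn(levels, protos_per_level, dim), dim=-1)
feat = F.normalize(torch.randn(16, dim), dim=-1)     # patch features
level = torch.randint(0, levels, (16,))              # density-level labels

sim = torch.einsum('nd,lpd->nlp', feat, prototypes)  # (16, levels, protos)
best = sim.max(dim=-1).values                        # nearest prototype per level
loss = F.cross_entropy(best / 0.1, level)            # temperature 0.1
print(loss.item())
```

Taking the maximum over a level's prototypes is what lets patches of the same density but different perspective match different prototypes without being penalized.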
Pixel shuffling is all you need: spatially aware convmixer for dense prediction tasks
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition, Vol. 158, Article 111068 · Pub Date: 2024-10-09 · DOI: 10.1016/j.patcog.2024.111068
Hatem Ibrahem, Ahmed Salem, Hyun-Soo Kang
Abstract: ConvMixer is an extremely simple model that can outperform state-of-the-art convolution-based and vision-transformer-based methods by mixing the input image patches using a standard convolution. However, this global mixing of patches is only valid for classification tasks; it cannot be used for dense prediction tasks, as the spatial information of the image is lost in the mixing process. We propose a more efficient technique for image patching, pixel shuffling, which preserves spatial information. We downsample the input image using pixel shuffle downsampling in the same form as image patches, so that the ConvMixer can be extended to dense prediction tasks. This paper shows that pixel shuffle downsampling is more efficient than standard image patching, as it outperforms the original ConvMixer architecture on the CIFAR10 and ImageNet-1k classification tasks. We also propose spatially aware ConvMixer (SA-ConvMixer) architectures based on efficient pixel shuffle downsampling and upsampling operations for semantic segmentation and monocular depth estimation. We performed extensive experiments on several datasets: Pascal VOC2012, Cityscapes, and ADE20k for semantic segmentation, and NYU-DepthV2 and Cityscapes for depth estimation. We show that SA-ConvMixer is efficient enough to reach relatively high accuracy on many tasks within a few training epochs (150–400). The proposed SA-ConvMixer achieves an ImageNet-1K top-1 classification accuracy of 87.02%, a mean intersection over union (mIoU) of 87.1% on the Pascal VOC2012 semantic segmentation task, and an absolute relative error of 0.096 on the NYU-DepthV2 depth estimation task. The implementation code is available at: https://github.com/HatemHosam/SA-ConvMixer/.
Citations: 0
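The pixel-shuffle patching itself maps directly onto standard PyTorch operators. The sketch below (illustrative hyperparameters, one simplified ConvMixer-style block without the usual residual connection) shows how `nn.PixelUnshuffle` replaces patch embedding and `nn.PixelShuffle` restores full resolution for dense prediction.

```python
import torch
import torch.nn as nn

p, dim, n_classes = 4, 256, 21                 # e.g. 21 Pascal VOC2012 classes
net = nn.Sequential(
    nn.PixelUnshuffle(p),                      # (B, 3, H, W) -> (B, 3*p*p, H/p, W/p)
    nn.Conv2d(3 * p * p, dim, 1),              # per-"patch" embedding
    # one simplified ConvMixer-style block: depthwise spatial mixing,
    # then pointwise channel mixing (the real block adds a residual)
    nn.Conv2d(dim, dim, 9, groups=dim, padding=4), nn.GELU(), nn.BatchNorm2d(dim),
    nn.Conv2d(dim, dim, 1), nn.GELU(), nn.BatchNorm2d(dim),
    nn.Conv2d(dim, n_classes * p * p, 1),      # predict p*p sub-pixels per class
    nn.PixelShuffle(p),                        # back to (B, n_classes, H, W)
)
x = torch.rand(1, 3, 224, 224)
print(net(x).shape)                            # torch.Size([1, 21, 224, 224])
```

Unlike a flattening patch embedding, the unshuffled tensor keeps a coarse spatial grid, which is exactly what allows the final `PixelShuffle` to reassemble a dense, full-resolution prediction.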
Diffusion process with structural changes for subspace clustering
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition, Vol. 158, Article 111066 · Pub Date: 2024-10-09 · DOI: 10.1016/j.patcog.2024.111066
Yanjiao Zhu, Qilin Li, Wanquan Liu, Chuancun Yin
Abstract: Spectral clustering-based methods have gained significant popularity in subspace clustering due to their ability to capture the underlying data structure effectively. Standard spectral clustering considers only pairwise relationships between data points, neglecting interactions among high-order neighboring points. Integrating a diffusion process can address this limitation by leveraging a Markov random walk. However, ensuring that diffusion methods capture sufficient information while remaining stable against noise is challenging. In this paper, we propose the Diffusion Process with Structural Changes (DPSC) method, a novel affinity learning framework that enhances the robustness of the diffusion process. Our approach broadens the scope of nearest neighbors and leverages the dropout idea to generate random transition matrices. Furthermore, inspired by the structural-changes model, we use two transition matrices to optimize the iteration rule. The resulting affinity matrix undergoes self-supervised learning and is subsequently fed back into the diffusion process for refinement. Notably, the convergence of the proposed DPSC is theoretically proven. Extensive experiments on benchmark datasets demonstrate that the proposed method outperforms existing subspace clustering methods. The code of our proposed DPSC is available at https://github.com/zhudafa/DPSC.
Citations: 0
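A minimal sketch of an affinity diffusion step with two randomized transition matrices; the kNN construction, dropout rate, and the exact update rule are assumptions for illustration, not the DPSC iteration from the paper.

```python
import torch
import torch.nn.functional as F

def transition(A, k=10, drop=0.2):
    # keep each row's k strongest affinities, randomly drop edges
    # (a "structural change"), then row-normalize into a transition matrix
    vals, idx = A.topk(k, dim=1)
    P = torch.zeros_like(A).scatter_(1, idx, vals)
    P = P * (torch.rand_like(P) > drop)
    return P / P.sum(dim=1, keepdim=True).clamp_min(1e-8)

X = F.normalize(torch.randn(100, 32), dim=1)   # 100 samples, 32-d features
A = (X @ X.T).clamp_min(0)                     # initial affinity matrix
I = torch.eye(100)
for _ in range(20):                            # diffusion iterations
    P1, P2 = transition(A), transition(A)      # two randomized transitions
    A = 0.9 * P1 @ A @ P2.T + 0.1 * I          # illustrative two-matrix update
A = (A + A.T) / 2                              # symmetrize before spectral clustering
```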
Class agnostic and specific consistency learning for weakly-supervised point cloud semantic segmentation
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition, Vol. 158, Article 111067 · Pub Date: 2024-10-09 · DOI: 10.1016/j.patcog.2024.111067
Junwei Wu, Mingjie Sun, Haotian Xu, Chenru Jiang, Wuwei Ma, Quan Zhang
Abstract: This paper focuses on weakly supervised 3D point cloud semantic segmentation (WS3DSS), in which only a few points are annotated while a large number of points remain unlabeled in each training sample. Existing methods roughly force point-to-point predictions across different augmented versions of the input to be close to each other. This paper instead introduces a carefully designed approach for learning class-agnostic and class-specific consistency, based on the teacher–student framework. The proposed class-agnostic consistency learning, which brings the features of the student and teacher models closer together, enhances model robustness by replacing the traditional point-to-point prediction consistency with group-to-group consistency based on the features of perturbed local neighboring points. Furthermore, to facilitate learning under class-wise supervision, we propose a class-specific consistency learning method that pulls the feature of an unlabeled point towards its corresponding class-specific memory bank feature, where the point's class is determined as the one with the highest probability predicted by the classifier. Extensive experimental results demonstrate that our proposed method surpasses the SOTA method SQN (Hu et al., 2022) by 2.5% and 8.3% on the S3DIS dataset, and by 4.4% and 13.9% on the ScanNetV2 dataset, under the 0.1% and 0.01% label settings, respectively. Code is available at https://github.com/jasonwjw/CASC.
Citations: 0
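A minimal sketch of the class-specific consistency branch under stated assumptions: the pseudo-class of an unlabeled point is the classifier's argmax, and confident points are pulled toward the corresponding memory-bank feature. The stand-in classifier and the confidence filtering are illustrative.

```python
import torch
import torch.nn.functional as F

C, D = 13, 64                                     # e.g. 13 S3DIS classes
bank = F.normalize(torch.randn(C, D), dim=1)      # class-specific memory bank
feats = F.normalize(torch.randn(1024, D), dim=1)  # unlabeled point features

prob, pseudo = (feats @ bank.T * 10).softmax(dim=1).max(dim=1)
mask = prob > prob.median()                       # keep the more confident half
                                                  # (a fixed threshold in practice)
loss = (1 - F.cosine_similarity(feats[mask], bank[pseudo[mask]])).mean()
print(loss.item())

# The bank itself would be maintained from labeled points, e.g. per class c:
# bank[c] = normalize(m * bank[c] + (1 - m) * mean(labeled feats of class c))
```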
Dual-perspective multi-instance embedding learning with adaptive density distribution mining
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition, Vol. 158, Article 111063 · Pub Date: 2024-10-09 · DOI: 10.1016/j.patcog.2024.111063
Mei Yang, Tian-Lin Chen, Wei-Zhi Wu, Wen-Xi Zeng, Jing-Yu Zhang, Fan Min
Abstract: Multi-instance learning (MIL) is a potent framework for solving weakly supervised problems in which each sample is a bag containing multiple instances. Various embedding methods convert each bag into a vector in a new feature space based on a representative bag or instance, aiming to extract useful information from the bag. However, since the distribution of instances is related to the labels, methods that rely solely on an overall-perspective embedding, without considering different distribution characteristics, conflate the varied distributions of instances and thus yield poor classification performance. In this paper, we propose the dual-perspective multi-instance embedding learning with adaptive density distribution mining (DPMIL) algorithm, which comprises three new techniques. First, the mutual instance selection technique consists of adaptive density distribution mining and discriminative evaluation; the distribution characteristics of negative instances and the dissimilarity of heterogeneous instances are effectively exploited to obtain strongly representative instances. Second, the embedding technique mines two crucial kinds of bag information simultaneously: bags are converted into instance-order-invariant vectors according to the dual perspective, so that distinguishability is maintained. Finally, the ensemble technique trains a batch of classifiers, and the final model is obtained by weighted voting with the contribution of the dual-perspective embedding information. Experimental results demonstrate that the DPMIL algorithm achieves higher average accuracy than competing algorithms, especially on web datasets.
Citations: 0
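A minimal sketch of a distance-based, order-invariant bag embedding in the spirit of MIL embedding methods; the representative-instance mining and the dual-perspective construction of DPMIL are abstracted into a fixed set of `reps` for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# 20 bags with a variable number of 16-d instances each
bags = [rng.normal(size=(rng.integers(3, 9), 16)) for _ in range(20)]
reps = rng.normal(size=(8, 16))   # representative instances (mined from the
                                  # data in DPMIL; random stand-ins here)

def embed(bag, reps):
    # distance from every instance to every representative: (n, 8)
    d = np.linalg.norm(bag[:, None, :] - reps[None, :, :], axis=-1)
    return d.min(axis=0)          # closest approach to each representative

X = np.stack([embed(b, reps) for b in bags])   # (20, 8) fixed-length vectors
# X can now be fed to any standard classifier (SVM, ensemble, ...).
```

The min over instances is what makes the embedding invariant to instance order and bag size, which is the property the abstract's "sequence-invariant vectors" refers to.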
SNN using color-opponent and attention mechanisms for object recognition
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition, Vol. 158, Article 111070 · Pub Date: 2024-10-05 · DOI: 10.1016/j.patcog.2024.111070
Zhiwei Yao, Shaobing Gao, Wenjuan Li
Abstract: Current spiking neural networks (SNNs) rely on spike-timing-dependent plasticity (STDP) primarily for shape learning in object recognition tasks, overlooking the equally critical aspect of color information. To address this gap, our study introduces an unsupervised variant of STDP that incorporates principles from the color-opponency mechanisms (COM) and classical receptive fields (CRF) found in the biological visual system, facilitating the integration of color information during parameter updates within the SNN architecture. Our approach first preprocesses images into two distinct feature maps: one for shape and another for color. Signals derived from the COM and from intensity then concurrently drive the STDP process, updating the parameters associated with both the color and shape feature maps. Furthermore, we propose a channel-wise attention mechanism to enhance differentiation among objects sharing similar shapes or colors. Specifically, this mechanism uses convolution to generate an output spike wave and identifies a winner based on the earliest spike timing and maximal potential. The winning kernel computes attention, which is then applied via convolution to each input feature map, generating post-feature maps. An STDP-like normalization rule compares firing times between pre- and post-feature maps, dynamically adjusting channel weights to optimize object recognition during the training phase.
We assessed the proposed algorithm using SNNs with both single-layer and multi-layer architectures across three datasets. Experimental findings highlight its efficacy and superiority in complex object recognition tasks compared to state-of-the-art (SOTA) algorithms. Notably, our approach achieved a significant 20% performance improvement over the SOTA on the Caltech-101 dataset. Moreover, the algorithm is well suited for hardware implementation and energy efficiency, leveraging a winner-selection mechanism based on the earliest spike time.
Citations: 0
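A minimal sketch of classical color-opponent preprocessing (standard red-green and blue-yellow opponent channels plus an intensity map, with a toy latency code); the paper's COM filters, STDP variant, and attention mechanism are not reproduced here.

```python
import torch

img = torch.rand(1, 3, 64, 64)                 # RGB input
r, g, b = img[:, 0], img[:, 1], img[:, 2]
intensity = (r + g + b) / 3                    # shape pathway (CRF-like input)
rg = r - g                                     # red-green opponent channel
by = b - (r + g) / 2                           # blue-yellow opponent channel
color = torch.stack([rg, by], dim=1)           # color pathway driving COM signals

# Toy latency code: stronger responses fire earlier, matching a
# winner-selection rule based on the earliest spike time.
spike_time = 1.0 - intensity.clamp(0, 1)
print(color.shape, spike_time.shape)           # (1, 2, 64, 64) (1, 64, 64)
```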
MBQuant: A novel multi-branch topology method for arbitrary bit-width network quantization
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition, Vol. 158, Article 111061 · Pub Date: 2024-10-05 · DOI: 10.1016/j.patcog.2024.111061
Yunshan Zhong, Yuyao Zhou, Fei Chao, Rongrong Ji
Abstract: Arbitrary bit-width network quantization has received significant attention due to its high adaptability to various bit-width requirements at runtime. However, in this paper, we investigate existing methods and observe a significant accumulation of quantization errors caused by switching weight and activation bit-widths, leading to limited performance. To address this issue, we propose MBQuant, a novel method that utilizes a multi-branch topology for arbitrary bit-width quantization. MBQuant duplicates the network body into multiple independent branches, where the weights of each branch are quantized to a fixed 2 bits while the activations remain in the input bit-width. To complete the computation for a desired bit-width, MBQuant selects multiple branches, ensuring that the computational cost matches that of the desired bit-width, to carry out forward propagation. By fixing the weight bit-width, MBQuant substantially reduces the quantization errors caused by switching weight bit-widths. Additionally, we observe that the first branch suffers from quantization errors caused by all bit-widths, leading to performance degradation. We therefore introduce an amortization branch-selection strategy that distributes these errors: the first branch is selected only for certain bit-widths, rather than universally, so the errors are spread among the branches more evenly. Finally, we adopt an in-place distillation strategy that uses the largest bit-width to guide the other bit-widths, further enhancing MBQuant's performance. Extensive experiments demonstrate that MBQuant achieves significant performance gains compared to existing arbitrary bit-width quantization methods. Code is publicly available at https://github.com/zysxmu/MBQuant.
Citations: 0
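A minimal sketch of the multi-branch idea: every branch stores 2-bit weights, and a requested bit-width b is served by summing b/2 quantized branches, so switching bit-widths never re-quantizes the weights. The quantizer, scaling, and `MultiBranchLinear` are illustrative assumptions, not the MBQuant code.

```python
import torch
import torch.nn as nn

def quantize_2bit(w):
    # uniform symmetric 2-bit quantization (4 levels) with a
    # straight-through estimator for the gradient
    s = w.abs().max() / 2
    q = (w / s).round().clamp(-2, 1) * s
    return w + (q - w).detach()

class MultiBranchLinear(nn.Module):
    def __init__(self, d_in, d_out, max_bits=8):
        super().__init__()
        # one 2-bit weight branch per 2 bits of the maximum width
        self.branches = nn.ParameterList(
            [nn.Parameter(torch.randn(d_out, d_in) * 0.02)
             for _ in range(max_bits // 2)])

    def forward(self, x, bits=4):
        n = bits // 2                 # branch count matching the desired cost
        w = torch.stack([quantize_2bit(p) for p in self.branches[:n]]).sum(0)
        return x @ w.T / n

layer = MultiBranchLinear(16, 8)
y4 = layer(torch.randn(2, 16), bits=4)   # uses 2 branches
y8 = layer(torch.randn(2, 16), bits=8)   # uses all 4 branches
```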