{"title":"Multi-Objective Convex Quantization for Efficient Model Compression","authors":"Chunxiao Fan;Dan Guo;Ziqi Wang;Meng Wang","doi":"10.1109/TPAMI.2024.3521589","DOIUrl":"10.1109/TPAMI.2024.3521589","url":null,"abstract":"Quantization is one of the efficient model compression methods, which represents the network with fixed-point or low-bit numbers. Existing quantization methods address the network quantization by treating it as a single-objective optimization that pursues high accuracy (performance optimization) while keeping the quantization constraint. However, owing to the non-differentiability of the quantization operation, it is challenging to integrate the quantization operation into the network training and achieve optimal parameters. In this paper, a novel multi-objective convex quantization for efficient model compression is proposed. Specifically, the network training is modeled as a multi-objective optimization to find the network with both high precision and low quantization error (actually, these two goals are somewhat contradictory and affect each other). To achieve effective multi-objective optimization, this paper designs a quantization error function that is differentiable and ensures the computation convexity in each period, so as to avoid the non-differentiable back-propagation of the quantization operation. Then, we perform a time-series self-distillation training scheme on the multi-objective optimization framework, which distills its past softened labels and combines the hard targets to guarantee controllable and stable performance convergence during training. At last and more importantly, a new dynamic Lagrangian coefficient adaption is designed to adjust the gradient magnitude of quantization loss and performance loss and balance the two losses during training processing. The proposed method is evaluated on well-known benchmarks: MNIST, CIFAR-10/100, ImageNet, Penn Treebank and Microsoft COCO, and experimental results show that the proposed method achieves outstanding performance compared to existing methods.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2313-2329"},"PeriodicalIF":0.0,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142879595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated Multi-View K-Means Clustering","authors":"Miin-Shen Yang;Kristina P. Sinaga","doi":"10.1109/TPAMI.2024.3520708","DOIUrl":"10.1109/TPAMI.2024.3520708","url":null,"abstract":"The increasing effect of Internet of Things (IoT) unlocks the massive volume of the availability of Big Data in many fields. Generally, these Big Data may be in a non-independently and identically distributed fashion (non-IID). In this paper, we have contributions in such a way enable multi-view k-means (MVKM) clustering to maintain the privacy of each database by allowing MVKM to be operated on the local principle of clients’ multi-view data. This work integrates the exponential distance to transform the weighted Euclidean distance on MVKM so that it can make full use of development in federated learning via the MVKM clustering algorithm. The proposed algorithm, called a federated MVKM (Fed-MVKM), can provide a whole new level adding a lot of new ideas to produce a much better output. The proposed Fed-MVKM is highly suitable for clustering large data sets. To demonstrate its efficient and applicable, we implement a synthetic and six real multi-view data sets and then perform Federated Peter-Clark in Huang et al. 2023 for causal inference setting to split the data instances over multiple clients, efficiently. The results show that shared-models based local cluster centers with data-driven in the federated environment can generate a satisfying final pattern of one multi-view data that simultaneously improve the clustering performance of (non-federated) MVKM clustering algorithms.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2446-2459"},"PeriodicalIF":0.0,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142867130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Demystify Transformers & Convolutions in Modern Image Deep Networks","authors":"Xiaowei Hu;Min Shi;Weiyun Wang;Sitong Wu;Linjie Xing;Wenhai Wang;Xizhou Zhou;Lewei Lu;Jie Zhou;Xiaogang Wang;Yu Qiao;Jifeng Dai","doi":"10.1109/TPAMI.2024.3520508","DOIUrl":"10.1109/TPAMI.2024.3520508","url":null,"abstract":"Vision transformers have gained popularity recently, leading to the development of new vision backbones with improved features and consistent performance gains. However, these advancements are not solely attributable to novel feature transformation designs; certain benefits also arise from advanced network-level and block-level architectures. This paper aims to identify the real gains of popular convolution and attention operators through a detailed study. We find that the key difference among these feature transformation modules, such as attention or convolution, lies in their spatial feature aggregation approach, known as the “spatial token mixer” (STM). To facilitate an impartial comparison, we introduce a unified architecture to neutralize the impact of divergent network-level and block-level designs. Subsequently, various STMs are integrated into this unified framework for comprehensive comparative analysis. Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs, but performance differences persist among different STMs. Our detailed analysis also reveals various findings about different STMs, including effective receptive fields, invariance, and adversarial robustness tests.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2416-2428"},"PeriodicalIF":0.0,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142867131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practically Unbiased Pairwise Loss for Recommendation With Implicit Feedback","authors":"Tianwei Cao;Qianqian Xu;Zhiyong Yang;Zhanyu Ma;Qingming Huang","doi":"10.1109/TPAMI.2024.3519711","DOIUrl":"10.1109/TPAMI.2024.3519711","url":null,"abstract":"Recommender systems have been widely employed on various online platforms to improve user experience. In these systems, recommendation models are often learned from the users’ historical behaviors that are automatically collected. Notably, recommender systems differ slightly from ordinary supervised learning tasks. In recommender systems, there is an exposure mechanism that decides which items could be presented to each specific user, which breaks the i.i.d assumption of supervised learning and brings biases into the recommendation models. In this paper, we focus on unbiased ranking loss weighted by inversed propensity scores (IPS), which are widely used in recommendations with implicit feedback labels. More specifically, we first highlight the fact that there is a gap between theory and practice in IPS-weighted unbiased loss. The existing pairwise loss could be theoretically unbiased by adopting an IPS weighting scheme. Unfortunately, the propensity scores are hard to estimate due to the inaccessibility of each user-item pair's true exposure status. In practical scenarios, we can only approximate the propensity scores. In this way, the theoretically unbiased loss would be still practically biased. To solve this problem, we first construct a theoretical framework to obtain a generalization upper bound of the current theoretically unbiased loss. The bound illustrates that we can ensure the theoretically unbiased loss's generalization ability if we lower its implementation loss and practical bias at the same time. To that aim, we suggest treating feedback label <inline-formula><tex-math>$Y_{ui}$</tex-math></inline-formula> as a noisy proxy for exposure result <inline-formula><tex-math>$O_{ui}$</tex-math></inline-formula> for each user-item pair <inline-formula><tex-math>$(u, i)$</tex-math></inline-formula>. Here we assume the noise rate meets the condition that <inline-formula><tex-math>$hat{P}(O_{ui}=1, Y_{ui}ne O_{ui}) < 1/2$</tex-math></inline-formula>. According to our analysis, this is a mild assumption that can be satisfied by many real-world applications. Based on this, we could train an accurate propensity model directly by leveraging a noise-resistant loss function. Then we could construct a practically unbiased recommendation model weighted by precise propensity scores. Lastly, experimental findings on public datasets demonstrate our suggested method's effectiveness.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2460-2474"},"PeriodicalIF":0.0,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142858472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantum Gated Recurrent Neural Networks","authors":"Yanan Li;Zhimin Wang;Ruipeng Xing;Changheng Shao;Shangshang Shi;Jiaxin Li;Guoqiang Zhong;Yongjian Gu","doi":"10.1109/TPAMI.2024.3519605","DOIUrl":"10.1109/TPAMI.2024.3519605","url":null,"abstract":"The exploration of quantum advantages with Quantum Neural Networks (QNNs) is an exciting endeavor. Recurrent neural networks, the widely used framework in deep learning, suffer from the gradient vanishing and exploding problem, which limits their ability to learn long-term dependencies. To address this challenge, in this work, we develop the sequential model of Quantum Gated Recurrent Neural Networks (QGRNNs). This model naturally integrates the gating mechanism into the framework of the variational ansatz circuit of QNNs, enabling efficient execution on near-term quantum devices. We present rigorous proof that QGRNNs can preserve the gradient norm of long-term interactions throughout the recurrent network, enabling efficient learning of long-term dependencies. Meanwhile, the architectural features of QGRNNs can effectively mitigate the barren plateau phenomenon. The effectiveness of QGRNNs in sequential learning is convincingly demonstrated through various typical tasks, including solving the adding problem, learning gene regulatory networks, and predicting stock prices. The hardware-efficient architecture and superior performance of our QGRNNs indicate their promising potential for finding quantum advantageous applications in the near term.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2493-2504"},"PeriodicalIF":0.0,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142848871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xuxiang Sun;Gong Cheng;Hongda Li;Chunbo Lang;Junwei Han
{"title":"STDatav2: Accessing Efficient Black-Box Stealing for Adversarial Attacks","authors":"Xuxiang Sun;Gong Cheng;Hongda Li;Chunbo Lang;Junwei Han","doi":"10.1109/TPAMI.2024.3519803","DOIUrl":"10.1109/TPAMI.2024.3519803","url":null,"abstract":"On account of the extreme settings, stealing the black-box model without its training data is difficult in practice. On this topic, along the lines of data diversity, this paper substantially makes the following improvements based on our conference version (dubbed STDatav1, short for Surrogate Training Data). First, to mitigate the undesirable impacts of the potential mode collapse while training the generator, we propose the joint-data optimization scheme, which utilizes both the synthesized data and the proxy data to optimize the surrogate model. Second, we propose the self-conditional data synthesis framework, an interesting effort that builds the pseudo-class mapping framework via grouping class information extraction to hold the class-specific constraints while holding the diversity. Within this new framework, we inherit and integrate the class-specific constraints of STDatav1 and design a dual cross-entropy loss to fit this new framework. Finally, to facilitate comprehensive evaluations, we perform experiments on four commonly adopted datasets, and a total of eight kinds of models are employed. These assessments witness the considerable performance gains compared to our early work and demonstrate the competitive ability and promising potential of our approach.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2429-2445"},"PeriodicalIF":0.0,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142849396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Supervised Anomaly Detection With Neural Transformations","authors":"Chen Qiu;Marius Kloft;Stephan Mandt;Maja Rudolph","doi":"10.1109/TPAMI.2024.3519543","DOIUrl":"10.1109/TPAMI.2024.3519543","url":null,"abstract":"Data augmentation plays a critical role in self-supervised learning, including anomaly detection. While hand-crafted transformations such as image rotations can achieve impressive performance on image data, effective transformations of non-image data are lacking. In this work, we study <italic>learning</i> such transformations for end-to-end anomaly detection on arbitrary data. We find that a contrastive loss–which encourages learning diverse data transformations while preserving the relevant semantic content of the data–is more suitable than previously proposed losses for transformation learning, a fact that we prove theoretically and empirically. We demonstrate that anomaly detection using neural transformation learning can achieve state-of-the-art results for time series data, tabular data, text data and graph data. Furthermore, our approach can make image anomaly detection more interpretable by learning transformations at different levels of abstraction.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 3","pages":"2170-2185"},"PeriodicalIF":0.0,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10806806","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142848870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trajectory of Fifths Based on Chroma Subbands Extraction–A New Approach to Music Representation, Analysis, and Classification","authors":"Tomasz Lukaszewicz;Dariusz Kania","doi":"10.1109/TPAMI.2024.3519420","DOIUrl":"10.1109/TPAMI.2024.3519420","url":null,"abstract":"In this article, we propose a new method of representing and analyzing music audio records. The method is based on the concept of the trajectory of fifths, which was initially developed for the analysis of music represented in MIDI format. To adapt this concept to the needs of audio signal processing, we implement a short-term spectral analysis of a musical piece, followed by a mapping of its subsequent spectral timeframes onto signatures of fifths reflecting relative intensities of sounds associated with each of the 12 pitch classes. Subsequently, the calculation of the characteristic points of the consecutive signatures of fifths enables the creation of the trajectory of fifths. The results of the experiments and statistical analysis conducted in a set of 8996 audio music pieces belonging to 10 genres indicate that this kind of trajectory, just as its MIDI-compliant precursor, is a source of valuable information (i.e., feature coefficients) concerning the harmonic structure of music, which may find use in audio music classification processes.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 3","pages":"2157-2169"},"PeriodicalIF":0.0,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10804652","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feiping Nie;Yitao Song;Wei Chang;Rong Wang;Xuelong Li
{"title":"Fast Semi-Supervised Learning on Large Graphs: An Improved Green-Function Method","authors":"Feiping Nie;Yitao Song;Wei Chang;Rong Wang;Xuelong Li","doi":"10.1109/TPAMI.2024.3518595","DOIUrl":"10.1109/TPAMI.2024.3518595","url":null,"abstract":"In the graph-based semi-supervised learning, the Green-function method is a classical method that works by computing the Green's function in the graph space. However, when applied to large graphs, especially those sparse ones, this method performs unstably and unsatisfactorily. We make a detailed analysis on it and propose a novel method from the perspective of optimization. On fully connected graphs, the method is equivalent to the Green-function method and can be seen as another interpretation with physical meanings, while on non-fully connected graphs, it helps to explain why the Green-function method causes a mess on large sparse graphs. To solve this dilemma, we propose a workable approach to improve our proposed method. Unlike the original method, our improved method can also apply two accelerating techniques, Gaussian Elimination, and Anchored Graphs to become more efficient on large graphs. Finally, the extensive experiments prove our conclusions and the efficiency, accuracy, and stability of our improved Green's function method.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 3","pages":"2055-2070"},"PeriodicalIF":0.0,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PDPP: Projected Diffusion for Procedure Planning in Instructional Videos","authors":"Hanlin Wang;Yilu Wu;Sheng Guo;Limin Wang","doi":"10.1109/TPAMI.2024.3518762","DOIUrl":"10.1109/TPAMI.2024.3518762","url":null,"abstract":"In this paper, we study the problem of procedure planning in instructional videos, which aims to make a plan (i.e. a sequence of actions) given the current visual observation and the desired goal. Previous works cast this as a sequence modeling problem and leverage either intermediate visual observations or language instructions as supervision to make autoregressive planning, resulting in complex learning schemes and expensive annotation costs. To avoid intermediate supervision annotation and error accumulation caused by planning autoregressively, we propose a diffusion-based framework, coined as PDPP (Projected Diffusion model for Procedure Planning), to directly model the whole action sequence distribution with task label as supervision instead. Our core idea is to treat procedure planning as a distribution fitting problem under the given observations, thus transform the planning problem to a sampling process from this distribution during inference. The diffusion-based modeling approach also effectively addresses the uncertainty issue in procedure planning. Based on PDPP, we further apply joint training to our framework to generate plans with varying horizon lengths using a single model and reduce the number of training parameters required. We instantiate our PDPP with three popular diffusion models and investigate a serious of condition-introducing methods in our framework, including condition embeddings, Mixture-of-Experts (MoEs), two-stage prediction and Classifier-Free Guidance strategy. Finally, we apply our PDPP to the Visual Planners for human Assistance (VPA) problem which requires the goal specified in natural language rather than visual observation. We conduct experiments on challenging datasets of different scales and our PDPP model achieves the state-of-the-art performance on multiple metrics, even compared with those strongly-supervised counterparts. These results further demonstratethe effectiveness and generalization ability of our model.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 3","pages":"2107-2124"},"PeriodicalIF":0.0,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142832333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}