Qing Li, Huifang Feng, Kanle Shi, Yue Gao, Yi Fang, Yu-Shen Liu, Zhizhong Han
{"title":"Learning Signed Hyper Surfaces for Oriented Point Cloud Normal Estimation.","authors":"Qing Li, Huifang Feng, Kanle Shi, Yue Gao, Yi Fang, Yu-Shen Liu, Zhizhong Han","doi":"10.1109/TPAMI.2024.3431221","DOIUrl":"10.1109/TPAMI.2024.3431221","url":null,"abstract":"<p><p>We propose a novel method called SHS-Net for point cloud normal estimation by learning signed hyper surfaces, which can accurately predict normals with global consistent orientation from various point clouds. Almost all existing methods estimate oriented normals through a two-stage pipeline, i.e., unoriented normal estimation and normal orientation, and each step is implemented by a separate algorithm. However, previous methods are sensitive to parameter settings, resulting in poor results from point clouds with noise, density variations and complex geometries. In this work, we introduce signed hyper surfaces (SHS), which are parameterized by multi-layer perceptron (MLP) layers, to learn to estimate oriented normals from point clouds in an end-to-end manner. The signed hyper surfaces are implicitly learned in a high-dimensional feature space where the local and global information is aggregated. Specifically, we introduce a patch encoding module and a shape encoding module to encode a 3D point cloud into a local latent code and a global latent code, respectively. Then, an attention-weighted normal prediction module is proposed as a decoder, which takes the local and global latent codes as input to predict oriented normals. Experimental results show that our algorithm outperforms the state-of-the-art methods in both unoriented and oriented normal estimation.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141728473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zipeng Ye, Wenjian Luo, Qi Zhou, Zhenqian Zhu, Yuhui Shi, Yan Jia
{"title":"Gradient Inversion Attacks: Impact Factors Analyses and Privacy Enhancement.","authors":"Zipeng Ye, Wenjian Luo, Qi Zhou, Zhenqian Zhu, Yuhui Shi, Yan Jia","doi":"10.1109/TPAMI.2024.3430533","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3430533","url":null,"abstract":"<p><p>Gradient inversion attacks (GIAs) have posed significant challenges to the emerging paradigm of distributed learning, which aims to reconstruct the private training data of clients (participating parties in distributed training) through the shared parameters. For counteracting GIAs, a large number of privacy-preserving methods for distributed learning scenario have emerged. However, these methods have significant limitations, either compromising the usability of global model or consuming substantial additional computational resources. Furthermore, despite the extensive efforts dedicated to defense methods, the underlying causes of data leakage in distributed learning still have not been thoroughly investigated. Therefore, this paper tries to reveal the potential reasons behind the successful implementation of existing GIAs, explore variations in the robustness of models against GIAs during the training process, and investigate the impact of different model structures on attack performance. After these explorations and analyses, this paper propose a plug-and-play GIAs defense method, which augments the training data by a designed vicinal distribution. Sufficient empirical experiments demonstrate that this easy-toimplement method can ensure the basic level of privacy without compromising the usability of global model.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shinichi Satoh, Luc Van Gool, Zheng Wang
{"title":"Physical Adversarial Attack Meets Computer Vision: A Decade Survey.","authors":"Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shinichi Satoh, Luc Van Gool, Zheng Wang","doi":"10.1109/TPAMI.2024.3430860","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3430860","url":null,"abstract":"<p><p>Despite the impressive achievements of Deep Neural Networks (DNNs) in computer vision, their vulnerability to adversarial attacks remains a critical concern. Extensive research has demonstrated that incorporating sophisticated perturbations into input images can lead to a catastrophic degradation in DNNs' performance. This perplexing phenomenon not only exists in the digital space but also in the physical world. Consequently, it becomes imperative to evaluate the security of DNNs-based systems to ensure their safe deployment in real-world scenarios, particularly in security-sensitive applications. To facilitate a profound understanding of this topic, this paper presents a comprehensive overview of physical adversarial attacks. Firstly, we distill four general steps for launching physical adversarial attacks. Building upon this foundation, we uncover the pervasive role of artifacts carrying adversarial perturbations in the physical world. These artifacts influence each step. To denote them, we introduce a new term: adversarial medium. Then, we take the first step to systematically evaluate the performance of physical adversarial attacks, taking the adversarial medium as a first attempt. Our proposed evaluation metric, hiPAA, comprises six perspectives: Effectiveness, Stealthiness, Robustness, Practicability, Aesthetics, and Economics. We also provide comparative results across task categories, together with insightful observations and suggestions for future research directions.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xinyu Li, Danni Ai, Hong Song, Jingfan Fan, Tianyu Fu, Deqiang Xiao, Yining Wang, Jian Yang
{"title":"STQD-Det: Spatio-Temporal Quantum Diffusion Model for Real-time Coronary Stenosis Detection in X-ray Angiography.","authors":"Xinyu Li, Danni Ai, Hong Song, Jingfan Fan, Tianyu Fu, Deqiang Xiao, Yining Wang, Jian Yang","doi":"10.1109/TPAMI.2024.3430839","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3430839","url":null,"abstract":"<p><p>Detecting coronary stenosis accurately in X-ray angiography (XRA) is important for diagnosing and treating coronary artery disease (CAD). However, challenges arise from factors like breathing and heart motion, poor imaging quality, and the complex vascular structures, making it difficult to identify stenosis fast and precisely. In this study, we proposed a Quantum Diffusion Model with Spatio-Temporal Feature Sharing to Real-time detect Stenosis (STQD-Det). Our framework consists of two modules: Sequential Quantum Noise Boxes module and spatio-temporal feature module. To evaluate the effectiveness of the method, we conducted a 4-fold cross-validation using a dataset consisting of 233 XRA sequences. Our approach achieved the F1 score of 92.39% with a real-time processing speed of 25.08 frames per second. These results outperform 17 state-of-the-art methods. The experimental results show that the proposed method can accomplish the stenosis detection quickly and accurately.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention-Guided Low-Rank Tensor Completion.","authors":"Truong Thanh Nhat Mai, Edmund Y Lam, Chul Lee","doi":"10.1109/TPAMI.2024.3429498","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3429498","url":null,"abstract":"<p><p>Low-rank tensor completion (LRTC) aims to recover missing data of high-dimensional structures from a limited set of observed entries. Despite recent significant successes, the original structures of data tensors are still not effectively preserved in LRTC algorithms, yielding less accurate restoration results. Moreover, LRTC algorithms often incur high computational costs, which hinder their applicability. In this work, we propose an attention-guided low-rank tensor completion (AGTC) algorithm, which can faithfully restore the original structures of data tensors using deep unfolding attention-guided tensor factorization. First, we formulate the LRTC task as a robust factorization problem based on low-rank and sparse error assumptions. Low-rank tensor recovery is guided by an attention mechanism to better preserve the structures of the original data. We also develop implicit regularizers to compensate for modeling inaccuracies. Then, we solve the optimization problem by employing an iterative technique. Finally, we design a multistage deep network by unfolding the iterative algorithm, where each stage corresponds to an iteration of the algorithm; at each stage, the optimization variables and regularizers are updated by closed-form solutions and learned deep networks, respectively. Experimental results for high dynamic range imaging and hyperspectral image restoration show that the proposed algorithm outperforms state-of-the-art algorithms.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141636247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ZhangJin Huang, Yuxin Wen, ZiHao Wang, Jinjuan Ren, Kui Jia
{"title":"Surface Reconstruction from Point Clouds: A Survey and a Benchmark.","authors":"ZhangJin Huang, Yuxin Wen, ZiHao Wang, Jinjuan Ren, Kui Jia","doi":"10.1109/TPAMI.2024.3429209","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3429209","url":null,"abstract":"<p><p>Reconstruction of a continuous surface of two-dimensional manifold from its raw, discrete point cloud observation is a long-standing problem in computer vision and graphics research. The problem is technically ill-posed, and becomes more difficult considering that various sensing imperfections would appear in the point clouds obtained by practical depth scanning. In literature, a rich set of methods has been proposed, and reviews of existing methods are also provided. However, existing reviews are short of thorough investigations on a common benchmark. The present paper aims to review and benchmark existing methods in the new era of deep learning surface reconstruction. To this end, we contribute a large-scale benchmarking dataset consisting of both synthetic and real-scanned data; the benchmark includes object- and scene-level surfaces and takes into account various sensing imperfections that are commonly encountered in practical depth scanning. We conduct thorough empirical studies by comparing existing methods on the constructed benchmark, and pay special attention on robustness of existing methods against various scanning imperfections; we also study how different methods generalize in terms of reconstructing complex surface shapes. Our studies help identity the best conditions under which different methods work, and suggest some empirical findings. For example, while deep learning methods are increasingly popular in the research community, our systematic studies suggest that, surprisingly, a few classical methods perform even better in terms of both robustness and generalization; our studies also suggest that the practical challenges of misalignment of point sets from multi-view scanning, missing of surface points, and point outliers remain unsolved by all the existing surface reconstruction methods. We expect that the benchmark and our studies would be valuable both for practitioners and as a guidance for new innovations in future research. We make the benchmark publicly accessible at https://Gorilla-Lab-SCUT.github.io/SurfaceReconstructionBenchmark.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141629623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human-Centric Transformer for Domain Adaptive Action Recognition.","authors":"Kun-Yu Lin, Jiaming Zhou, Wei-Shi Zheng","doi":"10.1109/TPAMI.2024.3429387","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3429387","url":null,"abstract":"<p><p>We study the domain adaptation task for action recognition, namely domain adaptive action recognition, which aims to effectively transfer action recognition power from a label-sufficient source domain to a label-free target domain. Since actions are performed by humans, it is crucial to exploit human cues in videos when recognizing actions across domains. However, existing methods are prone to losing human cues but prefer to exploit the correlation between non-human contexts and associated actions for recognition, and the contexts of interest agnostic to actions would reduce recognition performance in the target domain. To overcome this problem, we focus on uncovering human-centric action cues for domain adaptive action recognition, and our conception is to investigate two aspects of human-centric action cues, namely human cues and human-context interaction cues. Accordingly, our proposed Human-Centric Transformer (HCTransformer) develops a decoupled human-centric learning paradigm to explicitly concentrate on human-centric action cues in domain-variant video feature learning. Our HCTransformer first conducts human-aware temporal modeling by a human encoder, aiming to avoid a loss of human cues during domain-invariant video feature learning. Then, by a Transformer-like architecture, HCTransformer exploits domain-invariant and action-correlated contexts by a context encoder, and further models domain-invariant interaction between humans and action-correlated contexts. We conduct extensive experiments on three benchmarks, namely UCF-HMDB, Kinetics-NecDrone and EPIC-Kitchens-UDA, and the state-of-the-art performance demonstrates the effectiveness of our proposed HCTransformer.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141629621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu
{"title":"Class-Incremental Learning: A Survey.","authors":"Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu","doi":"10.1109/TPAMI.2024.3429383","DOIUrl":"10.1109/TPAMI.2024.3429383","url":null,"abstract":"<p><p>Deep models, e.g., CNNs and Vision Transformers, have achieved impressive achievements in many vision tasks in the closed world. However, novel classes emerge from time to time in our ever-changing world, requiring a learning system to acquire new knowledge continually. Class-Incremental Learning (CIL) enables the learner to incorporate the knowledge of new classes incrementally and build a universal classifier among all seen classes. Correspondingly, when directly training the model with new class instances, a fatal problem occurs - the model tends to catastrophically forget the characteristics of former ones, and its performance drastically degrades. There have been numerous efforts to tackle catastrophic forgetting in the machine learning community. In this paper, we survey comprehensively recent advances in class-incremental learning and summarize these methods from several aspects. We also provide a rigorous and unified evaluation of 17 methods in benchmark image classification tasks to find out the characteristics of different algorithms empirically. Furthermore, we notice that the current comparison protocol ignores the influence of memory budget in model storage, which may result in unfair comparison and biased results. Hence, we advocate fair comparison by aligning the memory budget in evaluation, as well as several memory-agnostic performance measures. The source code is available at https://github.com/zhoudw-zdw/CIL_Survey/.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141629620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Many-to-Many Mapping for Unpaired Real-World Image Super-resolution and Downscaling.","authors":"Wanjie Sun, Zhenzhong Chen","doi":"10.1109/TPAMI.2024.3428546","DOIUrl":"10.1109/TPAMI.2024.3428546","url":null,"abstract":"<p><p>Learning based single image super-resolution (SISR) for real-world images has been an active research topic yet a challenging task, due to the lack of paired low-resolution (LR) and high-resolution (HR) training images. Most of the existing unsupervised real-world SISR methods adopt a twostage training strategy by synthesizing realistic LR images from their HR counterparts first, then training the super-resolution (SR) models in a supervised manner. However, the training of image degradation and SR models in this strategy are separate, ignoring the inherent mutual dependency between downscaling and its inverse upscaling process. Additionally, the ill-posed nature of image degradation is not fully considered. In this paper, we propose an image downscaling and SR model dubbed as SDFlow, which simultaneously learns a bidirectional manyto- many mapping between real-world LR and HR images unsupervisedly. The main idea of SDFlow is to decouple image content and degradation information in the latent space, where content information distribution of LR and HR images is matched in a common latent space. Degradation information of the LR images and the high-frequency information of the HR images are fitted to an easy-to-sample conditional distribution. Experimental results on real-world image SR datasets indicate that SDFlow can generate diverse realistic LR and SR images both quantitatively and qualitatively.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141629622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Markov Progressive Framework, a Universal Paradigm for Modeling Long Videos.","authors":"Bo Pang, Gao Peng, Yizhuo Li, Cewu Lu","doi":"10.1109/TPAMI.2024.3426998","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3426998","url":null,"abstract":"<p><p>Compared to images, video, as an increasingly mainstream visual media, contains more semantic information. For this reason, the computational complexity of video models is an order of magnitude larger than their image-level counterparts, which increases linearly with the square number of frames. Constrained by computational resources, training video models to learn long-term temporal semantics end-to-end is quite a challenge. Currently, the main-stream method is to split a raw video into clips, leading to incomplete fragmentary temporal information flow and failure of modeling long-term semantics. To solve this problem, in this paper, we design the Markov Progressive framework (MaPro), a theoretical framework consisting of the progressive modeling method and a paradigm model tailored for it. Inspired by natural language processing techniques dealing with long sentences, the core idea of MaPro is to find a paradigm model consisting of proposed Markov operators which can be trained in multiple sequential steps and ensure that the multi-step progressive modeling is equivalent to the conventional end-to-end modeling. By training the paradigm model under the progressive method, we are able to model long videos end-to-end with limited resources and ensure the effective transmission of long-term temporal information. We provide detailed implementations of this theoretical system on the mainstream CNN- and Transformer-based models, where they are modified to conform to the Markov paradigm. The theoretical paradigm as a basic model is the lower bound of the model efficiency. With it, we further explore more sophisticated designs for CNN and Transformer-based methods specifically. As a general and robust training method, we experimentally demonstrate that it yields significant performance improvements on different backbones and datasets. As an illustrative example, the proposed method improves the SlowOnly network by 4.1 mAP on Charades and 2.5 top-1 accuracy on Kinetics. And for TimeSformer, MaPro improves its performance on Kinetics by 2.0 top-1 accuracy. Importantly, all these improvements are achieved with a little parameter and computation overhead. We hope the MaPro method can provide the community with new insight into modeling long videos.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141602355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}