IEEE Transactions on Pattern Analysis and Machine Intelligence: Latest Articles

Learning Signed Hyper Surfaces for Oriented Point Cloud Normal Estimation.
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-07-19. DOI: 10.1109/TPAMI.2024.3431221
Qing Li, Huifang Feng, Kanle Shi, Yue Gao, Yi Fang, Yu-Shen Liu, Zhizhong Han
{"title":"Learning Signed Hyper Surfaces for Oriented Point Cloud Normal Estimation.","authors":"Qing Li, Huifang Feng, Kanle Shi, Yue Gao, Yi Fang, Yu-Shen Liu, Zhizhong Han","doi":"10.1109/TPAMI.2024.3431221","DOIUrl":"10.1109/TPAMI.2024.3431221","url":null,"abstract":"<p><p>We propose a novel method called SHS-Net for point cloud normal estimation by learning signed hyper surfaces, which can accurately predict normals with global consistent orientation from various point clouds. Almost all existing methods estimate oriented normals through a two-stage pipeline, i.e., unoriented normal estimation and normal orientation, and each step is implemented by a separate algorithm. However, previous methods are sensitive to parameter settings, resulting in poor results from point clouds with noise, density variations and complex geometries. In this work, we introduce signed hyper surfaces (SHS), which are parameterized by multi-layer perceptron (MLP) layers, to learn to estimate oriented normals from point clouds in an end-to-end manner. The signed hyper surfaces are implicitly learned in a high-dimensional feature space where the local and global information is aggregated. Specifically, we introduce a patch encoding module and a shape encoding module to encode a 3D point cloud into a local latent code and a global latent code, respectively. Then, an attention-weighted normal prediction module is proposed as a decoder, which takes the local and global latent codes as input to predict oriented normals. Experimental results show that our algorithm outperforms the state-of-the-art methods in both unoriented and oriented normal estimation.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141728473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
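The abstract describes a patch encoder and a shape encoder whose latent codes feed an attention-weighted normal decoder. The following is a minimal sketch of that general layout, assuming PyTorch; the module names, feature sizes, and pooling choices are illustrative assumptions and do not reproduce the authors' signed hyper surfaces.

```python
# Minimal sketch of the encoder/decoder layout described in the abstract.
# Assumes PyTorch; all module names, feature sizes, and pooling choices are
# illustrative guesses, not the authors' SHS-Net implementation.
import torch
import torch.nn as nn

class OrientedNormalNet(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        # Patch encoder: per-point MLP over a local neighborhood -> local latent code.
        self.patch_encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        # Shape encoder: per-point MLP over the whole cloud -> global latent code.
        self.shape_encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        # Attention weights over patch points, conditioned on local + global codes.
        self.attention = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        # Decoder: aggregated feature -> oriented (signed) normal.
        self.decoder = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, patch, cloud):
        # patch: (B, K, 3) local neighborhood; cloud: (B, N, 3) subsampled shape.
        local = self.patch_encoder(patch)                           # (B, K, F)
        global_code = self.shape_encoder(cloud).max(dim=1).values   # (B, F)
        g = global_code.unsqueeze(1).expand(-1, patch.shape[1], -1)
        fused = torch.cat([local, g], dim=-1)                       # (B, K, 2F)
        w = torch.softmax(self.attention(fused), dim=1)             # (B, K, 1)
        agg = (w * fused).sum(dim=1)                                # (B, 2F)
        return nn.functional.normalize(self.decoder(agg), dim=-1)   # unit oriented normal
```

A caller would pass a local neighborhood around each query point together with a subsampled copy of the whole cloud, so local detail and global orientation cues are combined in one forward pass.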
Gradient Inversion Attacks: Impact Factors Analyses and Privacy Enhancement.
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-07-18. DOI: 10.1109/TPAMI.2024.3430533
Zipeng Ye, Wenjian Luo, Qi Zhou, Zhenqian Zhu, Yuhui Shi, Yan Jia
{"title":"Gradient Inversion Attacks: Impact Factors Analyses and Privacy Enhancement.","authors":"Zipeng Ye, Wenjian Luo, Qi Zhou, Zhenqian Zhu, Yuhui Shi, Yan Jia","doi":"10.1109/TPAMI.2024.3430533","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3430533","url":null,"abstract":"<p><p>Gradient inversion attacks (GIAs) have posed significant challenges to the emerging paradigm of distributed learning, which aims to reconstruct the private training data of clients (participating parties in distributed training) through the shared parameters. For counteracting GIAs, a large number of privacy-preserving methods for distributed learning scenario have emerged. However, these methods have significant limitations, either compromising the usability of global model or consuming substantial additional computational resources. Furthermore, despite the extensive efforts dedicated to defense methods, the underlying causes of data leakage in distributed learning still have not been thoroughly investigated. Therefore, this paper tries to reveal the potential reasons behind the successful implementation of existing GIAs, explore variations in the robustness of models against GIAs during the training process, and investigate the impact of different model structures on attack performance. After these explorations and analyses, this paper propose a plug-and-play GIAs defense method, which augments the training data by a designed vicinal distribution. Sufficient empirical experiments demonstrate that this easy-toimplement method can ensure the basic level of privacy without compromising the usability of global model.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
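The defense described above augments training data with a designed vicinal distribution before gradients are computed and shared. The sketch below illustrates the general idea with a generic mixup-plus-Gaussian vicinal distribution; the specific distribution, hyperparameters, and function names are assumptions for illustration, not the paper's design.

```python
# Sketch of vicinal-distribution data augmentation as a privacy measure in
# federated-style training. Assumes PyTorch; the mixup + Gaussian vicinal
# distribution below is a generic stand-in, not the paper's designed one.
import torch

def vicinal_augment(x, y, noise_std=0.05, mix_alpha=0.2):
    """Replace each sample with a draw from a simple vicinal distribution."""
    lam = torch.distributions.Beta(mix_alpha, mix_alpha).sample()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]          # mixup-style neighbor
    x_vic = x_mix + noise_std * torch.randn_like(x)  # small Gaussian vicinity
    y_mix = lam * y + (1.0 - lam) * y[perm]          # soft labels (y is one-hot)
    return x_vic, y_mix

def local_step(model, loss_fn, x, y):
    """Client computes gradients on vicinal samples instead of the raw batch."""
    x_vic, y_vic = vicinal_augment(x, y)
    loss = loss_fn(model(x_vic), y_vic)  # loss_fn must accept soft targets
    grads = torch.autograd.grad(loss, model.parameters())
    return grads  # these, not gradients of the raw data, would be shared
```

The intent of such a step is that shared gradients reflect perturbed neighbors of the private samples rather than the samples themselves, while the model still trains on data drawn from the vicinity of the true distribution.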
Physical Adversarial Attack Meets Computer Vision: A Decade Survey.
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-07-18. DOI: 10.1109/TPAMI.2024.3430860
Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shinichi Satoh, Luc Van Gool, Zheng Wang
{"title":"Physical Adversarial Attack Meets Computer Vision: A Decade Survey.","authors":"Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shinichi Satoh, Luc Van Gool, Zheng Wang","doi":"10.1109/TPAMI.2024.3430860","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3430860","url":null,"abstract":"<p><p>Despite the impressive achievements of Deep Neural Networks (DNNs) in computer vision, their vulnerability to adversarial attacks remains a critical concern. Extensive research has demonstrated that incorporating sophisticated perturbations into input images can lead to a catastrophic degradation in DNNs' performance. This perplexing phenomenon not only exists in the digital space but also in the physical world. Consequently, it becomes imperative to evaluate the security of DNNs-based systems to ensure their safe deployment in real-world scenarios, particularly in security-sensitive applications. To facilitate a profound understanding of this topic, this paper presents a comprehensive overview of physical adversarial attacks. Firstly, we distill four general steps for launching physical adversarial attacks. Building upon this foundation, we uncover the pervasive role of artifacts carrying adversarial perturbations in the physical world. These artifacts influence each step. To denote them, we introduce a new term: adversarial medium. Then, we take the first step to systematically evaluate the performance of physical adversarial attacks, taking the adversarial medium as a first attempt. Our proposed evaluation metric, hiPAA, comprises six perspectives: Effectiveness, Stealthiness, Robustness, Practicability, Aesthetics, and Economics. We also provide comparative results across task categories, together with insightful observations and suggestions for future research directions.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
STQD-Det: Spatio-Temporal Quantum Diffusion Model for Real-time Coronary Stenosis Detection in X-ray Angiography.
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-07-18. DOI: 10.1109/TPAMI.2024.3430839
Xinyu Li, Danni Ai, Hong Song, Jingfan Fan, Tianyu Fu, Deqiang Xiao, Yining Wang, Jian Yang
{"title":"STQD-Det: Spatio-Temporal Quantum Diffusion Model for Real-time Coronary Stenosis Detection in X-ray Angiography.","authors":"Xinyu Li, Danni Ai, Hong Song, Jingfan Fan, Tianyu Fu, Deqiang Xiao, Yining Wang, Jian Yang","doi":"10.1109/TPAMI.2024.3430839","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3430839","url":null,"abstract":"<p><p>Detecting coronary stenosis accurately in X-ray angiography (XRA) is important for diagnosing and treating coronary artery disease (CAD). However, challenges arise from factors like breathing and heart motion, poor imaging quality, and the complex vascular structures, making it difficult to identify stenosis fast and precisely. In this study, we proposed a Quantum Diffusion Model with Spatio-Temporal Feature Sharing to Real-time detect Stenosis (STQD-Det). Our framework consists of two modules: Sequential Quantum Noise Boxes module and spatio-temporal feature module. To evaluate the effectiveness of the method, we conducted a 4-fold cross-validation using a dataset consisting of 233 XRA sequences. Our approach achieved the F1 score of 92.39% with a real-time processing speed of 25.08 frames per second. These results outperform 17 state-of-the-art methods. The experimental results show that the proposed method can accomplish the stenosis detection quickly and accurately.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Attention-Guided Low-Rank Tensor Completion.
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-07-17. DOI: 10.1109/TPAMI.2024.3429498
Truong Thanh Nhat Mai, Edmund Y Lam, Chul Lee
{"title":"Attention-Guided Low-Rank Tensor Completion.","authors":"Truong Thanh Nhat Mai, Edmund Y Lam, Chul Lee","doi":"10.1109/TPAMI.2024.3429498","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3429498","url":null,"abstract":"<p><p>Low-rank tensor completion (LRTC) aims to recover missing data of high-dimensional structures from a limited set of observed entries. Despite recent significant successes, the original structures of data tensors are still not effectively preserved in LRTC algorithms, yielding less accurate restoration results. Moreover, LRTC algorithms often incur high computational costs, which hinder their applicability. In this work, we propose an attention-guided low-rank tensor completion (AGTC) algorithm, which can faithfully restore the original structures of data tensors using deep unfolding attention-guided tensor factorization. First, we formulate the LRTC task as a robust factorization problem based on low-rank and sparse error assumptions. Low-rank tensor recovery is guided by an attention mechanism to better preserve the structures of the original data. We also develop implicit regularizers to compensate for modeling inaccuracies. Then, we solve the optimization problem by employing an iterative technique. Finally, we design a multistage deep network by unfolding the iterative algorithm, where each stage corresponds to an iteration of the algorithm; at each stage, the optimization variables and regularizers are updated by closed-form solutions and learned deep networks, respectively. Experimental results for high dynamic range imaging and hyperspectral image restoration show that the proposed algorithm outperforms state-of-the-art algorithms.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141636247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
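The abstract casts completion as a robust factorization under low-rank and sparse-error assumptions, solved iteratively and then unfolded into a multistage deep network. As background only, here is a classical iteration of that kind for the matrix case; it omits the attention guidance and learned regularizers, and the thresholds are illustrative, not values from the paper.

```python
# A classical low-rank + sparse completion iteration (matrix case) of the kind
# that deep-unfolded stages build on. NumPy sketch with illustrative thresholds;
# it does not include the paper's attention guidance or implicit regularizers.
import numpy as np

def lrsc_complete(Y, mask, n_iters=100, tau=1.0, lam=0.1):
    """Recover a low-rank part L and a sparse error S from observed entries Y[mask]."""
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    for _ in range(n_iters):
        # Low-rank update: singular value thresholding of the current estimate,
        # using observed residuals where available and the previous L elsewhere.
        U, s, Vt = np.linalg.svd(np.where(mask, Y - S, L), full_matrices=False)
        L = (U * np.maximum(s - tau, 0.0)) @ Vt
        # Sparse update: soft-threshold the data-fit residual on observed entries.
        R = np.where(mask, Y - L, 0.0)
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return L, S
```

In the unfolded setting described by the abstract, each such iteration becomes one network stage, with the fixed thresholds replaced by learned modules.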
Surface Reconstruction from Point Clouds: A Survey and a Benchmark.
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-07-16. DOI: 10.1109/TPAMI.2024.3429209
ZhangJin Huang, Yuxin Wen, ZiHao Wang, Jinjuan Ren, Kui Jia
{"title":"Surface Reconstruction from Point Clouds: A Survey and a Benchmark.","authors":"ZhangJin Huang, Yuxin Wen, ZiHao Wang, Jinjuan Ren, Kui Jia","doi":"10.1109/TPAMI.2024.3429209","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3429209","url":null,"abstract":"<p><p>Reconstruction of a continuous surface of two-dimensional manifold from its raw, discrete point cloud observation is a long-standing problem in computer vision and graphics research. The problem is technically ill-posed, and becomes more difficult considering that various sensing imperfections would appear in the point clouds obtained by practical depth scanning. In literature, a rich set of methods has been proposed, and reviews of existing methods are also provided. However, existing reviews are short of thorough investigations on a common benchmark. The present paper aims to review and benchmark existing methods in the new era of deep learning surface reconstruction. To this end, we contribute a large-scale benchmarking dataset consisting of both synthetic and real-scanned data; the benchmark includes object- and scene-level surfaces and takes into account various sensing imperfections that are commonly encountered in practical depth scanning. We conduct thorough empirical studies by comparing existing methods on the constructed benchmark, and pay special attention on robustness of existing methods against various scanning imperfections; we also study how different methods generalize in terms of reconstructing complex surface shapes. Our studies help identity the best conditions under which different methods work, and suggest some empirical findings. For example, while deep learning methods are increasingly popular in the research community, our systematic studies suggest that, surprisingly, a few classical methods perform even better in terms of both robustness and generalization; our studies also suggest that the practical challenges of misalignment of point sets from multi-view scanning, missing of surface points, and point outliers remain unsolved by all the existing surface reconstruction methods. We expect that the benchmark and our studies would be valuable both for practitioners and as a guidance for new innovations in future research. We make the benchmark publicly accessible at https://Gorilla-Lab-SCUT.github.io/SurfaceReconstructionBenchmark.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141629623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Human-Centric Transformer for Domain Adaptive Action Recognition.
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-07-16. DOI: 10.1109/TPAMI.2024.3429387
Kun-Yu Lin, Jiaming Zhou, Wei-Shi Zheng
{"title":"Human-Centric Transformer for Domain Adaptive Action Recognition.","authors":"Kun-Yu Lin, Jiaming Zhou, Wei-Shi Zheng","doi":"10.1109/TPAMI.2024.3429387","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3429387","url":null,"abstract":"<p><p>We study the domain adaptation task for action recognition, namely domain adaptive action recognition, which aims to effectively transfer action recognition power from a label-sufficient source domain to a label-free target domain. Since actions are performed by humans, it is crucial to exploit human cues in videos when recognizing actions across domains. However, existing methods are prone to losing human cues but prefer to exploit the correlation between non-human contexts and associated actions for recognition, and the contexts of interest agnostic to actions would reduce recognition performance in the target domain. To overcome this problem, we focus on uncovering human-centric action cues for domain adaptive action recognition, and our conception is to investigate two aspects of human-centric action cues, namely human cues and human-context interaction cues. Accordingly, our proposed Human-Centric Transformer (HCTransformer) develops a decoupled human-centric learning paradigm to explicitly concentrate on human-centric action cues in domain-variant video feature learning. Our HCTransformer first conducts human-aware temporal modeling by a human encoder, aiming to avoid a loss of human cues during domain-invariant video feature learning. Then, by a Transformer-like architecture, HCTransformer exploits domain-invariant and action-correlated contexts by a context encoder, and further models domain-invariant interaction between humans and action-correlated contexts. We conduct extensive experiments on three benchmarks, namely UCF-HMDB, Kinetics-NecDrone and EPIC-Kitchens-UDA, and the state-of-the-art performance demonstrates the effectiveness of our proposed HCTransformer.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141629621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
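The abstract describes a human encoder, a context encoder, and a module that models human-context interaction. The sketch below shows one plausible arrangement of such components in PyTorch; the token extraction, cross-attention fusion, and all dimensions are assumptions rather than the authors' HCTransformer architecture.

```python
# Rough sketch of decoupled human/context encoding with an interaction step.
# Assumes PyTorch; placeholder dimensions and fusion, not the paper's design.
import torch
import torch.nn as nn

class HumanContextEncoder(nn.Module):
    def __init__(self, dim=256, heads=4, layers=2, num_classes=12):
        super().__init__()
        def make_encoder():
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=layers)
        self.human_encoder = make_encoder()    # temporal modeling of human-region tokens
        self.context_encoder = make_encoder()  # modeling of action-correlated context tokens
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, human_tokens, context_tokens):
        # human_tokens: (B, T, dim) features cropped around detected people
        # context_tokens: (B, T, dim) features of the remaining scene
        h = self.human_encoder(human_tokens)
        c = self.context_encoder(context_tokens)
        # Human-context interaction: human features attend to context features.
        inter, _ = self.cross(query=h, key=c, value=c)
        return self.head(inter.mean(dim=1))
```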
Class-Incremental Learning: A Survey.
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-07-16. DOI: 10.1109/TPAMI.2024.3429383
Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu
{"title":"Class-Incremental Learning: A Survey.","authors":"Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu","doi":"10.1109/TPAMI.2024.3429383","DOIUrl":"10.1109/TPAMI.2024.3429383","url":null,"abstract":"<p><p>Deep models, e.g., CNNs and Vision Transformers, have achieved impressive achievements in many vision tasks in the closed world. However, novel classes emerge from time to time in our ever-changing world, requiring a learning system to acquire new knowledge continually. Class-Incremental Learning (CIL) enables the learner to incorporate the knowledge of new classes incrementally and build a universal classifier among all seen classes. Correspondingly, when directly training the model with new class instances, a fatal problem occurs - the model tends to catastrophically forget the characteristics of former ones, and its performance drastically degrades. There have been numerous efforts to tackle catastrophic forgetting in the machine learning community. In this paper, we survey comprehensively recent advances in class-incremental learning and summarize these methods from several aspects. We also provide a rigorous and unified evaluation of 17 methods in benchmark image classification tasks to find out the characteristics of different algorithms empirically. Furthermore, we notice that the current comparison protocol ignores the influence of memory budget in model storage, which may result in unfair comparison and biased results. Hence, we advocate fair comparison by aligning the memory budget in evaluation, as well as several memory-agnostic performance measures. The source code is available at https://github.com/zhoudw-zdw/CIL_Survey/.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141629620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
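The survey argues that comparisons should charge both stored exemplars and model parameters against a single memory budget. A small sketch of that accounting follows; the image and parameter sizes (CIFAR-scale images, float32 weights, a roughly ResNet-18-sized backbone) are assumptions, not values taken from the paper.

```python
# Sketch of memory-budget accounting for fair class-incremental comparison:
# both stored exemplars and model parameters count against one budget.
# Sizes below are assumptions (CIFAR-scale images, float32 weights).
def total_memory_bytes(num_exemplars, num_params,
                       image_bytes=32 * 32 * 3, param_bytes=4):
    return num_exemplars * image_bytes + num_params * param_bytes

def exemplars_for_budget(budget_bytes, num_params,
                         image_bytes=32 * 32 * 3, param_bytes=4):
    """How many exemplars a method may keep once its model size is charged."""
    remaining = budget_bytes - num_params * param_bytes
    return max(remaining // image_bytes, 0)

# Example: fix the budget at "2000 exemplars + one backbone", then a method
# that stores two backbone copies is allowed fewer exemplars under that budget.
backbone_params = 11_000_000  # roughly ResNet-18-sized; an assumption
budget = total_memory_bytes(num_exemplars=2000, num_params=backbone_params)
print(exemplars_for_budget(budget, num_params=2 * backbone_params))
```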
Learning Many-to-Many Mapping for Unpaired Real-World Image Super-resolution and Downscaling.
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-07-16. DOI: 10.1109/TPAMI.2024.3428546
Wanjie Sun, Zhenzhong Chen
{"title":"Learning Many-to-Many Mapping for Unpaired Real-World Image Super-resolution and Downscaling.","authors":"Wanjie Sun, Zhenzhong Chen","doi":"10.1109/TPAMI.2024.3428546","DOIUrl":"10.1109/TPAMI.2024.3428546","url":null,"abstract":"<p><p>Learning based single image super-resolution (SISR) for real-world images has been an active research topic yet a challenging task, due to the lack of paired low-resolution (LR) and high-resolution (HR) training images. Most of the existing unsupervised real-world SISR methods adopt a twostage training strategy by synthesizing realistic LR images from their HR counterparts first, then training the super-resolution (SR) models in a supervised manner. However, the training of image degradation and SR models in this strategy are separate, ignoring the inherent mutual dependency between downscaling and its inverse upscaling process. Additionally, the ill-posed nature of image degradation is not fully considered. In this paper, we propose an image downscaling and SR model dubbed as SDFlow, which simultaneously learns a bidirectional manyto- many mapping between real-world LR and HR images unsupervisedly. The main idea of SDFlow is to decouple image content and degradation information in the latent space, where content information distribution of LR and HR images is matched in a common latent space. Degradation information of the LR images and the high-frequency information of the HR images are fitted to an easy-to-sample conditional distribution. Experimental results on real-world image SR datasets indicate that SDFlow can generate diverse realistic LR and SR images both quantitatively and qualitatively.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141629622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Markov Progressive Framework, a Universal Paradigm for Modeling Long Videos.
IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-07-12. DOI: 10.1109/TPAMI.2024.3426998
Bo Pang, Gao Peng, Yizhuo Li, Cewu Lu
{"title":"Markov Progressive Framework, a Universal Paradigm for Modeling Long Videos.","authors":"Bo Pang, Gao Peng, Yizhuo Li, Cewu Lu","doi":"10.1109/TPAMI.2024.3426998","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3426998","url":null,"abstract":"<p><p>Compared to images, video, as an increasingly mainstream visual media, contains more semantic information. For this reason, the computational complexity of video models is an order of magnitude larger than their image-level counterparts, which increases linearly with the square number of frames. Constrained by computational resources, training video models to learn long-term temporal semantics end-to-end is quite a challenge. Currently, the main-stream method is to split a raw video into clips, leading to incomplete fragmentary temporal information flow and failure of modeling long-term semantics. To solve this problem, in this paper, we design the Markov Progressive framework (MaPro), a theoretical framework consisting of the progressive modeling method and a paradigm model tailored for it. Inspired by natural language processing techniques dealing with long sentences, the core idea of MaPro is to find a paradigm model consisting of proposed Markov operators which can be trained in multiple sequential steps and ensure that the multi-step progressive modeling is equivalent to the conventional end-to-end modeling. By training the paradigm model under the progressive method, we are able to model long videos end-to-end with limited resources and ensure the effective transmission of long-term temporal information. We provide detailed implementations of this theoretical system on the mainstream CNN- and Transformer-based models, where they are modified to conform to the Markov paradigm. The theoretical paradigm as a basic model is the lower bound of the model efficiency. With it, we further explore more sophisticated designs for CNN and Transformer-based methods specifically. As a general and robust training method, we experimentally demonstrate that it yields significant performance improvements on different backbones and datasets. As an illustrative example, the proposed method improves the SlowOnly network by 4.1 mAP on Charades and 2.5 top-1 accuracy on Kinetics. And for TimeSformer, MaPro improves its performance on Kinetics by 2.0 top-1 accuracy. Importantly, all these improvements are achieved with a little parameter and computation overhead. We hope the MaPro method can provide the community with new insight into modeling long videos.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141602355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
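The framework trains on a long video clip by clip in sequential steps while keeping temporal information flowing between steps. The sketch below shows the generic pattern of carrying and detaching a per-video state across clips; it is a stand-in for intuition only, with an assumed `clip_model(clip, state)` interface, and does not implement the paper's Markov operators or their equivalence guarantee.

```python
# Generic sketch of progressive, clip-by-clip modeling of a long video with a
# carried state. Assumes PyTorch and a clip_model that returns (output, state);
# this is an illustrative stand-in for the paper's Markov operators.
import torch

def progressive_forward(clip_model, clips, state=None):
    """clips: list of (B, C, T, H, W) tensors covering one long video in order."""
    outputs = []
    for clip in clips:
        # Each step conditions on the state summarizing all previous clips,
        # so temporal information flows beyond a single clip.
        out, state = clip_model(clip, state)
        outputs.append(out)
        # Detach so each step back-propagates only through the current clip,
        # keeping memory bounded while training proceeds in sequential steps.
        state = state.detach()
    return torch.stack(outputs, dim=1), state
```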