IEEE Transactions on Pattern Analysis and Machine Intelligence: Latest Publications

Another Vertical View: A Hierarchical Network for Heterogeneous Trajectory Prediction Via Spectrums.
IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-07-18 DOI: 10.1109/TPAMI.2025.3590487
Beihao Xia, Conghao Wong, Duanquan Xu, Qinmu Peng, Xinge You
{"title":"Another Vertical View: A Hierarchical Network for Heterogeneous Trajectory Prediction Via Spectrums.","authors":"Beihao Xia, Conghao Wong, Duanquan Xu, Qinmu Peng, Xinge You","doi":"10.1109/TPAMI.2025.3590487","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3590487","url":null,"abstract":"<p><p>With the fast development of AI-related techniques, the applications of trajectory prediction are no longer limited to easier scenes and trajectories. More and more trajectories with different forms, such as coordinates, bounding boxes, and even high-dimensional human skeletons, need to be analyzed and forecasted. Among these heterogeneous trajectories, interactions between different elements within a frame of trajectory, which we call \"Dimension-wise Interactions\", would be more complex and challenging. However, most previous approaches focus mainly on a specific form of trajectories, and potential dimension-wise interactions are less concerned. In this work, we expand the trajectory prediction task by introducing the trajectory dimensionality $M$, thus extending its application scenarios to heterogeneous trajectories. We first introduce the Haar transform as an alternative to the Fourier transform to better capture the time-frequency properties of each trajectory-dimension. Then, we adopt the bilinear structure to model and fuse two factors simultaneously, including the time-frequency response and the dimension-wise interaction, to forecast heterogeneous trajectories via trajectory spectrums hierarchically in a generic way. Experiments show that the proposed model outperforms most state-of-the-art methods on ETH-UCY, SDD, nuScenes, and Human3.6M with heterogeneous trajectories, including 2D coordinates, 2D/3D bounding boxes, and 3D human skeletons.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144664131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
EPIC-SOUNDS: A Large-Scale Dataset of Actions That Sound.
IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-07-18 DOI: 10.1109/TPAMI.2025.3590390
Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, Andrew Zisserman
{"title":"EPIC-SOUNDS: A Large-Scale Dataset of Actions That Sound.","authors":"Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, Andrew Zisserman","doi":"10.1109/TPAMI.2025.3590390","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3590390","url":null,"abstract":"<p><p>We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos. We propose an annotation pipeline where annotators temporally label distinguishable audio segments and describe the action that could have caused this sound. We identify actions that can be discriminated purely from audio, through grouping these free-form descriptions of audio into classes. For actions that involve objects colliding, we collect human annotations of the materials of these objects (e.g. a glass object being placed on a wooden surface), which we verify from video, discarding ambiguities. Overall, EPIC-SOUNDS includes 78.4 k categorised segments of audible events and actions, distributed across 44 classes as well as 39.2 k non-categorised segments. We train and evaluate state-of-the-art audio recognition and detection models on our dataset, for both audio-only and audio-visual methods. We also conduct analysis on: the temporal overlap between audio events, the temporal and label correlations between audio and visual modalities, the ambiguities in annotating materials from audio-only input, the importance of audio-only labels and the limitations of current models to understand actions that sound.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144664132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Re-Boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration.
IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-07-16 DOI: 10.1109/TPAMI.2025.3589606
Xin Lin, Yuyan Zhou, Jingtong Yue, Chao Ren, Kelvin C K Chan, Lu Qi, Ming-Hsuan Yang
{"title":"Re-Boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration.","authors":"Xin Lin, Yuyan Zhou, Jingtong Yue, Chao Ren, Kelvin C K Chan, Lu Qi, Ming-Hsuan Yang","doi":"10.1109/TPAMI.2025.3589606","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3589606","url":null,"abstract":"<p><p>Deep learning methods have demonstrated state-of-the-art performance in image restoration, especially when trained on large-scale paired datasets. However, acquiring paired data in real-world scenarios poses a significant challenge. Unsupervised restoration approaches based on generative adversarial networks (GANs) offer a promising solution without requiring paired datasets. Yet, these GAN-based approaches struggle to surpass the performance of conventional unsupervised GAN-based frameworks without significantly modifying model structures or increasing the computational complexity. To address these issues, we propose a self-collaboration (SC) strategy for existing restoration models. This strategy utilizes information from the previous stage as feedback to guide subsequent stages, achieving significant performance improvement without increasing the framework's inference complexity. The SC strategy comprises a prompt learning (PL) module and a restorer ($Res$). It iteratively replaces the previous less powerful fixed restorer $overline{Res}$ in the PL module with a more powerful $Res$. The enhanced PL module generates better pseudo-degraded/clean image pairs, leading to a more powerful $Res$ for the next iteration. Our SC can significantly improve the $Res$ 's performance by over 1.5dB without adding extra parameters or computational complexity during inference. Meanwhile, existing self-ensemble (SE) and our SC strategies enhance the performance of pre-trained restorers from different perspectives. As SE increases computational complexity during inference, we propose a re-boosting module to the SC (Reb-SC) to improve the SC strategy further by incorporating SE into SC without increasing inference time. This approach further enhances the restorer's performance by approximately 0.3 dB. Additionally, we present a baseline framework that includes parallel generative adversarial branches with complementary \"self-synthesis\" and \"unpaired-synthesis\" constraints, ensuring the effectiveness of the training framework. Extensive experimental results on restoration tasks demonstrate that the proposed model performs favorably against existing state-of-the-art unsupervised restoration methods. Source code and trained models are publicly available at: https://github.com/linxin0/RSCP2GAN.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144651657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Scalable Random Feature Latent Variable Models.
IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-07-16 DOI: 10.1109/TPAMI.2025.3589728
Ying Li, Zhidi Lin, Yuhao Liu, Michael Minyi Zhang, Pablo M Olmos, Petar M Djuric
{"title":"Scalable Random Feature Latent Variable Models.","authors":"Ying Li, Zhidi Lin, Yuhao Liu, Michael Minyi Zhang, Pablo M Olmos, Petar M Djuric","doi":"10.1109/TPAMI.2025.3589728","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3589728","url":null,"abstract":"<p><p>Random feature latent variable models (RFLVMs) are state-of-the-art tools for uncovering structure in high-dimensional, non-Gaussian data. However, their reliance on Monte Carlo sampling significantly limits scalability, posing challenges for large-scale applications. To overcome these limitations, we develop a scalable RFLVM framework based on variational Bayesian inference (VBI), a deterministic and optimization-based alternative to sampling methods. Applying VBI to RFLVMs is nontrivial due to two key challenges: (i) the lack of an explicit probability density function (PDF) for Dirichlet process (DP) mixing weights, and (ii) the inefficiency of existing VBI approaches when handling the high-dimensional variational parameters of RFLVMs. To address these issues, we adopt the stick-breaking construction for the DP, which provides an explicit and tractable PDF over mixing weights, and propose a novel inference algorithm, block coordinate descent variational inference (BCD-VI), which partitions variational parameters into blocks and applies tailored solvers to optimize them efficiently. The resulting scalable model, referred to as SRFLVM, supports various likelihoods; we demonstrate its effectiveness under Gaussian and logistic settings. Extensive experiments on diverse benchmark datasets show that SRFLVM achieves superior scalability, computational efficiency, and performance in latent representation learning and missing data imputation, consistently outperforming state-of-the-art latent variable models, including deep generative approaches.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144651658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Rapid Salient Object Detection With Difference Convolutional Neural Networks.
IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-07-04 DOI: 10.1109/TPAMI.2025.3583968
Zhuo Su, Li Liu, Matthias Muller, Jiehua Zhang, Diana Wofk, Ming-Ming Cheng, Matti Pietikainen
{"title":"Rapid Salient Object Detection With Difference Convolutional Neural Networks.","authors":"Zhuo Su, Li Liu, Matthias Muller, Jiehua Zhang, Diana Wofk, Ming-Ming Cheng, Matti Pietikainen","doi":"10.1109/TPAMI.2025.3583968","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3583968","url":null,"abstract":"<p><p>This paper addresses the challenge of deploying salient object detection (SOD) on resource-constrained devices with real-time performance. While recent advances in deep neural networks have improved SOD, existing top-leading models are computationally expensive. We propose an efficient network design that combines traditional wisdom on SOD and the representation power of modern CNNs. Like biologically-inspired classical SOD methods relying on computing contrast cues to determine saliency of image regions, our model leverages Pixel Difference Convolutions (PDCs) to encode the feature contrasts. Differently, PDCs are incorporated in a CNN architecture so that the valuable contrast cues are extracted from rich feature maps. For efficiency, we introduce a difference convolution reparameterization (DCR) strategy that embeds PDCs into standard convolutions, eliminating computation and parameters at inference. Additionally, we introduce SpatioTemporal Difference Convolution (STDC) for video SOD, enhancing the standard 3D convolution with spatiotemporal contrast capture. Our models, SDNet for image SOD and STDNet for video SOD, achieve significant improvements in efficiency-accuracy trade-offs. On a Jetson Orin device, our models with $lt $ 1M parameters operate at 46 FPS and 150 FPS on streamed images and videos, surpassing the second-best lightweight models in our experiments by more than $2times$ and $3times$ in speed with superior accuracy.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144565573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CNN2GNN: How to Bridge CNN with GNN.
IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-07-03 DOI: 10.1109/TPAMI.2025.3583357
Ziheng Jiao, Hongyuan Zhang, Xuelong Li
{"title":"CNN2GNN: How to Bridge CNN with GNN.","authors":"Ziheng Jiao, Hongyuan Zhang, Xuelong Li","doi":"10.1109/TPAMI.2025.3583357","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3583357","url":null,"abstract":"<p><p>Thanks to extracting the intra-sample representation, the convolution neural network (CNN) has achieved excellent performance in vision tasks. However, its numerous convolutional layers take a higher training expense. Recently, graph neural networks (GNN), a bilinear model, have succeeded in exploring the underlying topological relationship among the graph data with a few graph neural layers. Unfortunately, due to the lack of graph structure and high-cost inference on large-scale scenarios, it cannot be directly utilized on non-graph data. Inspired by these complementary strengths and weaknesses, we discuss a natural question, how to bridge these two heterogeneous networks? In this paper, we propose a novel CNN2GNN framework to unify CNN and GNN together via distillation. Firstly, to break the limitations of GNN, we design a differentiable sparse graph learning module as the head of the networks. It can dynamically learn the graph for inductive learning. Then, a response-based distillation is introduced to transfer the knowledge and bridge these two heterogeneous networks. Notably, due to extracting the intra-sample representation of a single instance and the topological relationship among the datasets simultaneously, the performance of the distilled \"boosted\" two-layer GNN on Mini-ImageNet is much higher than CNN containing dozens of layers, such as ResNet152.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Temporal Feature Matters: A Framework for Diffusion Model Quantization.
IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-07-03 DOI: 10.1109/TPAMI.2025.3585692
Yushi Huang, Ruihao Gong, Xianglong Liu, Jing Liu, Yuhang Li, Jiwen Lu, Dacheng Tao
{"title":"Temporal Feature Matters: A Framework for Diffusion Model Quantization.","authors":"Yushi Huang, Ruihao Gong, Xianglong Liu, Jing Liu, Yuhang Li, Jiwen Lu, Dacheng Tao","doi":"10.1109/TPAMI.2025.3585692","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3585692","url":null,"abstract":"<p><p>Diffusion models, widely used for image generation, face significant challenges related to their broad applicability due to prolonged inference times and high memory demands. Efficient Post-Training Quantization (PTQ) is crucial to address these issues. However, unlike traditional models, diffusion models critically rely on the timestep for the multi-round denoising. Typically, each timestep is encoded into a hypersensitive temporal feature by several modules. Despite this, existing PTQ methods do not optimize these modules individually. Instead, they employ unsuitable reconstruction objectives and complex calibration methods, leading to significant disturbances in the temporal feature and denoising trajectory, as well as reduced compression efficiency. To address these challenges, we introduce a novel quantization framework that includes three strategies: 1) TIB-based Maintenance: Based on our innovative Temporal Information Block (TIB) definition, Temporal Information-aware Reconstruction (TIAR) and Finite Set Calibration (FSC) are developed to efficiently align original temporal features. 2) Cache-based Maintenance: Instead of indirect and complex optimization for the related modules, pre-computing and caching quantized counterparts of temporal features are developed to minimize errors. 3) Disturbance-aware Selection: Employ temporal feature errors to guide a fine-grained selection between the two maintenance strategies for further disturbance reduction. This framework preserves most of the temporal information and ensures high-quality end-to-end generation. Extensive testing on various datasets, diffusion models, and hardware confirms our superior performance and acceleration.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reinforced Embodied Active Defense: Exploiting Adaptive Interaction for Robust Visual Perception in Adversarial 3D Environments.
IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-07-03 DOI: 10.1109/TPAMI.2025.3585726
Xiao Yang, Lingxuan Wu, Lizhong Wang, Chengyang Ying, Hang Su, Jun Zhu
{"title":"Reinforced Embodied Active Defense: Exploiting Adaptive Interaction for Robust Visual Perception in Adversarial 3D Environments.","authors":"Xiao Yang, Lingxuan Wu, Lizhong Wang, Chengyang Ying, Hang Su, Jun Zhu","doi":"10.1109/TPAMI.2025.3585726","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3585726","url":null,"abstract":"<p><p>Adversarial attacks in 3D environments have emerged as a critical threat to the reliability of visual perception systems, particularly in safety-sensitive applications such as identity verification and autonomous driving. These attacks employ adversarial patches and 3D objects to manipulate deep neural network (DNN) predictions by exploiting vulnerabilities within complex scenes. Existing defense mechanisms, such as adversarial training and purification, primarily employ passive strategies to enhance robustness. However, these approaches often rely on pre-defined assumptions about adversarial tactics, limiting their adaptability in dynamic 3D settings. To address these challenges, we introduce Reinforced Embodied Active Defense (REIN-EAD), a proactive defense framework that leverages adaptive exploration and interaction with the environment to improve perception robustness in 3D adversarial contexts. By implementing a multi-step objective that balances immediate prediction accuracy with predictive entropy minimization, REIN-EAD optimizes defense strategies over a multi-step horizon. Additionally, REIN-EAD involves an uncertainty-oriented reward-shaping mechanism that facilitates efficient policy updates, thereby reducing computational overhead and supporting real-world applicability without the need for differentiable environments. Comprehensive experiments validate the effectiveness of REIN-EAD, demonstrating a substantial reduction in attack success rates while preserving standard accuracy across diverse tasks. Notably, REIN-EAD exhibits robust generalization to unseen and adaptive attacks, making it suitable for real-world complex tasks, including 3D object classification, face recognition and autonomous driving. By integrating proactive policy learning with embodied scene interaction, REIN-EAD establishes a scalable and adaptable approach for securing DNN-based perception systems in dynamic and adversarial 3D environments.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ComS2T: A Complementary Spatiotemporal Learning System for Data-Adaptive Model Evolution.
IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-06-05 DOI: 10.1109/TPAMI.2025.3576805
Zhengyang Zhou, Qihe Huang, Binwu Wang, Jianpeng Hou, Kuo Yang, Yuxuan Liang, Yu Zheng, Yang Wang
{"title":"ComS2T: A Complementary Spatiotemporal Learning System for Data-Adaptive Model Evolution.","authors":"Zhengyang Zhou, Qihe Huang, Binwu Wang, Jianpeng Hou, Kuo Yang, Yuxuan Liang, Yu Zheng, Yang Wang","doi":"10.1109/TPAMI.2025.3576805","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3576805","url":null,"abstract":"<p><p>Spatiotemporal (ST) learning has become a crucial technique to enable smart cities and sustainable urban development. Current ST learning models capture the heterogeneity via various spatial convolution and temporal evolution blocks. However, rapid urbanization leads to fluctuating distributions in urban data and city structures, resulting in existing methods suffering generalization and data adaptation issues. Despite efforts, existing methods fail to deal with newly arrived observations, and the limitation of those methods with generalization capacity lies in the repeated training that leads to inconvenience, inefficiency and resource waste. Motivated by complementary learning in neuroscience, we introduce a prompt-based complementary spatiotemporal learning termed ComS2T, to empower the evolution of models for data adaptation. We first disentangle the neural architecture into two disjoint structures, a stable neocortex for consolidating historical memory, and a dynamic hippocampus for new knowledge update. Then we train the dynamic spatial and temporal prompts by characterizing distribution of main observations to enable prompts adaptive to new data. This data-adaptive prompt mechanism, combined with a two-stage training process, facilitates fine-tuning of the neural architecture conditioned on prompts, thereby enabling efficient adaptation during testing. Extensive experiments validate the efficacy of ComS2T in adapting various spatiotemporal out-of-distribution scenarios while maintaining effective inferences. The code is available on https://github.com/hqh0728/ComS2T.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144236287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Guest Editorial: Introduction to the Special Section on Large-Scale Multimodal Learning: Universality, Robustness, Efficiency, and Beyond
IEEE transactions on pattern analysis and machine intelligence Pub Date : 2025-06-05 DOI: 10.1109/TPAMI.2025.3562938
Peng Xu;Song Bai;Bowen Zhou;David Clifton;Andrea Vedaldi;Mihaela van der Schaar;Luc Van Gool
{"title":"Guest Editorial: Introduction to the Special Section on Large-Scale Multimodal Learning: Universality, Robustness, Efficiency, and Beyond","authors":"Peng Xu;Song Bai;Bowen Zhou;David Clifton;Andrea Vedaldi;Mihaela van der Schaar;Luc Van Gool","doi":"10.1109/TPAMI.2025.3562938","DOIUrl":"https://doi.org/10.1109/TPAMI.2025.3562938","url":null,"abstract":"","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 7","pages":"5127-5129"},"PeriodicalIF":0.0,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11026038","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144219785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0