Latest Articles from IEEE Transactions on Pattern Analysis and Machine Intelligence

Cap4Video++: Enhancing Video Understanding with Auxiliary Captions
Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6, CAS Q1, Computer Science)
Pub Date: 2024-09-09 | DOI: 10.1109/tpami.2024.3410329
Citations: 0
A Versatile Framework for Multi-Scene Person Re-Identification
Wei-Shi Zheng, Junkai Yan, Yi-Xing Peng
IEEE Transactions on Pattern Analysis and Machine Intelligence
Pub Date: 2024-04-29 | DOI: 10.1109/tpami.2024.3381184
Citations: 0
Consistency-Aware Anchor Pyramid Network for Crowd Localization
Xinyan Liu, Guorong Li, Yuankai Qi, Zhenjun Han, Anton van den Hengel, Nicu Sebe, Ming-Hsuan Yang, Qingming Huang
IEEE Transactions on Pattern Analysis and Machine Intelligence
Pub Date: 2024-04-29 | DOI: 10.1109/tpami.2024.3392013
Citations: 0
ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments
Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He, Liang Wang
IEEE Transactions on Pattern Analysis and Machine Intelligence
Pub Date: 2024-04-09 | DOI: 10.1109/tpami.2024.3386695
Citations: 0
NineRec: A Benchmark Dataset Suite for Evaluating Transferable Recommendation
Jiaqi Zhang, Yu Cheng, Yongxin Ni, Yunzhu Pan, Zheng Yuan, Junchen Fu, Youhua Li, Jie Wang, Fajie Yuan
IEEE Transactions on Pattern Analysis and Machine Intelligence
Pub Date: 2024-03-08 | DOI: 10.1109/tpami.2024.3373868
Citations: 0
Introspective Deep Metric Learning
Cheng-Hao Wang, Wenzhao Zheng, Zheng Hua Zhu, Jie Zhou, Jiwen Lu
IEEE Transactions on Pattern Analysis and Machine Intelligence
Pub Date: 2023-09-05 | DOI: 10.48550/arXiv.2205.04449
Abstract: This paper proposes an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images. Conventional deep metric learning methods focus on learning a discriminative embedding to describe the semantic features of images, ignoring the uncertainty present in each image due to noise or semantic ambiguity. Training without awareness of these uncertainties causes the model to overfit the annotated labels during training and produce overconfident judgments during inference. Motivated by this, we argue that a good similarity model should consider semantic discrepancies with awareness of uncertainty, to better handle ambiguous images and enable more robust training. To achieve this, we propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describe the semantic characteristics and the ambiguity of an image, respectively. We further propose an introspective similarity metric that makes similarity judgments between images considering both their semantic differences and their ambiguities. A gradient analysis of the proposed metric shows that it enables the model to learn at an adaptive, slower pace to deal with uncertainty during training. Our framework attains state-of-the-art performance on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets for image retrieval. We further evaluate our framework for image classification on the ImageNet-1K, CIFAR-10, and CIFAR-100 datasets, showing that equipping existing data mixing methods with the proposed introspective metric consistently achieves better results (e.g., +0.44% for CutMix on ImageNet-1K).
Citations: 2
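The abstract's core idea, pairing a semantic embedding with an uncertainty embedding and comparing images with a metric that accounts for both, can be sketched as a small toy. This is an illustrative reading of the mechanism, not the paper's actual formulation: the function name and the additive combination of semantic gap and uncertainty magnitudes are assumptions.

```python
import numpy as np

def introspective_distance(sem_a, unc_a, sem_b, unc_b):
    """Toy introspective distance: the semantic gap between two images,
    inflated by the magnitude of each image's uncertainty embedding.
    Illustrative only -- not the exact metric from the IDML paper."""
    semantic_gap = np.linalg.norm(np.asarray(sem_a, float) - np.asarray(sem_b, float))
    ambiguity = np.linalg.norm(unc_a) + np.linalg.norm(unc_b)
    return semantic_gap + ambiguity
```

Under this reading, ambiguous images (those with large uncertainty embeddings) are judged less similar even when their semantic embeddings coincide, which discourages overconfident matches during training.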
Learning to Solve Hard Minimal Problems
Petr Hruby, Timothy Duff, Anton Leykin, Tomas Pajdla
IEEE Transactions on Pattern Analysis and Machine Intelligence
Pub Date: 2023-08-23 | DOI: 10.1109/TPAMI.2023.3307898
Abstract: We present an approach to solving hard geometric optimization problems in the RANSAC framework. Hard minimal problems arise from relaxing an original geometric optimization problem into a minimal problem with many spurious solutions. Our approach avoids computing large numbers of spurious solutions. We design a learning strategy for selecting a starting problem-solution pair that can be numerically continued to the problem and solution of interest. We demonstrate our approach by developing a RANSAC solver for computing the relative pose of three calibrated cameras, via a minimal relaxation using four points in each view. On average, we can solve a single problem in under 70 μs. We also benchmark and study our engineering choices on the very familiar problem of computing the relative pose of two calibrated cameras, via the minimal case of five points in two views.
Citations: 0
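For context, the RANSAC framework this abstract builds on is a standard hypothesize-and-verify loop; a minimal 2D line-fitting sketch is below. The paper's contribution replaces the naive minimal solver in the inner loop with a learned selection of a starting problem-solution pair that is numerically continued to the solution of interest; that machinery is not reproduced here, and the function and parameter names are illustrative.

```python
import random
import numpy as np

def ransac_line(points, iters=200, thresh=0.1, seed=0):
    """Fit y = a*x + b with a hypothesize-and-verify RANSAC loop.
    The two-point sample plays the role of the 'minimal problem'."""
    rng = random.Random(seed)
    pts = np.asarray(points, dtype=float)
    best_model, best_inliers = None, -1
    for _ in range(iters):
        i, j = rng.sample(range(len(pts)), 2)        # minimal sample
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if abs(x2 - x1) < 1e-12:                     # degenerate sample
            continue
        a = (y2 - y1) / (x2 - x1)                    # candidate slope
        b = y1 - a * x1                              # candidate intercept
        residuals = np.abs(pts[:, 1] - (a * pts[:, 0] + b))
        inliers = int((residuals < thresh).sum())    # verification step
        if inliers > best_inliers:
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers
```

The hypothesis step is cheap here (two points determine a line); for hard minimal problems, that inner solve is exactly the expensive part the paper's learned strategy accelerates.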
Coarse-to-Fine Multi-Scene Pose Regression with Transformers
Yoli Shavit, Ron Ferens, Y. Keller
IEEE Transactions on Pattern Analysis and Machine Intelligence
Pub Date: 2023-08-22 | DOI: 10.48550/arXiv.2308.11783
Abstract: Absolute camera pose regressors estimate the position and orientation of a camera given the captured image alone. Typically, a convolutional backbone with a multi-layer perceptron (MLP) head is trained using images and pose labels to embed a single reference scene at a time. Recently, this scheme was extended to learn multiple scenes by replacing the MLP head with a set of fully connected layers. In this work, we propose to learn multi-scene absolute camera pose regression with Transformers, where encoders aggregate activation maps with self-attention and decoders transform latent features and scene encodings into pose predictions. This allows our model to focus on general features that are informative for localization, while embedding multiple scenes in parallel. We extend our previous MS-Transformer approach [1] by introducing a mixed classification-regression architecture that improves localization accuracy. Our method is evaluated on commonly benchmarked indoor and outdoor datasets and is shown to exceed both multi-scene and state-of-the-art single-scene absolute pose regressors. We make our code publicly available.
Citations: 0
CoIR: Compressive Implicit Radar
Sean M Farrell, Vivek Boominathan, Nathaniel Raymondi, Ashutosh Sabharwal, Ashok Veeraraghavan
IEEE Transactions on Pattern Analysis and Machine Intelligence
Pub Date: 2023-08-10 | DOI: 10.1109/TPAMI.2023.3301553
Abstract: Using millimeter wave (mmWave) signals for imaging has an important advantage: they can penetrate poor environmental conditions such as fog, dust, and smoke that severely degrade optical imaging systems. However, mmWave radars, unlike cameras and LiDARs, suffer from low angular resolution because of small physical apertures and conventional signal processing techniques. Sparse radar imaging, on the other hand, can increase the aperture size while minimizing power consumption and readout bandwidth. This paper presents CoIR, an analysis-by-synthesis method that leverages the implicit neural network bias in convolutional decoders and compressed sensing to perform high-accuracy sparse radar imaging. The proposed system is dataset-agnostic and does not require any auxiliary sensors for training or testing. We introduce a sparse array design that allows a 5.5× reduction in the number of antenna elements compared to conventional MIMO array designs. We demonstrate our system's improved imaging performance over standard mmWave radars and other competitive untrained methods on both simulated and experimental mmWave radar data.
Citations: 0
Transformer-Empowered Invariant Grounding for Video Question Answering
Yicong Li, Xiang Wang, Junbin Xiao, Wei Ji, Tat-Seng Chua
IEEE Transactions on Pattern Analysis and Machine Intelligence
Pub Date: 2023-08-09 | DOI: 10.1109/TPAMI.2023.3303451
Abstract: Video Question Answering (VideoQA) is the task of answering questions about a video. At its core is understanding the alignments between video scenes and question semantics to yield the answer. In leading VideoQA models, the typical learning objective, empirical risk minimization (ERM), tends to over-exploit spurious correlations between question-irrelevant scenes and answers, instead of inspecting the causal effect of question-critical scenes, which undermines prediction with unreliable reasoning. In this work, we take a causal look at VideoQA and propose a modality-agnostic learning framework, Invariant Grounding for VideoQA (IGV), to ground the question-critical scene, whose causal relations with answers are invariant across different interventions on its complement. With IGV, leading VideoQA models are forced to shield the answering process from the negative influence of spurious correlations, which significantly improves their reasoning ability. To unleash the potential of this framework, we further provide Transformer-Empowered Invariant Grounding for VideoQA (TIGV), a substantial instantiation of the IGV framework that naturally integrates the idea of invariant grounding into a transformer-style backbone. Experiments on four benchmark datasets validate our design in terms of accuracy, visual explainability, and generalization ability over the leading baselines. Our code is available at https://github.com/yl3800/TIGV.
Citations: 0