Proceedings of the 2022 International Conference on Multimedia Retrieval: Latest Articles

Efficient Linear Attention for Fast and Accurate Keypoint Matching
Pub Date: 2022-04-16 | DOI: 10.1145/3512527.3531369
Suwichaya Suwanwimolkul, S. Komorita
Abstract: Recently, Transformers have provided state-of-the-art performance in sparse matching, which is crucial to realizing high-performance 3D vision applications. Yet, these Transformers lack efficiency due to the quadratic computational complexity of their attention mechanism. To solve this problem, we employ an efficient linear attention with linear computational complexity. Then, we propose a new attentional aggregation that achieves high accuracy by aggregating both the global and local information from sparse keypoints. To further improve the efficiency, we propose the joint learning of feature matching and description. Our learning enables simpler and faster matching than the Sinkhorn algorithm, which is often used to match the learned descriptors from Transformers. Our method achieves competitive performance with only 0.84M learnable parameters against the bigger SOTAs, SuperGlue (12M parameters) and SGMNet (30M parameters), on three benchmarks: HPatches, ETH, and Aachen Day-Night.
Citations: 10
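The efficiency claim above rests on replacing softmax attention with a linear attention, where keys and values are aggregated once before being combined with each query. The paper's exact feature map and aggregation are not given here, so the following is only a generic linear-attention sketch in PyTorch; the `elu + 1` feature map is an assumption, not necessarily the authors' choice.

```python
# Minimal linear-attention sketch (generic; not the authors' exact formulation).
# Complexity is O(N * d^2) instead of O(N^2 * d) because keys and values are
# aggregated globally before being combined with each query.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """q, k, v: (batch, num_points, dim). Returns (batch, num_points, dim)."""
    # Non-negative feature map; elu(x) + 1 is one common choice (assumption here).
    q = F.elu(q) + 1.0
    k = F.elu(k) + 1.0
    # Aggregate keys and values once: (batch, dim, dim)
    kv = torch.einsum("bnd,bne->bde", k, v)
    # Per-query normalizer: (batch, num_points)
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)
    # Each query attends to the pre-aggregated context in O(d^2) per point.
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

# Usage: 512 keypoint descriptors of dimension 128.
q = torch.randn(1, 512, 128)
out = linear_attention(q, q, q)
print(out.shape)  # torch.Size([1, 512, 128])
```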
OSCARS: An Outlier-Sensitive Content-Based Radiography Retrieval System
Pub Date: 2022-04-06 | DOI: 10.1145/3512527.3531425
Xiaoyuan Guo, Jiali Duan, S. Purkayastha, H. Trivedi, J. Gichoya, I. Banerjee
Abstract: Improving retrieval relevance on noisy datasets is an emerging need for the curation of large-scale clean datasets in the medical domain. While existing methods can be applied for class-wise retrieval (aka inter-class), they cannot distinguish the granularity of likeness within the same class (aka intra-class). The problem is exacerbated on external medical datasets, where noisy samples of the same class are treated equally during training. Our goal is to identify both intra- and inter-class similarities for fine-grained retrieval. To achieve this, we propose an Outlier-Sensitive Content-based rAdiography Retrieval System (OSCARS), consisting of two steps. First, we train an outlier detector on a clean internal dataset in an unsupervised manner, then use the trained detector to generate anomaly scores on the external dataset, whose distribution is used to bin intra-class variations. Second, we propose a quadruplet (a, p, n_intra, n_inter) sampling strategy, where intra-class negatives n_intra are sampled from bins of the same class other than the bin the anchor a belongs to, while inter-class negatives n_inter are randomly sampled from other classes. We suggest a weighted metric learning objective to balance intra- and inter-class feature learning. We experimented on two representative public radiography datasets, and the experiments show the effectiveness of our approach. The training and evaluation code can be found at https://github.com/XiaoyuanGuo/oscars.
Citations: 0
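The quadruplet sampling described above adds an intra-class negative to the usual anchor/positive/negative triplet. The exact margins and weighting are not spelled out in the abstract, so the snippet below is a hedged sketch of one plausible weighted quadruplet margin loss in PyTorch; the margin values and the 0.5 weight are illustrative assumptions.

```python
# Hedged sketch of a weighted quadruplet margin loss over embeddings for
# (anchor, positive, intra-class negative, inter-class negative).
# Margin values and the intra/inter weighting are illustrative assumptions.
import torch
import torch.nn.functional as F

def quadruplet_loss(a, p, n_intra, n_inter,
                    margin_intra=0.2, margin_inter=0.5, w_intra=0.5):
    """All inputs: (batch, dim) embeddings."""
    d_ap = F.pairwise_distance(a, p)
    d_an_intra = F.pairwise_distance(a, n_intra)
    d_an_inter = F.pairwise_distance(a, n_inter)
    # Intra-class negatives (same class, different outlier bin) are pushed away
    # by a smaller margin than inter-class negatives.
    loss_intra = F.relu(d_ap - d_an_intra + margin_intra)
    loss_inter = F.relu(d_ap - d_an_inter + margin_inter)
    return (w_intra * loss_intra + (1.0 - w_intra) * loss_inter).mean()

# Usage with random 128-d embeddings.
a, p, ni, ne = (torch.randn(8, 128) for _ in range(4))
print(quadruplet_loss(a, p, ni, ne).item())
```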
Learning Sample Importance for Cross-Scenario Video Temporal Grounding
Pub Date: 2022-01-08 | DOI: 10.1145/3512527.3531403
P. Bao, Yadong Mu
Abstract: The task of temporal grounding aims to locate a video moment in an untrimmed video given a sentence query. This paper, for the first time, investigates some superficial biases that are specific to the temporal grounding task and proposes a novel targeted solution. Most alarmingly, we observe that existing temporal grounding models heavily rely on biases in the visual modality (e.g., a high preference for frequent concepts or certain temporal intervals). This leads to inferior performance when the model is generalized to a cross-scenario test setting. To this end, we propose a novel method called Debiased Temporal Language Localizer (Debias-TLL) to prevent the model from naively memorizing the biases and to enforce grounding of the query sentence based on the true inter-modal relationship. Debias-TLL trains two models simultaneously. By our design, a large discrepancy between the two models' predictions on a sample reveals a higher probability of it being a biased sample. Harnessing this informative discrepancy, we devise a data re-weighting scheme for mitigating the data biases. We evaluate the proposed model on cross-scenario temporal grounding, where the train and test data are heterogeneously sourced. Experiments show a large-margin superiority of the proposed method over state-of-the-art competitors.
Citations: 4
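The re-weighting idea lends itself to a small illustration: two copies of the localizer are trained together, and samples on which they disagree strongly are treated as likely biased and down-weighted. The function below is only an assumed sketch of such a scheme; the discrepancy measure and the mapping from discrepancy to weight are not taken from the paper.

```python
# Hedged sketch of discrepancy-based sample re-weighting: samples on which two
# jointly trained models disagree the most receive smaller training weights.
# The exact discrepancy measure and weight mapping are assumptions.
import torch

def sample_weights(pred_a, pred_b, temperature=1.0):
    """pred_a, pred_b: (batch, num_candidates) moment scores from the two models."""
    # Per-sample discrepancy: mean absolute difference between the predictions.
    disc = (pred_a - pred_b).abs().mean(dim=1)
    # Larger discrepancy -> smaller weight; softmax-normalized over the batch,
    # then rescaled so the weights average to roughly 1.
    weights = torch.softmax(-disc / temperature, dim=0) * disc.numel()
    return weights  # (batch,)

pred_a, pred_b = torch.rand(4, 10), torch.rand(4, 10)
w = sample_weights(pred_a, pred_b)
loss_per_sample = torch.rand(4)            # placeholder per-sample grounding losses
weighted_loss = (w * loss_per_sample).mean()
print(weighted_loss.item())
```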
Nearest Neighbor Search with Compact Codes: A Decoder Perspective
Pub Date: 2021-12-17 | DOI: 10.1145/3512527.3531408
Kenza Amara, Matthijs Douze, Alexandre Sablayrolles, Hervé Jégou
Abstract: Modern approaches for fast retrieval of similar vectors on billion-scale datasets rely on compressed-domain methods such as binary sketches or product quantization. These methods minimize a certain loss, typically the mean squared error or another objective function tailored to the retrieval problem. In this paper, we re-interpret popular methods such as binary hashing and product quantizers as auto-encoders, and point out that they implicitly make suboptimal assumptions about the form of the decoder. We design backward-compatible decoders that improve the reconstruction of vectors from the same codes, which translates into better performance in nearest neighbor search. Our method significantly improves over binary hashing methods and product quantization on popular benchmarks.
Citations: 2
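To make the auto-encoder reading concrete: a product quantizer encodes a vector into per-subspace centroid indices, and the conventional decoder simply concatenates the selected centroids; the paper's contribution is a better decoder for the same codes. The sketch below shows only the baseline encode/decode step with NumPy, under assumed codebook shapes.

```python
# Baseline product quantization viewed as an auto-encoder: the encoder maps each
# sub-vector to its nearest centroid index, and the standard decoder concatenates
# the selected centroids. (The paper replaces this decoder with an improved one;
# that part is not shown here.)
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 128, 8, 256                              # vector dim, sub-spaces, centroids per sub-space
codebooks = rng.standard_normal((m, k, d // m))    # assumed pre-trained codebooks

def pq_encode(x):
    """x: (d,) -> codes: (m,) uint8 centroid indices."""
    subs = x.reshape(m, d // m)
    dists = ((subs[:, None, :] - codebooks) ** 2).sum(-1)   # (m, k)
    return dists.argmin(axis=1).astype(np.uint8)

def pq_decode(codes):
    """Standard decoder: concatenate the chosen centroids."""
    return np.concatenate([codebooks[i, c] for i, c in enumerate(codes)])

x = rng.standard_normal(d)
x_hat = pq_decode(pq_encode(x))
print(np.mean((x - x_hat) ** 2))                   # reconstruction MSE of the naive decoder
```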
Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval
Pub Date: 2021-09-12 | DOI: 10.1145/3512527.3531368
Zhihao Fan, Zhongyu Wei, Zejun Li, Siyuan Wang, Haijun Shan, Xuanjing Huang, Jianqing Fan
Abstract: Existing research on image-text retrieval mainly relies on sentence-level supervision to distinguish matched and mismatched sentences for a query image. However, the semantic mismatch between an image and a sentence usually happens at a finer grain, i.e., the phrase level. In this paper, we explore introducing additional phrase-level supervision for better identification of mismatched units in the text. In practice, multi-grained semantic labels are automatically constructed for a query image at both the sentence level and the phrase level: we construct text scene graphs for the matched sentences and extract entities and triples as the phrase-level labels. In order to integrate both sentence-level and phrase-level supervision, we propose the Semantic Structure Aware Multimodal Transformer (SSAMT) for multi-modal representation learning. Inside the SSAMT, we utilize different kinds of attention mechanisms to enforce interactions of multi-grained semantic units on both the vision and language sides. For training, we propose multi-scale matching from both global and local perspectives, and penalize mismatched phrases. Experimental results on MS-COCO and Flickr30K show the effectiveness of our approach compared to state-of-the-art models.
Citations: 5
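The phrase-level labels come from text scene graphs of the matched sentences: entities and (subject, relation, object) triples become phrase strings. The snippet below illustrates that extraction step on an already-parsed graph; the dictionary format is an assumption, and the scene-graph parser the authors actually used is not shown.

```python
# Illustration of turning a parsed text scene graph into phrase-level labels
# (entities plus relation triples). The dictionary format is an assumed stand-in
# for whatever scene-graph parser produced the graph.
scene_graph = {
    "entities": ["man", "frisbee", "park"],
    "relations": [("man", "throwing", "frisbee"), ("man", "in", "park")],
}

def phrase_labels(graph):
    labels = list(graph["entities"])                        # entity phrases
    labels += [" ".join(t) for t in graph["relations"]]     # triple phrases
    return labels

print(phrase_labels(scene_graph))
# ['man', 'frisbee', 'park', 'man throwing frisbee', 'man in park']
```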
TransHash: Transformer-based Hamming Hashing for Efficient Image Retrieval
Pub Date: 2021-05-05 | DOI: 10.1145/3512527.3531405
Yongbiao Chen, Shenmin Zhang, Fangxin Liu, Zhigang Chang, Mang Ye, Zhengwei Qi (Shanghai Jiao Tong University; U. California; W. University)
Abstract: Deep hashing has gained growing popularity in approximate nearest neighbor search for large-scale image retrieval. Until now, deep hashing for image retrieval has been dominated by convolutional neural network architectures, e.g., ResNet [22]. In this paper, inspired by the recent advances in vision transformers, we present TransHash, a pure transformer-based framework for deep hashing learning. Concretely, our framework is composed of two major modules: (1) based on the Vision Transformer (ViT), we design a Siamese Multi-Granular Vision Transformer (MGVT) backbone for image feature extraction; to learn fine-grained features, we introduce dual-stream multi-granular feature learning on top of the transformer to learn discriminative global and local features. (2) We adopt a Bayesian learning scheme with a dynamically constructed similarity matrix to learn compact binary hash codes. The entire framework is jointly trained in an end-to-end manner. To the best of our knowledge, this is the first work to tackle deep hashing learning without convolutional neural networks (CNNs). We perform comprehensive experiments on three widely studied datasets: CIFAR-10, NUS-WIDE and ImageNet. The experiments evidence our superiority over existing state-of-the-art deep hashing methods. Specifically, we achieve 8.2%, 2.6% and 12.7% gains in average mAP across different hash bit lengths on the three public datasets, respectively.
Citations: 22
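In deep hashing, a "Bayesian learning scheme with a similarity matrix" usually refers to maximizing the likelihood of pairwise similarities given the inner products of relaxed hash codes. Whether TransHash uses exactly this form is not stated in the abstract, so the sketch below is a hedged version of that common pairwise likelihood loss, not the paper's exact objective.

```python
# Hedged sketch of a pairwise likelihood ("Bayesian") loss commonly used in deep
# hashing: similar pairs should have large inner products between their relaxed
# hash codes, dissimilar pairs small ones. TransHash may use a variant of this.
import torch
import torch.nn.functional as F

def pairwise_hash_loss(codes, similarity):
    """codes: (batch, bits) in [-1, 1] (e.g. tanh outputs).
    similarity: (batch, batch) with 1 for same-class pairs, 0 otherwise."""
    theta = 0.5 * codes @ codes.t()                  # scaled code inner products
    # Negative log-likelihood of the similarity matrix under a logistic model:
    # log(1 + exp(theta)) - s * theta, averaged over all pairs.
    return (F.softplus(theta) - similarity * theta).mean()

codes = torch.tanh(torch.randn(16, 48))              # 48-bit relaxed codes
labels = torch.randint(0, 10, (16,))
sim = (labels[:, None] == labels[None, :]).float()   # dynamically built similarity
print(pairwise_hash_loss(codes, sim).item())
```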
M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection
Pub Date: 2021-04-20 | DOI: 10.1145/3512527.3531415
Junke Wang, Zuxuan Wu, Jingjing Chen, Yu-Gang Jiang
Abstract: The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images. In this paper, we aim to capture the subtle manipulation artifacts at different scales using transformer models. In particular, we introduce a Multi-modal Multi-scale TRansformer (M2TR), which operates on patches of different sizes to detect local inconsistencies in images at different spatial levels. M2TR further learns to detect forgery artifacts in the frequency domain, complementing the RGB information through a carefully designed cross-modality fusion block. In addition, to stimulate Deepfake detection research, we introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 Deepfake videos generated by state-of-the-art face swapping and facial reenactment methods. We conduct extensive experiments to verify the effectiveness of the proposed method, which outperforms state-of-the-art Deepfake detection methods by clear margins.
Citations: 110
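Two ingredients above are easy to illustrate in isolation: splitting a frame into patch grids at several sizes, and deriving a frequency-domain companion input with an FFT. The code is a generic sketch of those two preprocessing steps only; M2TR's actual patch sizes, frequency filtering, and fusion block are not reproduced.

```python
# Generic sketch of M2TR-style inputs: multi-scale patch grids from the RGB frame
# plus a frequency-domain map obtained with a 2D FFT. The real model's patch
# sizes and cross-modality fusion block are not reproduced here.
import torch

def multi_scale_patches(img, patch_sizes=(8, 16, 32)):
    """img: (batch, 3, H, W) -> list of (batch, num_patches, 3*p*p) tensors."""
    outs = []
    for p in patch_sizes:
        patches = img.unfold(2, p, p).unfold(3, p, p)        # (B, 3, H/p, W/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).flatten(3)
        outs.append(patches.flatten(1, 2))                   # (B, (H/p)*(W/p), 3*p*p)
    return outs

def frequency_map(img):
    """Log-magnitude spectrum per channel as a simple frequency-domain modality."""
    spec = torch.fft.fft2(img, norm="ortho")
    return torch.log1p(spec.abs())

img = torch.randn(2, 3, 64, 64)
print([t.shape for t in multi_scale_patches(img)])
print(frequency_map(img).shape)   # torch.Size([2, 3, 64, 64])
```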
Pluggable Weakly-Supervised Cross-View Learning for Accurate Vehicle Re-Identification
Pub Date: 2021-03-09 | DOI: 10.1145/3512527.3531357
Lu Yang, Hongbang Liu, Jinghao Zhou, Lingqiao Liu, Lei Zhang, Peng Wang, Yanning Zhang
Abstract: Learning cross-view-consistent feature representations is the key to accurate vehicle re-identification (ReID), since the visual appearance of vehicles changes significantly under different viewpoints. To this end, many existing approaches resort to supervised cross-view learning with extensive extra viewpoint annotations, which, however, is difficult to deploy in real applications due to the expensive labelling cost and the continuous viewpoint variation that makes it hard to define discrete viewpoint labels. In this study, we present a pluggable Weakly-supervised Cross-View Learning (WCVL) module for vehicle ReID. By hallucinating the cross-view samples as the hardest positive counterparts, with small luminance difference but large local feature variance, we can learn consistent feature representations by minimizing the cross-view feature distance based on vehicle IDs only, without using any viewpoint annotation. More importantly, the proposed method can be seamlessly plugged into most existing vehicle ReID baselines for cross-view learning without re-training the baselines. To demonstrate its efficacy, we plug the proposed method into a number of off-the-shelf baselines and obtain significant performance improvements on four public benchmark datasets, i.e., VeRi-776, VehicleID, VRIC and VRAI.
Citations: 0
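The training signal described above reduces to pulling together features of samples that share a vehicle ID, with the hallucinated cross-view sample acting as the hardest positive. Below is an assumed sketch of a batch-hard positive loss keyed on IDs only; the hallucination step itself (the luminance and local-feature-variance criteria) is not reproduced.

```python
# Assumed sketch of an ID-only "hardest positive" pulling loss: for each sample,
# find the same-ID sample with the largest feature distance in the batch and
# minimize that distance. The cross-view hallucination step is not shown.
import torch
import torch.nn.functional as F

def hardest_positive_loss(features, ids):
    """features: (batch, dim) L2-normalized embeddings; ids: (batch,) vehicle IDs."""
    dist = torch.cdist(features, features)                   # (batch, batch)
    same_id = ids[:, None] == ids[None, :]
    same_id.fill_diagonal_(False)                             # exclude self-pairs
    # Mask out non-positives before taking the per-row maximum distance.
    masked = dist.masked_fill(~same_id, float("-inf"))
    hardest = masked.max(dim=1).values
    valid = torch.isfinite(hardest)                           # rows with >= 1 positive
    return hardest[valid].mean()

feats = F.normalize(torch.randn(8, 256), dim=1)
ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(hardest_positive_loss(feats, ids).item())
```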
Proceedings of the 2022 International Conference on Multimedia Retrieval (front matter)
DOI: 10.1145/3512527
Citations: 0