Latest Publications: 2023 IEEE International Conference on Multimedia and Expo (ICME)

Adaptive and Robust Fourier-Mellin-Based Image Watermarking for Social Networking Platforms
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date: 2023-07-01 DOI: 10.1109/ICME55011.2023.00483
Jinghong Xia, Hongxia Wang, S. Abdullahi, Heng Wang, Fei Zhang, Bingling Luo
Abstract: According to the bucket effect, the capacity of a bucket depends on the length of its shortest board. This principle also applies to social networking platform resilient (SNPR) image watermarking, which should be comprehensive and free from significant shortcomings. In the frequency domain, the watermarked region is formed using log-polar coordinate mapping (LPM) and has a ring-like structure. However, this structure cannot be stretched or compressed, and it causes a streaking effect at the edges of the watermarked image. The proposed method addresses these issues. Specifically, an adaptive optimization framework adjusts the embedding strength and range of the watermark, and multiple synchronization strategies are adopted to correct flips and aspect-ratio changes. Compared with state-of-the-art works, the proposed method significantly improves the imperceptibility of the watermarked image and its robustness to various distortions and lossy transmission on social networking platforms (SNPs).
Citations: 0
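As a rough illustration of the log-polar coordinate mapping (LPM) the abstract builds on, the sketch below samples the FFT magnitude of an image on a log-polar grid to form the ring-shaped mid-frequency region; the ring radii, grid resolution, and NumPy implementation are our assumptions, not the authors' settings.

```python
import numpy as np

def log_polar_ring(image, r_min=0.2, r_max=0.4, n_rho=32, n_theta=360):
    """Sample the FFT magnitude of `image` on a log-polar grid, yielding the
    ring-shaped mid-frequency region used for embedding. Under LPM, rotation
    of the image becomes a cyclic shift along theta and scaling a shift along
    log-rho, which is the source of the Fourier-Mellin invariance."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    mag = np.abs(spectrum)
    h, w = mag.shape
    cy, cx = h / 2.0, w / 2.0
    r0 = r_min * min(cy, cx)   # inner/outer ring radii: illustrative choices
    r1 = r_max * min(cy, cx)
    rhos = np.exp(np.linspace(np.log(r0), np.log(r1), n_rho))
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    ys = (cy + rhos[:, None] * np.sin(thetas)[None, :]).astype(int).clip(0, h - 1)
    xs = (cx + rhos[:, None] * np.cos(thetas)[None, :]).astype(int).clip(0, w - 1)
    return mag[ys, xs]         # (n_rho, n_theta) ring region

cover = np.random.rand(256, 256)   # stand-in cover image
ring = log_polar_ring(cover)
print(ring.shape)                  # (32, 360)
```

Because rotation and scaling become shifts on this grid, a watermark embedded in the ring can be re-synchronized after such distortions, which is the property the paper's synchronization strategies extend to flips and aspect-ratio changes.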
Multi-Scale Query-Adaptive Convolution for Generalizable Person Re-Identification
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date: 2023-07-01 DOI: 10.1109/ICME55011.2023.00411
Kaixiang Chen, T. Gong, Liyan Zhang
Abstract: Domain generalization in person re-identification (ReID) aims to learn a generalizable model from single- or multi-source domains that can be directly deployed to an unseen domain without fine-tuning. In this paper, we investigate single-source domain generalization in ReID. Recent research has made remarkable progress by treating image matching as a search for local correspondences in feature maps. However, to ensure efficient matching, these methods usually adopt a pixel-wise matching approach, which is prone to being misled by identity-irrelevant patch features in the image, such as background patches. To address this problem, we propose the Multi-Scale Query-Adaptive Convolution (QAConv-MS) framework. Specifically, we adopt a group of template kernels with different scales to extract local features with different receptive fields from the original feature maps and accordingly perform the local matching process. We also introduce a self-attention branch that extracts global features from the feature map as complementary information for the local features. Our approach achieves state-of-the-art performance on four large-scale datasets.
Citations: 0
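The local-matching idea, in which query patches act as convolution kernels slid over gallery feature maps at several template-kernel scales, might be sketched as follows; the feature sizes and scale set are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

def multiscale_local_match(query_feat, gallery_feat, scales=(1, 3)):
    """Query-adaptive matching for one query/gallery pair: local patches of
    the query feature map act as conv kernels applied to the gallery feature
    map, at several kernel scales (a sketch, not the exact QAConv-MS)."""
    scores = []
    for k in scales:
        # extract all k x k query patches as kernels: (n_patches, C, k, k)
        kernels = F.unfold(query_feat, kernel_size=k).transpose(1, 2)
        kernels = kernels.reshape(-1, query_feat.shape[1], k, k)
        kernels = F.normalize(kernels.flatten(1), dim=1).view_as(kernels)
        # correlate every query patch with every gallery location
        resp = F.conv2d(F.normalize(gallery_feat, dim=1), kernels, padding=k // 2)
        # best-matching gallery location per query patch, averaged into a score
        scores.append(resp.flatten(2).amax(dim=2).mean(dim=1))
    return torch.stack(scores).mean(dim=0)   # average over scales

q = torch.randn(1, 64, 24, 8)   # query feature map (C=64, 24x8 grid)
g = torch.randn(1, 64, 24, 8)   # gallery feature map
print(multiscale_local_match(q, g))   # higher = more similar
```

Larger kernels look at bigger local neighborhoods, which is what makes the matching less sensitive to single identity-irrelevant pixels than a purely pixel-wise scheme.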
Hierarchical Attention Learning for Multimodal Classification
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date: 2023-07-01 DOI: 10.1109/ICME55011.2023.00165
Xin Zou, Chang Tang, Wei Zhang, Kun Sun, Liangxiao Jiang
Abstract: Multimodal learning aims to integrate complementary information from different modalities for more reliable decisions. However, existing multimodal classification methods simply integrate the learned local features, ignoring the underlying structure of each modality and the higher-order correlations across modalities. In this paper, we propose a novel Hierarchical Attention Learning Network (HALNet) for multimodal classification. Specifically, HALNet has three merits: 1) a hierarchical feature fusion module learns multi-level features, aggregating them into a global feature representation with an attention mechanism and progressive fusion tactics; 2) a cross-modal higher-order fusion module captures prospective cross-modal correlations in the label space; 3) a dual prediction pattern generates credible decisions. Extensive experiments on three real-world multimodal datasets demonstrate that HALNet achieves competitive performance compared to the state of the art.
Citations: 0
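The attention-weighted aggregation of multi-level features into one global representation can be sketched as below; the feature dimension and the scoring network are illustrative assumptions, not HALNet's actual fusion module.

```python
import torch
import torch.nn as nn

class AttentiveLevelFusion(nn.Module):
    """Aggregate multi-level features into a single global vector with
    learned attention weights: a minimal sketch of the hierarchical-fusion
    idea, with illustrative dimensions."""
    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim // 4), nn.Tanh(),
                                   nn.Linear(dim // 4, 1))

    def forward(self, level_feats):              # list of (B, dim) tensors
        x = torch.stack(level_feats, dim=1)      # (B, L, dim)
        w = torch.softmax(self.score(x), dim=1)  # (B, L, 1) per-level weights
        return (w * x).sum(dim=1)                # (B, dim) fused representation

fusion = AttentiveLevelFusion()
levels = [torch.randn(4, 128) for _ in range(3)]  # three feature levels
print(fusion(levels).shape)                       # torch.Size([4, 128])
```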
End-To-End Part-Level Action Parsing With Transformer
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date: 2023-07-01 DOI: 10.1109/ICME55011.2023.00135
Xiaojia Chen, Xuanhan Wang, Beitao Chen, Lianli Gao
Abstract: The divide-and-conquer strategy, which interprets part-level action parsing as a detect-then-parse pipeline, has been widely used and has become a general tool for part-level action understanding. However, existing methods derived from this strategy usually suffer from either strong dependence on prior detection or high computational complexity. In this paper, we present the first fully end-to-end part-level action parsing framework with transformers, termed PATR. Unlike existing methods, our method regards part-level action parsing as a hierarchical set prediction problem and unifies person detection, body part detection, and action state recognition into one model. In PATR, predefined learnable representations, including general instance representations and general part representations, are guided to adaptively attend to the image features relevant to the target body parts. Then, conditioned on the corresponding learnable representations, the attended image features are hierarchically decoded into the corresponding semantics (i.e., person location, body-part location, and action states for each body part). In this way, PATR relies on the characteristics of body parts, instead of prior predictions like bounding boxes, to parse action states, thus removing the strong dependence between sub-tasks and eliminating the computational burden of the multi-stage paradigm. Extensive experiments on the challenging Kinetic-TPS benchmark indicate that our method achieves very competitive results. In particular, our model outperforms all state-of-the-art part-level action parsing approaches by a clear margin, reaching around 3.8±2.0% higher Accp than previous methods. These findings indicate the potential of PATR to serve as a new baseline for part-level action parsing methods in the future. Our code and models are publicly available.
Citations: 0
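The set-prediction core the abstract describes can be pictured with a DETR-style decoder: learnable query embeddings (the "general representations") cross-attend to image features and are decoded into a box and an action state per query. A minimal sketch; the dimensions, query count, and head layout are illustrative, not PATR's actual architecture.

```python
import torch
import torch.nn as nn

class SetPredictionHead(nn.Module):
    """Learnable part queries attend to image features and are decoded into
    a bounding box and an action state per query (sizes are assumptions)."""
    def __init__(self, d=256, n_queries=20, n_actions=10):
        super().__init__()
        self.queries = nn.Embedding(n_queries, d)    # learnable representations
        layer = nn.TransformerDecoderLayer(d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.box_head = nn.Linear(d, 4)              # (cx, cy, w, h), normalized
        self.action_head = nn.Linear(d, n_actions)   # per-part action state

    def forward(self, img_feats):                    # (B, HW, d) flattened features
        q = self.queries.weight.unsqueeze(0).expand(img_feats.size(0), -1, -1)
        h = self.decoder(q, img_feats)               # queries attend to the image
        return self.box_head(h).sigmoid(), self.action_head(h)

model = SetPredictionHead()
feats = torch.randn(2, 14 * 14, 256)   # flattened backbone feature map
boxes, actions = model(feats)
print(boxes.shape, actions.shape)      # (2, 20, 4) (2, 20, 10)
```

Because the queries, not prior bounding boxes, carry the conditioning, detection and action-state recognition can be trained jointly, which is the dependence PATR removes.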
EvenFace: Deep Face Recognition with Uniform Distribution of Identities
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date: 2023-07-01 DOI: 10.1109/ICME55011.2023.00298
Pengfei Hu, Y. Tao, Qiqi Bao, Guijin Wang, Wenming Yang
Abstract: The development of loss functions over the past few years has brought great success to face recognition. Most algorithms focus on improving the intra-class compactness of face features but ignore inter-class separability. In this paper, we propose a method named EvenFace, which introduces a variance regularization term and a mean term for inter-class separability to further promote an even distribution of class centers on the hypersphere, thereby increasing the inter-class distance. To evaluate inter-class separability, a new index is proposed that better reflects the distribution of class centers and guides the classification. By penalizing the angle between each identity and its surrounding neighbors, the resulting uniform distribution of identities enables full exploitation of the feature space, leading to discriminative face representations. Our proposed loss function can effectively boost the performance of softmax-loss variants. Quantitative comparisons with other state-of-the-art methods on several benchmarks demonstrate the superiority of EvenFace.
Citations: 0
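A regularizer that penalizes the angle between each identity's center and its nearest neighbors, with a mean term and a variance term, might look like the sketch below; the neighbor count k and the exact mean-plus-variance form are our assumptions, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def uniformity_loss(class_weights, k=5):
    """Push class centers toward an even spread on the unit hypersphere by
    penalizing the mean and variance of each center's cosine similarity to
    its k nearest neighbors (an illustrative reading of the abstract)."""
    w = F.normalize(class_weights, dim=1)          # (C, d) unit class centers
    cos = w @ w.t() - 2.0 * torch.eye(w.size(0))   # mask out self-similarity
    nearest = cos.topk(k, dim=1).values            # k closest neighbors per class
    # the mean term pulls neighbor similarity down (larger angles); the
    # variance term evens the spacing out across identities
    return nearest.mean() + nearest.var()

centers = torch.randn(1000, 512, requires_grad=True)  # classifier weight rows
loss = uniformity_loss(centers)
loss.backward()
print(loss.item())
```

Applied to the final classification layer's weight rows, such a term complements intra-class compactness losses by directly increasing inter-class distance.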
Adaptive-Masking Policy with Deep Reinforcement Learning for Self-Supervised Medical Image Segmentation
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date: 2023-07-01 DOI: 10.1109/ICME55011.2023.00390
Gang Xu, Shengxin Wang, Thomas Lukasiewicz, Zhenghua Xu
Abstract: Although self-supervised learning methods based on masked image modeling have achieved some success in improving the performance of deep learning models, they have difficulty ensuring that the masked region is the most appropriate one for each image, so the segmentation network does not obtain the best weights in pre-training. We therefore propose a new self-supervised learning method with an adaptive masking policy. Specifically, we model the masking of images as a reinforcement learning problem and use the results of the reconstruction model as a feedback signal that guides the agent to learn a masking policy, selecting a more appropriate mask position and size for each image. This helps the reconstruction network learn more fine-grained image representations and thus improves downstream segmentation performance. We conduct extensive experiments on two datasets, Cardiac and TCIA, and the results show that our approach outperforms current state-of-the-art self-supervised learning methods.
Citations: 0
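The feedback loop the abstract describes, where a policy proposes a mask position and size and the reconstruction model's result serves as the reward, can be sketched with a REINFORCE update. Everything concrete here (grid size, action space, reward shaping) is an illustrative assumption.

```python
import torch
import torch.nn as nn

class MaskPolicy(nn.Module):
    """Toy policy that picks a joint (position, size) action for masking a
    14 x 14 patch grid; grid size and action space are illustrative."""
    def __init__(self, grid=14, n_sizes=3):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(grid * grid, 128),
                                 nn.ReLU(), nn.Linear(128, grid * grid * n_sizes))

    def forward(self, coarse_map):               # (B, 1, grid, grid) image summary
        return torch.distributions.Categorical(logits=self.net(coarse_map))

policy = MaskPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

coarse = torch.randn(8, 1, 14, 14)               # stand-in per-image summaries
dist = policy(coarse)
action = dist.sample()                           # which patch to mask, how large
# Reward comes from the reconstruction model's result on the masked input;
# we use a stand-in value here, with the batch mean as a simple baseline.
recon_feedback = torch.rand(8)
reward = recon_feedback - recon_feedback.mean()
loss = -(dist.log_prob(action) * reward).mean()  # REINFORCE policy gradient
opt.zero_grad(); loss.backward(); opt.step()
```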
Trajectory Alignment based Multi-Scaled Temporal Attention for Efficient Video Transformer
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date: 2023-07-01 DOI: 10.1109/ICME55011.2023.00244
Zao Zhang, Dong Yuan, Yu Zhang, Wei Bao
Abstract: Although video transformers achieve remarkable accuracy on video recognition tasks, they are hard to deploy in resource-constrained scenarios due to their high computational cost. A method that dynamically modifies and trains the transformer model so that the computational cost matches the deployment scenario's requirements would be an effective solution to this challenge. In this paper, we propose a method for modifying large-scale video transformers with trajectory alignment based multi-scaled temporal attention (TAMS) schemes, reducing the computational cost significantly while losing only a little accuracy. In the temporal dimension, we adopt multi-scaled sparsity patterns in hierarchical transformer blocks. In the spatial dimension, we use region selection to force the transformer to focus on high-importance regions without corrupting the spatial context. Our method reduces the computational cost of state-of-the-art large-scale video transformers by up to 40% with a slight accuracy drop (~7%) on the video recognition task.
Citations: 0
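One simple way to realize a multi-scaled temporal sparsity pattern is to let each frame attend only to frames at a fixed stride, with coarser blocks using larger strides. The sketch below shows such strided temporal attention; the stride values and single-head form are assumptions, not the paper's exact scheme.

```python
import torch

def strided_temporal_mask(T, stride):
    """Boolean (T, T) mask where each frame attends only to frames whose
    index differs by a multiple of `stride`: one simple instance of a
    multi-scaled temporal sparsity pattern (illustrative)."""
    idx = torch.arange(T)
    return (idx[:, None] - idx[None, :]) % stride == 0   # True = keep

def sparse_temporal_attention(q, k, v, stride):
    # q, k, v: (B, T, d) frame-level tokens
    att = (q @ k.transpose(1, 2)) / k.shape[-1] ** 0.5   # (B, T, T) scores
    mask = strided_temporal_mask(q.shape[1], stride)
    att = att.masked_fill(~mask, float('-inf'))          # drop masked pairs
    return torch.softmax(att, dim=-1) @ v

q = k = v = torch.randn(2, 16, 64)   # 16 frames, 64-dim tokens
out = sparse_temporal_attention(q, k, v, stride=4)
print(out.shape)                     # torch.Size([2, 16, 64])
```

Larger strides skip more frame pairs, which is where the computational savings over dense T x T temporal attention come from.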
Video Snapshot Compressive Imaging via Optical Flow
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date: 2023-07-01 DOI: 10.1109/ICME55011.2023.00372
Zan Chen, Ran Li, Yongqiang Li, Yuanjing Feng
Abstract: Video snapshot compressive imaging (SCI) reconstruction recovers video frames from a single compressed 2D measurement; owing to hardware limitations, the individual frames cannot be observed directly. To make SCI suitable for more applications, we propose an optical-flow-based deep unfolding network for video SCI reconstruction. To extract the optical flow, the feature maps produced during the iterative process are transformed by a convolution layer into the estimated flow. We design a motion regularizer that uses voxels of the iterative frames together with the optical flow to update the reconstructed frames. The proposed motion regularizer efficiently captures the temporal correlation between previous and next frames, which helps reconstruct both the observed and unobserved frames from the input measurement during SCI reconstruction. Experiments show that our method achieves state-of-the-art results in PSNR and SSIM.
Citations: 0
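The setting the abstract starts from is the standard SCI forward model: the camera records a single 2D snapshot Y = Σ_t (C_t ⊙ X_t), so the individual frames X_t are never observed. The sketch below shows this measurement plus a flow-based warping term of the kind a motion regularizer can penalize; the warping and loss form are illustrative, not the paper's exact regularizer.

```python
import torch
import torch.nn.functional as F

def sci_measure(frames, masks):
    """SCI forward model: one 2D snapshot Y = sum_t (C_t * X_t), where the
    C_t are per-frame modulation masks. Only Y is recorded by the camera."""
    return (frames * masks).sum(dim=1)                 # (B, H, W)

def warp(frame, flow):
    """Backward-warp `frame` (B, 1, H, W) by optical `flow` (B, 2, H, W)
    using bilinear sampling."""
    B, _, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    grid = torch.stack((xs, ys), dim=0).float() + flow  # pixel coords + flow
    grid = grid.permute(0, 2, 3, 1)                     # (B, H, W, 2)
    grid[..., 0] = 2 * grid[..., 0] / (W - 1) - 1       # normalize to [-1, 1]
    grid[..., 1] = 2 * grid[..., 1] / (H - 1) - 1
    return F.grid_sample(frame, grid, align_corners=True)

frames = torch.rand(1, 8, 64, 64)                       # 8 latent frames
masks = (torch.rand(1, 8, 64, 64) > 0.5).float()
y = sci_measure(frames, masks)                          # the single measurement
flow01 = torch.zeros(1, 2, 64, 64)                      # stand-in flow, frame 0 -> 1
motion_reg = F.mse_loss(warp(frames[:, 0:1], flow01), frames[:, 1:2])
print(y.shape, motion_reg.item())
```

A term like `motion_reg` ties adjacent reconstructed frames together through the estimated flow, which is how temporal correlation constrains frames the measurement alone underdetermines.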
Few-Shot Object Detection via Back Propagation and Dynamic Learning
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date: 2023-07-01 DOI: 10.1109/ICME55011.2023.00493
Dianlong You, P. Wang, Y. Zhang, Ling Wang, Shunfu Jin
Abstract: Utilizing traditional object detectors to build a few-shot object detection (FSOD) model ignores the differences between the classification and regression tasks, causing task conflict and class confusion that degrade classification performance. This paper focuses on these shortcomings and utilizes back-propagation and dynamic-learning strategies to construct a model for FSOD, named BPDL. BPDL has a two-fold main idea: a) it uses the optimized localization boxes to alleviate the task conflict and refines the classification features with a correction loss, and b) it develops a dynamic learning strategy to filter out confusing features and mine more realistic prototype representations of the categories to calibrate classification. Extensive experiments on multiple benchmarks show that BPDL outperforms existing methods and advances the state of the art on the FSOD task.
Citations: 0
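Idea a), letting the optimized localization boxes refine classification through a correction loss, could for instance reweight each proposal's classification loss by the IoU of its refined box with the ground truth, so poorly localized (likely confusing) proposals contribute less. This weighting is purely our illustrative reading, since the abstract does not give the exact formulation.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import box_iou

def correction_loss(cls_logits, labels, refined_boxes, gt_boxes):
    """Localization-guided correction: each proposal's classification loss
    is reweighted by the IoU between its refined box and its ground-truth
    box. An illustrative sketch, not BPDL's exact loss."""
    iou = box_iou(refined_boxes, gt_boxes).diagonal()   # proposal i vs. its GT
    ce = F.cross_entropy(cls_logits, labels, reduction='none')
    return (iou.detach() * ce).mean()

logits = torch.randn(4, 21)                      # 20 classes + background
labels = torch.tensor([3, 7, 0, 12])
refined = torch.tensor([[0., 0., 10., 10.], [5., 5., 20., 20.],
                        [0., 0., 4., 4.], [8., 8., 30., 30.]])
gt = refined + torch.rand(4, 4)                  # stand-in ground-truth boxes
print(correction_loss(logits, labels, refined, gt))
```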
Self-Attention Prediction Correction with Channel Suppression for Weakly-Supervised Semantic Segmentation
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date: 2023-07-01 DOI: 10.1109/ICME55011.2023.00150
Guoying Sun, Meng Yang
Abstract: Single-stage weakly-supervised semantic segmentation (WSSS) with image-level labels has become a new research hotspot in the community thanks to its lower cost and higher training efficiency. However, the pseudo labels in WSSS generally suffer from some noise, which limits segmentation performance. In this paper, to explore integral foreground activation, we propose a Channel Suppression (CS) module that prevents the network from activating only the most discriminative regions, thereby improving the initial pseudo labels. To rectify incorrect predictions, we introduce a Self-Attention Prediction Correction (SAPC) module, which adaptively generates category-wise prediction-rectification weights. In extensive experiments, the proposed efficient single-stage framework achieves excellent performance, with 67.6% mIoU on PASCAL VOC 2012 and 39.9% mIoU on MS COCO 2014, significantly exceeding several recent single-stage methods.
Citations: 0
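The channel-suppression idea (preventing the network from activating only the most discriminative regions) can be sketched by zeroing the strongest-responding feature channels so the classifier must also rely on less discriminative evidence. The suppression ratio and zeroing scheme below are illustrative assumptions, not the paper's exact module.

```python
import torch

def channel_suppression(feat, ratio=0.3):
    """Zero out the highest-responding channels of a feature map, broadening
    the foreground activation used to build pseudo labels (ratio and
    zeroing scheme are illustrative choices)."""
    B, C, H, W = feat.shape
    strength = feat.abs().mean(dim=(2, 3))       # (B, C) per-channel response
    top = strength.topk(int(C * ratio), dim=1).indices
    mask = torch.ones(B, C, device=feat.device)
    mask.scatter_(1, top, 0.0)                   # drop the strongest channels
    return feat * mask[:, :, None, None]

feat = torch.randn(2, 256, 32, 32)               # backbone feature map
out = channel_suppression(feat)
# count fully-zeroed channels per image: int(256 * 0.3) = 76
print(out.flatten(2).eq(0).all(dim=2).sum(dim=1))
```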