Title: Adaptive and Robust Fourier-Mellin-Based Image Watermarking for Social Networking Platforms
Authors: Jinghong Xia, Hongxia Wang, S. Abdullahi, Heng Wang, Fei Zhang, Bingling Luo
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME). DOI: 10.1109/ICME55011.2023.00483
Abstract: According to the buckets effect, the capacity of a bucket is determined by its shortest board. This principle also applies to social-networking-platform-resilient (SNPR) image watermarking, which should be comprehensive and free from significant shortcomings. In the frequency domain, the watermarked region is formed using log-polar coordinate mapping (LPM) and has a ring-like structure. However, this structure cannot be stretched or compressed, and it causes a streaking effect at the edges of the watermarked image. The proposed method addresses these issues: an adaptive optimization framework adjusts the embedding strength and range of the watermark, and multiple synchronization strategies correct flipping and aspect-ratio changes. Compared with state-of-the-art works, the proposed method significantly improves the imperceptibility of the watermarked image and its robustness to various distortions and to lossy transmission on social networking platforms (SNPs).
{"title":"Multi-Scale Query-Adaptive Convolution for Generalizable Person Re-Identification","authors":"Kaixiang Chen, T. Gong, Liyan Zhang","doi":"10.1109/ICME55011.2023.00411","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00411","url":null,"abstract":"Domain Generalization in person re-identification (ReID) aims to learn a generalizable model from a single or multi-source domain that can be directly deployed to an unseen domain without fine-tuning. In this paper, we investigate the problem of single-source domain generalization in ReID. Recent research has gained remarkable progress by treating image matching as a search for local correspondences in feature maps. However, to ensure efficient matching, they usually adopt a pixel-wise matching approach, which is prone to be deviated by the identity-irrelevant patch features in the image, such as background patches. To address this problem, we propose the Multi-scale Query-Adaptive Convolution (QAConv-MS) framework. Specifically, we adopt a group of template kernels with different scales to extract local features of different receptive fields from the original feature maps and accordingly perform the local matching process. We also introduce a self-attention branch to extract global features from the feature map as complementary information for local features. Our approach achieves state-of-the-art performances on four large-scale datasets.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"243 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116159845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Hierarchical Attention Learning for Multimodal Classification
Authors: Xin Zou, Chang Tang, Wei Zhang, Kun Sun, Liangxiao Jiang
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME). DOI: 10.1109/ICME55011.2023.00165
Abstract: Multimodal learning aims to integrate complementary information from different modalities for more reliable decisions. However, existing multimodal classification methods simply integrate the learned local features, ignoring the underlying structure of each modality and the higher-order correlations across modalities. In this paper, we propose a novel Hierarchical Attention Learning Network (HALNet) for multimodal classification. HALNet has three merits: 1) a hierarchical feature fusion module learns multi-level features, aggregating them into a global feature representation with an attention mechanism and progressive fusion tactics; 2) a cross-modal higher-order fusion module captures cross-modal correlations in the label space; 3) a dual prediction pattern generates credible decisions. Extensive experiments on three real-world multimodal datasets demonstrate that HALNet achieves competitive performance compared to the state of the art.
{"title":"End-To-End Part-Level Action Parsing With Transformer","authors":"Xiaojia Chen, Xuanhan Wang, Beitao Chen, Lianli Gao","doi":"10.1109/ICME55011.2023.00135","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00135","url":null,"abstract":"The divide-and-conquer strategy, which interprets part-level action parsing as a detect-then-parsing pipeline, has been widely used and become a general tool for part-level action understanding. However, existing methods that derive from the strategy usually suffer from either strong dependence on prior detection or high computational complexity. In this paper, we present the first fully end-to-end part-level action parsing framework with transformers, termed PATR. Unlike existing methods, our method regards part-level action parsing as a hierarchical set prediction problem and unifies person detection, body part detection, and action state recognition into one model. In PATR, predefined learnable representations, including general instance representations and general part representations, are guided to adaptively attend to the image features that are relevant to target body parts. Then, conditioning on corresponding learnable representations, attended image features are hierarchically decoded into corresponding semantics (i.e., person location, body part location, and action states for each body part). In this way, PATR relies on characteristics of body parts, instead of prior predictions like bounding boxes, to parse action states, thus removing the strong dependence between sub-tasks and eliminating the computational burdens caused by the multi-stage paradigm. Extensive experiments conducted on challenging Kinetic-TPS indicate that our method achieves very competitive results. In particular, our model outperforms all state-of-the-art part-level action parsing approaches by a margin, reaching around 3.8±2.0% Accp higher than previous methods. These findings indicate the potential of PATR to serve as a new baseline for part-level action parsing methods in the future. Our code and models are publicly available. 1","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125321253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: EvenFace: Deep Face Recognition with Uniform Distribution of Identities
Authors: Pengfei Hu, Y. Tao, Qiqi Bao, Guijin Wang, Wenming Yang
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME). DOI: 10.1109/ICME55011.2023.00298
Abstract: The development of loss functions over the past few years has brought great success to face recognition. Most algorithms focus on improving the intra-class compactness of face features but ignore inter-class separability. In this paper, we propose a method named EvenFace, which introduces a variance regularization term and a mean term for inter-class separability to further promote an even distribution of class centers on the hypersphere, thereby increasing the inter-class distance. To evaluate inter-class separability, a new index is proposed that better reflects the distribution of class centers and guides the classification. By penalizing the angle between each identity and its surrounding neighbors, the resulting uniform distribution of identities enables full exploitation of the feature space, leading to discriminative face representations. The proposed loss function can effectively boost the performance of softmax-loss variants. Quantitative comparisons with other state-of-the-art methods on several benchmarks demonstrate the superiority of EvenFace.
Title: Adaptive-Masking Policy with Deep Reinforcement Learning for Self-Supervised Medical Image Segmentation
Authors: Gang Xu, Shengxin Wang, Thomas Lukasiewicz, Zhenghua Xu
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME). DOI: 10.1109/ICME55011.2023.00390
Abstract: Although self-supervised learning methods based on masked image modeling have achieved some success in improving the performance of deep learning models, they have difficulty ensuring that the masked region is the most appropriate one for each image, so the segmentation network does not obtain the best weights in pre-training. We therefore propose a new adaptive-masking-policy self-supervised learning method. Specifically, we model the masking of images as a reinforcement learning problem and use the output of the reconstruction model as a feedback signal that guides the agent to learn a masking policy, selecting a more appropriate mask position and size for each image. This helps the reconstruction network learn more fine-grained image representations and thus improves downstream segmentation performance. We conduct extensive experiments on two datasets, Cardiac and TCIA, and the results show that our approach outperforms current state-of-the-art self-supervised learning methods.
{"title":"Trajectory Alignment based Multi-Scaled Temporal Attention for Efficient Video Transformer","authors":"Zao Zhang, Dong Yuan, Yu Zhang, Wei Bao","doi":"10.1109/ICME55011.2023.00244","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00244","url":null,"abstract":"Although the video transformer gets remarkable accuracy on video recognition tasks, it is hard to be deployed in resource-constrained scenarios due to the high computational cost. A method that dynamically modifies and trains the transformer model, ensuring that the computational cost matches the deployment scenario requirement, would be an effective solution to this challenge. In this paper, we propose a method for modifying large-scale video transformers with trajectory alignment based multi-scaled temporal attention (TAMS) schemes to reduce the computational cost significantly while losing accuracy slightly. In the temporal dimension, we adopt multi-scaled sparsity patterns in hierarchical transformer blocks. In the spatial dimension, we use region selection to force the transformer to focus on high-importance regions while not corrupting the spatial context. Our method reduces up to 40% computational cost of state-of-the-art large-scale video transformers with a slight accuracy drop (~ 7%) on the video recognition task.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127765569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Snapshot Compressive Imaging via Optical Flow","authors":"Zan Chen, Ran Li, Yongqiang Li, Yuanjing Feng","doi":"10.1109/ICME55011.2023.00372","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00372","url":null,"abstract":"Video Snapshot compressive imaging (SCI) reconstruction recovers video frames from a compressed 2D measurement. However, frames at each time cannot be observed since the limitation of hardware. To make SCI suitable for more applications, we propose an optical flow-based deep unfolding network for video SCI reconstruction. To extract the optical flow, the feature maps during the iterative process are transformed by the convolution layer into the estimated optical flow. We designed a motion regularizer, which uses voxels of iterative frames and optical flow to update the reconstructed frames. The proposed motion regularizer efficiently captures the temporal correlation between the previous and next frames, which contributes to reconstructing the observed and unobserved frames from input measurement in a SCI reconstruction process. Experiments show that our method achieves state-of-the-art results on PSNR and SSIM.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132934285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Few-Shot Object Detection via Back Propagation and Dynamic Learning
Authors: Dianlong You, P. Wang, Y. Zhang, Ling Wang, Shunfu Jin
Venue: 2023 IEEE International Conference on Multimedia and Expo (ICME). DOI: 10.1109/ICME55011.2023.00493
Abstract: Building a few-shot object detection (FSOD) model on a traditional object detector ignores the differences between the classification and regression tasks, causing task conflict and class confusion and thus degrading classification performance. This paper focuses on these shortcomings and uses the strategies of Back Propagation and Dynamic Learning to construct an FSOD model named BPDL. BPDL has a two-fold main idea: a) it uses the optimized localization boxes to alleviate task conflict and refines the classification features with a correction loss, and b) it develops a dynamic learning strategy that filters confusing features and mines more realistic prototype representations of the categories to calibrate classification. Extensive experiments on multiple benchmarks show that BPDL outperforms existing methods and advances the state of the art on the FSOD task.
{"title":"Self-Attention Prediction Correction with Channel Suppression for Weakly-Supervised Semantic Segmentation","authors":"Guoying Sun, Meng Yang","doi":"10.1109/ICME55011.2023.00150","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00150","url":null,"abstract":"Single-stage weakly-supervised semantic segmentation (WSSS) with image-level labels has become a new research hotspot in the community for its lower cost and higher training efficiency. However, the pseudo label of WSSS generally suffers from somewhat noise, which limits the segmentation performance. In this paper, to explore the integral foreground activation, we propose the Channel Suppression (CS) module for preventing only activating the most discriminative regions, thereby improving the initial pseudo labels. To rectify the in-correct prediction, we explore the Self-Attention Prediction Correction (SAPC) module, which adaptively generates the category-wise prediction rectification weights. After extensive experiments, the proposed efficient single-stage framework achieves excellent performance with 67.6% mIoU and 39.9% mIoU on PASCAL VOC 2012 and MS COCO 2014 datasets, significantly exceeding several recent single-stage methods.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133272667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}