{"title":"Joint Statistical and Causal Feature Modulated Face Anti-Spoofing","authors":"Xin Dong, Tao Wang, Zhendong Li, Hao Liu","doi":"10.1109/ICME55011.2023.00210","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00210","url":null,"abstract":"In this paper, we propose a hierarchical feature modulation (HFM) approach for stable face anti-spoofing in unseen domains and unseen attacks. The conventional multi-domain based generalizable approaches likely lead to local optima due to the complicated or heuristic learning paradigm. Inspired by the fact that high-level semantic disturbances and low-level miscellaneous bias jointly cause the distribution shift, HFM aims to modulate the fine-grained feature in a hierarchical manner. Specifically, we complement the structural feature with patch-wise learnable statistical information, i.e. local difference histogram, to relieve the overfitting on high-level semantics. We further introduce the structural causal model (SCM) with imaging color model to reveal that presenting mediums and capturing devices destroy the liveness-relevant information from the low level. Thus we model this hidden entanglement as a distribution mixture problem and propose the expectation-maximization (EM) based causal intervention to remove these miscellanies. Experimental results on public datasets demonstrate the effectiveness of HFM, especially in out-of-distribution settings.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129809218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Face Poison: Obstructing DeepFakes by Disrupting Face Detection","authors":"Yuezun Li, Jiaran Zhou, Siwei Lyu","doi":"10.1109/ICME55011.2023.00213","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00213","url":null,"abstract":"Recent years have seen fast development in synthesizing realistic human faces using AI-based forgery technique called DeepFake, which can be weaponized to cause negative personal and social impacts. In this work, we develop a defense method, namely FacePosion, to prevent individuals from becoming victims of DeepFake videos by sabotaging would-be training data. This is achieved by disrupting face detection, a prerequisite step to prepare victim faces for training DeepFake model. Once the training faces are wrongly extracted, the DeepFake model can not be well trained. Specifically, we propose a multi-scale feature-level adversarial attack to disrupt the intermediate features of face detectors using different scales. Extensive experiments are conducted on seven various DeepFake models using six face detection methods, empirically showing that disrupting face detectors using our method can effectively obstruct DeepFakes.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127254986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LC-Beating: An Online System for Beat and Downbeat Tracking using Latency-Controlled Mechanism","authors":"Xinlu Liu, Jiale Qian, Qiqi He, Yi Yu, Wei Li","doi":"10.1109/ICME55011.2023.00192","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00192","url":null,"abstract":"Beat and downbeat tracking is to predict beat and downbeat time steps from a given music piece. Some deep learning models with a dilated structure such as Temporal Convolutional Network (TCN) and Dilated Self-Attention Network (DSAN) have achieved promising performance for this task. However, most of them have to see the whole music context during inference, which limits their deployment to online systems. In this paper, we propose LC-Beating, a novel latency-controlled (LC) mechanism for online beat and downbeat tracking, in which the model only looks ahead a few frames. By appending limited future information, the model can better capture the activity of relevant musical beats, which significantly boosts the performance of online algorithms with limited latency. Moreover, LC-Beating applies a novel real-time implementation of the LC mechanism to TCN and DSAN. The experimental results show that our proposed method outperforms the previous online models by a large margin and is close to the results of the offline models.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129947550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inter-Intra Camera Identity Learning for Person Re-Identification with Training in Single Camera","authors":"Guoqing Zhang, Zhiyuan Luo, Weisi Lin, Xuan Jing","doi":"10.1109/ICME55011.2023.00414","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00414","url":null,"abstract":"Traditional person re-identification (re-ID) methods generally rely on inter-camera person images to smooth the domain disparities between cameras. However, collecting and annotating a large number of inter-camera identities is extremely difficult and time-consuming, and this makes it hard to deploy person re-ID systems in new locations. To tackle this challenge, this paper studies the single-camera-training (SCT) setting where every person in the training set only appears in one camera. In this work, we design a novel inter-intra camera identity learning (I2CIL) framework to effectively address the SCT person re-ID. Specifically, (i) we design a Dual-Branch Identity Learning (DBIL) network consisting of inter-camera and intra-camera learning branches to learn person ID discriminative information. The former learns camera-irrelevant feature representations by constraining the distance of inter-camera negative sample pairs closer than the distance of intra-camera negative sample pairs. The latter focuses on pulling the distance of intra-camera positive sample pairs closer and pushing the distance of intra-camera negative sample pairs further, partially alleviating weak ID discrimination caused by the lack of inter-camera annotations. (ii) We design a Mixed-Sampling Joint Learning (MSJL) strategy, which is capable to capture inter- and intra-camera samples and independently accomplish the inter- and intra-camera learning tasks at the same time, avoiding the mutual interference between the two tasks. Extensive experiments on two public SCT datasets prove the superiority of the proposed approach.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129083436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual Episodic Sampling and Momentum Consistency Regularization for Unsupervised Few-shot Learning","authors":"Jiaxin Chen, Yanxu Hu, Meng Shen, A. J. Ma","doi":"10.1109/ICME55011.2023.00491","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00491","url":null,"abstract":"Unsupervised Few-shot Learning (UFSL) is a practical approach to adapting knowledge learned from unlabeled data of base classes to novel classes with limited labeled data. Nevertheless, most existing UFSL methods may not learn generalizable features in latter training epochs due to the simplicity of meta-learning tasks constructed by data augmentation. To address this issue, we propose two novel components, namely Dual Episodic Sampling (DES) and Momentum Consistency Regularization (MCR) for UFSL. In the DES, two types of sampling strategies are used to construct harder training tasks with multiple augmentations to generate each pseudo-class of increased diversity. The MCR constrains the consistency of the backbone encoder with its momentum counterpart to learn better generalized features for novel classes. Experimental results on four datasets verify the superiority of our method for unsupervised few-shot image classification.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122377084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent Feature Regularization based Adversarial Network for Brain Tumor Anomaly Detection","authors":"Nan Wang, Chengwei Chen, Lizhuang Ma, Shaohui Lin","doi":"10.1109/ICME55011.2023.00168","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00168","url":null,"abstract":"Brain tumor anomaly detection plays a critical role in the field of computer-aided diagnosis, which has attracted ever-increasing focus from the medical community However, brain tumor data are scarce and tough to classify. Unsupervised methods enable the reduction of huge labeling costs to be applied to brain tumor anomaly detection during the training only given normal brain images. However, the existing unsupervised methods distinguish whether the input image is abnormal in the image space, which cannot effectively learn the discriminative features. In this paper, we propose a novel brain tumor anomaly detection method via Latent Feature Regularization based Adversarial Network (LFRA-Net), which leverages a latent feature regularizer into adversarial learning to obtain the discriminative features. Comprehensive experiments on BraTS, HCP, MNIST, and CIFAR-10 datasets evaluate the effectiveness of our LFRANet, which outperforms state-of-the-art unsupervised learning methods.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131082369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-domain Prototype Contrastive loss for Few-shot 2D Image-Based 3D Model Retrieval","authors":"Yaqian Zhou, Yu Liu, Dan Song, Jiayu Li, Xuanya Li, Anjin Liu","doi":"10.1109/ICME55011.2023.00492","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00492","url":null,"abstract":"2D image-based 3D model retrieval (IBMR) usually relies on abundant explicit supervision on 2D images, together with unlabeled 3D models to learn domain-aligned yet class-discriminative features for the retrieval task. However, collecting large-scale 2D labels is cost-effective and time-consuming. Therefore, we explore a challenging IBMR task, where only few-shot labeled 2D images are available while the rest of the 2D and 3D samples remain unlabeled. Limited annotation of 2D images further increases the difficulty of domain-aligned yet discriminative feature learning. Therefore, we propose cross-domain prototype contrastive loss (CPCL) for the few-shot IBMR task. Specifically, we capture semantic information to learn class-discriminative features in each domain by minimizing intra-domain prototype contrastive loss. Besides, we perform inter-domain transferable contrastive learning to align the features between instances and prototypes of the same class across domains. Comprehensive experiments on popular benchmarks, MI3DOR and MI3DOR-2, validate the superiority of CPCL.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126958276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-Protected Facial Expression Recognition Augmented by High-Resolution Facial Images","authors":"Cong Liang, Shangfei Wang, Xiaoping Chen","doi":"10.1109/ICME55011.2023.00236","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00236","url":null,"abstract":"Cloud-based expression recognition from high-resolution facial images may put the subjects’ privacy at risk. We identify two kinds of privacy leakage, the appearance leakage in which the visual appearances of subjects are disclosed and the identity-pattern leakage in which the identity information of subjects is dug out. To address both leakages, we propose privacy-protected facial expression recognition from low-resolution facial images with the help of high-resolution facial images. Specifically, to prevent appearance leakage, we propose to extract identity-invariant representations from downsampled images, from which the visually distinguishable appearances cannot be recovered. To prevent identity-pattern leakage, we propose to eliminate the identity information from the extracted representations by leveraging the disentangled representations of high-resolution images as privileged information. After training, our method can fully capture identity-invariant representations from downsampled images for expression recognition without the requirement of high-resolution samples. These privacy-protected representations can be safely transmitted through the Internet. Experimental results in different scenarios demonstrate that the proposed method protects privacy without significantly inhibiting facial expression recognition.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130570953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding and Improving Perceptual Quality of Volumetric Video Streaming","authors":"Mengyu Yang, Di Wu, Zelong Wang, Miao Hu, Yipeng Zhou","doi":"10.1109/ICME55011.2023.00339","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00339","url":null,"abstract":"Volumetric video is fully three-dimensional and provides users with highly immersive and interactive experience. However, it is difficult to stream volumetric video over the Internet due to sheer video size and limited network bandwidth. Existing solutions suffered from poor perceptual quality and low coding efficiency. In this paper, we first conduct a comprehensive user study to understand the effectiveness of popular perceptual quality metrics for volumetric video. It is observed that those metrics cannot well capture the impact of user viewing behaviors. Considering the findings that users are more sensitive to the distortion of 2D image rendered from 3D point cloud, a new metric called Volu-FMAF is proposed to better represent perceptual quality of volumetric video. Next, we propose a novel neural-based volumetric video streaming framework RenderVolu and design a distortion-aware rendered image super-resolution network, called RenDA-Net, to further improve user perceptual quality. Last, we conduct extensive experiments with real datasets to validate our proposed method, and the results show that our method can boost the perceptual quality of volumetric video by 171% to 190%, and achieves a speedup of 108x in terms of decoding efficiency compared to the state-of-the-art approaches.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130578953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CHAN: Cross-Modal Hybrid Attention Network for Temporal Language Grounding in Videos","authors":"Wen Wang, Ling Zhong, Guang Gao, Minhong Wan, J. Gu","doi":"10.1109/ICME55011.2023.00259","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00259","url":null,"abstract":"The goal of temporal language grounding (TLG) task is to temporally localize the most semantically matched video segment with respect to a given sentence query in an untrimmed video. How to effectively incorporate the cross-modal interactions between video and language is the key to improve grounding performance. Previous approaches focus on learning correlations by computing the attention matrix between each frame-word pair, while ignoring the global semantics conditioned on one modality for better associating the complex video contents and sentence query of the target modality. In this paper, we propose a novel Cross-modal Hybrid Attention Network, which integrates two parallel attention fusion modules to exploit the semantics of each modality and interactions in cross modalities. One is Intra-Modal Attention Fusion, which utilizes gated self-attention to capture the frame-by-frame and word-by-word relations conditioned on the other modality. The other is Inter-Modal Attention Fusion, which utilizes query and key features derived from different modalities to calculate the co-attention weights and further promote inter-modal fusion. Experimental results show that our CHAN significantly outperforms several existing state-of-the-arts on three challenging datasets (ActivityNet Captions, Charades-STA and TACOS), demonstrating the effectiveness of our proposed method.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132053760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}