Latest publications from the 2023 IEEE International Conference on Multimedia and Expo (ICME)

Synthetic Feature Assessment for Zero-Shot Object Detection
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00083
Xinmiao Dai, Chong Wang, Haohe Li, Sunqi Lin, Lining Dong, Jiafei Wu, Jun Wang
{"title":"Synthetic Feature Assessment for Zero-Shot Object Detection","authors":"Xinmiao Dai, Chong Wang, Haohe Li, Sunqi Lin, Lining Dong, Jiafei Wu, Jun Wang","doi":"10.1109/ICME55011.2023.00083","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00083","url":null,"abstract":"Zero-shot object detection aims to simultaneously identify and localize classes that were not presented during training. Many generative model-based methods have shown promising performance by synthesizing the visual features of unseen classes from semantic embeddings. However, these synthetic features are inevitably of varied quality, which may be far from the ground truth. It degrades the performance of trained unseen classifier. Instead of tweaking the generative model, a new idea of feature quality assessment is proposed to utilize both the good and bad features to optimize the classifier in the right direction. Moreover, contrastive learning is also introduced to enhance the feature uniqueness between unseen and seen classes, which helps the feature assessment implicitly. To demonstrate the effectiveness of the proposed algorithm, comprehensive experiments are conducted on the MS COCO dataset and PASCAL VOC dataset, the state-of-the-art performance is achieved. Our code is available at: https://github.com/Dai1029/SFA-ZSD.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131723592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
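The contrastive component mentioned in this abstract is described only at a high level; the snippet below is a minimal sketch of a generic InfoNCE-style loss that pulls paired synthetic unseen-class features together while pushing them away from seen-class features. The function name, tensor shapes and pairing convention are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_separation_loss(synth_unseen, seen, temperature=0.1):
    """Generic InfoNCE-style loss (illustrative only).

    synth_unseen: (N, D) synthesized unseen-class features, arranged so that
                  rows (2k, 2k+1) are two views of the same unseen class.
    seen:         (M, D) seen-class features used as negatives.
    """
    z = F.normalize(synth_unseen, dim=1)
    neg = F.normalize(seen, dim=1)
    # positive similarity between the two views of each pair
    pos = (z[0::2] * z[1::2]).sum(dim=1) / temperature           # (N/2,)
    # similarity of each anchor to all seen-class negatives
    neg_sim = z[0::2] @ neg.t() / temperature                     # (N/2, M)
    logits = torch.cat([pos.unsqueeze(1), neg_sim], dim=1)        # (N/2, 1+M)
    labels = torch.zeros(logits.size(0), dtype=torch.long)        # positive sits at index 0
    return F.cross_entropy(logits, labels)

# toy usage with random features
loss = contrastive_separation_loss(torch.randn(8, 128), torch.randn(16, 128))
print(loss.item())
```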
A Multi-View Co-Learning Method for Multimodal Sentiment Analysis
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00238
Wenxiu Geng, Yulong Bian, Xiangxian Li
{"title":"A Multi-View Co-Learning Method for Multimodal Sentiment Analysis","authors":"Wenxiu Geng, Yulong Bian, Xiangxian Li","doi":"10.1109/ICME55011.2023.00238","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00238","url":null,"abstract":"Existing works on multimodal sentiment analysis have focused on learning more discriminative unimodal sentiment information or improving multimodal fusion methods to enhance modal complementarity. However, practical results of these methods have been limited owing to the problems of insufficient intra-modal representation and inter-modal noise. To alleviate this problem, we propose a multi-view co-learning method (MVATF) for video sentiment analysis. First, we propose a multi-view features extraction module to capture more perspectives from a single modality. Second, we propose a two-level fusion sentiment enhancement strategy that uses hierarchical attentive learning fusion and a multi-task learning fusion module to achieve co-learning to effectively filter inter-modal noise for better multimodal sentiment fusion features. Experimental results on the CH-SIMS, CMU-MOSI and MOSEI datasets show that the proposed method outperforms the state-of-the-art methods.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131758782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
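The hierarchical attentive fusion in MVATF is the authors' design; as a rough illustration of attentive multimodal fusion in general, the sketch below learns a scalar attention weight per modality view and returns the weighted sum. All module names and dimensions are assumed, not taken from the paper.

```python
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    """Weights per-modality feature vectors with learned attention scores
    and returns their weighted sum. Illustrative only."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                          # feats: (batch, n_views, dim)
        w = torch.softmax(self.score(feats), dim=1)    # (batch, n_views, 1)
        return (w * feats).sum(dim=1)                  # (batch, dim)

fusion = AttentiveFusion(dim=64)
fused = fusion(torch.randn(2, 3, 64))                  # e.g., text/audio/visual views
print(fused.shape)                                     # torch.Size([2, 64])
```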
Domain-Invariant Feature Learning for General Face Forgery Detection
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00396
Jian Zhang, J. Ni
{"title":"Domain-Invariant Feature Learning for General Face Forgery Detection","authors":"Jian Zhang, J. Ni","doi":"10.1109/ICME55011.2023.00396","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00396","url":null,"abstract":"Though existing methods for face forgery detection achieve fairly good performance under the intra-dataset scenario, few of them gain satisfying results in the case of cross-dataset testing with more practical value. To tackle this issue, in this paper, we propose a novel domain-invariant feature learning framework - DIFL for face forgery detection. In the framework, an adversarial domain generalization is introduced to learn the domain-invariant features from the forged samples synthesized by various algorithms. Then a center loss in fractional form (CL) is utilized to learn more discriminative features by aggregating the real faces while separating the fake faces from the real ones in the embedding space. In addition, a global and local random crop augmentation strategy is utilized to generate more data views of forged facial images at various scales. Extensive experimental results demonstrate the effectiveness and generalization of the proposed method compared with other state-of-the-art methods.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130882029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
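The center loss in fractional form (CL) is not specified in this abstract; the sketch below shows a plain center-style objective that pulls real-face embeddings toward a learnable center and pushes fake-face embeddings at least a margin away, as a simplified stand-in for the idea rather than the paper's fractional formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RealCenterLoss(nn.Module):
    """Pulls real-face embeddings toward a learnable center and keeps
    fake-face embeddings at least `margin` away from it. A simplified
    stand-in for the paper's fractional center loss."""
    def __init__(self, dim, margin=1.0):
        super().__init__()
        self.center = nn.Parameter(torch.zeros(dim))
        self.margin = margin

    def forward(self, emb, is_real):
        # emb: (B, D); is_real: (B,) bool; batch assumed to contain both classes
        d = torch.norm(emb - self.center, dim=1)
        pull = d[is_real].mean()                           # aggregate real faces
        push = F.relu(self.margin - d[~is_real]).mean()    # separate fakes from the center
        return pull + push

loss_fn = RealCenterLoss(dim=128)
labels = torch.tensor([True, True, False, False, True, False, True, False])
print(loss_fn(torch.randn(8, 128), labels).item())
```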
Fixing Domain Bias for Generalized Deepfake Detection
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00380
Yuzhe Mao, Weike You, Linna Zhou, Zhigao Lu
{"title":"Fixing Domain Bias for Generalized Deepfake Detection","authors":"Yuzhe Mao, Weike You, Linna Zhou, Zhigao Lu","doi":"10.1109/ICME55011.2023.00380","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00380","url":null,"abstract":"Generalizing deepfake detection has posed a great challenge to digital media forensics, as inferior performance is obtained when training sets and testing sets are domain-mismatched. In this paper, we show that a CNN-based detection model can significantly improve performance by fixing domain bias. Specifically, we propose a novel Fixing Domain Bias network (FDBN). FDBN does not rely on manual features, but is based on three core designs. Firstly, a domain-invariant network based on randomly stylized normalization is devised to constrain the domain discrepancy in the feature space. Then, through adversarial learning, a generalizing representation in the stylized distribution is learned to enhance the shared feature bias among manipulation methods in the domain-specific network. Finally, to encourage equality of biases among different domains, we utilize the bias extrapolation penalty strategy by suppressing the expected bias on the extremely-performing domains. Extensive experiments demonstrate that our framework achieves effectiveness and generalization towards unseen face forgeries.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131010763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
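"Randomly stylized normalization" is the paper's own component; one plausible reading, shown below, is to re-normalize CNN feature maps with randomly perturbed channel statistics in the spirit of AdaIN/MixStyle-style augmentation. The function and the perturbation scale are assumptions, not the FDBN code.

```python
import torch

def random_stylize(x, alpha=0.1, eps=1e-6):
    """Re-normalizes feature maps (B, C, H, W) with randomly perturbed
    channel statistics, so a detector cannot rely on dataset-specific
    'style'. Illustrative sketch only."""
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True) + eps
    x_norm = (x - mu) / sigma
    # randomly perturb the style statistics before de-normalizing
    new_mu = mu * (1 + alpha * torch.randn_like(mu))
    new_sigma = sigma * (1 + alpha * torch.randn_like(sigma))
    return x_norm * new_sigma + new_mu

feat = torch.randn(2, 16, 8, 8)
print(random_stylize(feat).shape)   # torch.Size([2, 16, 8, 8])
```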
Is Really Correlation Information Represented Well in Self-Attention for Skeleton-based Action Recognition?
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00139
Wentian Xin, Hongkai Lin, Ruyi Liu, Yi Liu, Q. Miao
{"title":"Is Really Correlation Information Represented Well in Self-Attention for Skeleton-based Action Recognition?","authors":"Wentian Xin, Hongkai Lin, Ruyi Liu, Yi Liu, Q. Miao","doi":"10.1109/ICME55011.2023.00139","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00139","url":null,"abstract":"Transformer has shown significant advantages by various vision tasks. However, the lack of representation of correlation information about data properties makes it difficult to match the excellent results consistent with GCNs in skeleton-based action recognition. In this paper, we propose a Topology and Frames-guided Spatial-Temporal ConvFormer Network (TF-STCFormer), which is well suited for dynamically extracting topological and inter-frame uniqueness & co-occurrence information. Three essential components make up the proposed framework: (1) Grouped Physical-guided Spatial Transformer for focusing on learning essential spatial features and physical topology. (2) Global and Focal Temporal Transformer for promoting the relationship of different joints in consecutive frames and improving the representation of discriminative key-frames. (3) Grouped Dilation Temporal Convolution for connecting the intermediate output obtained by the previous transformers in the feature channels of different dilation. Experiments on four standard datasets (NTU RGB+D, NTU RGB+D 120, NW-UCLA, and UAV-Human) demonstrate that our approach prominently outperforms state-of-the-art methods on all benchmarks.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"120 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133686514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
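As background for the spatial transformer component, the sketch below applies vanilla multi-head self-attention across the joints of each skeleton frame; the grouped, physically guided variant in TF-STCFormer is not reproduced, and all shapes are assumptions.

```python
import torch
import torch.nn as nn

class JointSelfAttention(nn.Module):
    """Plain multi-head self-attention over the joint dimension of a
    skeleton sequence shaped (batch, frames, joints, channels)."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                       # x: (B, T, V, C)
        b, t, v, c = x.shape
        x = x.reshape(b * t, v, c)              # attend across joints within each frame
        out, _ = self.attn(x, x, x)
        return out.reshape(b, t, v, c)

x = torch.randn(2, 16, 25, 64)                  # 25 joints, as in NTU RGB+D skeletons
print(JointSelfAttention(64)(x).shape)          # torch.Size([2, 16, 25, 64])
```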
Peer Upsampled Transform Domain Prediction for G-PCC
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00127
Wenyi Wang, Yingzhan Xu, Kai Zhang, Li Zhang
{"title":"Peer Upsampled Transform Domain Prediction for G-PCC","authors":"Wenyi Wang, Yingzhan Xu, Kai Zhang, Li Zhang","doi":"10.1109/ICME55011.2023.00127","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00127","url":null,"abstract":"To meet the growing demand for point cloud compression, MPEG is developing a point cloud compression standard called as G-PCC. In G-PCC, upsampled transform domain prediction (UTDP) is used to improve attribute coding performance. However, only the attributes in the previous level can be used to predict the attributes of transform sub-blocks in UTDP, which limits the efficiency of UTDP. To address this limitation, we propose a method called peer-UTDP to improve UTDP by using peer neighbors in this paper. With peer-UTDP, attributes of co-plane or co-line peer neighbors in the level same as that of the transform sub-block can be used as prediction in the upsampling process. Experimental results show that our method outperforms G-PCC with an average coding gain of -5.1%, -5.4%, -5.1% and -1.4% under C1 condition, and -5.1%, -5.6%, -5.6% and -1.7% under C2 condition for Y, Cb, Cr and reflectance, respectively. The proposed peer-UTDP has been adopted by G-PCC.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132726039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
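The abstract's co-plane and co-line peer neighbors can be pictured on an integer block grid; the sketch below predicts a block attribute as a weighted average of face-adjacent ("co-plane") and edge-adjacent ("co-line") peers that are already reconstructed. This is only a geometric illustration of the idea, not the G-PCC peer-UTDP specification, and all weights are assumed.

```python
def peer_neighbor_prediction(attrs, idx):
    """Predicts the attribute at grid position `idx` from reconstructed
    peer blocks at the same level: face-adjacent ("co-plane") neighbors
    are weighted more than edge-adjacent ("co-line") ones.

    attrs: dict mapping (x, y, z) -> reconstructed attribute value.
    """
    x, y, z = idx
    co_plane = [(x + dx, y + dy, z + dz)
                for dx, dy, dz in [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                   (0, -1, 0), (0, 0, 1), (0, 0, -1)]]
    co_line = [(x + dx, y + dy, z) for dx in (-1, 1) for dy in (-1, 1)] + \
              [(x + dx, y, z + dz) for dx in (-1, 1) for dz in (-1, 1)] + \
              [(x, y + dy, z + dz) for dy in (-1, 1) for dz in (-1, 1)]
    num, den = 0.0, 0.0
    for weight, group in ((2.0, co_plane), (1.0, co_line)):   # heavier weight on co-plane peers
        for key in group:
            if key in attrs:
                num += weight * attrs[key]
                den += weight
    return num / den if den else 0.0

recon = {(1, 0, 0): 10.0, (0, 1, 0): 14.0, (1, 1, 0): 20.0}
print(peer_neighbor_prediction(recon, (0, 0, 0)))   # -> 13.6
```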
Microimage-based Two-step Search For Plenoptic 2.0 Video Coding
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00437
Yuqing Yang, Xin Jin, Kedeng Tong, Chen Wang, Haitian Huang
{"title":"Microimage-based Two-step Search For Plenoptic 2.0 Video Coding","authors":"Yuqing Yang, Xin Jin, Kedeng Tong, Chen Wang, Haitian Huang","doi":"10.1109/ICME55011.2023.00437","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00437","url":null,"abstract":"The plenoptic 2.0 video can record a time-varying dense light field, which benefits many immersive visual applications such as AR/VR. However, traditional inter motion estimation methods perform inefficiently in such kinds of video sequences due to the distinctive temporal characteristics caused by the imaging principle. In this paper, a microimage-based two- step search (MTSS) is proposed to achieve a better trade-off between coding performance and coding complexity. Based on microimage focus variation analysis in imaging dynamic scenes, a microlens-diameter and matching-distance spatial search with local refinement is proposed to exploit the image correlations among the microimage and to compensate the defocused inaccuracy. Implementing the proposed motion estimation in H.266 platform VTM-11.0 and comparing with the state-of-the-art methods, obvious compression efficiency improvements are achieved with limited complexity increment, which benefits the standardization of plenoptic video coding.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131444200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
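MTSS itself is tied to the plenoptic imaging model; the sketch below only illustrates the generic two-step idea: first evaluate block displacements at multiples of an (assumed) microlens diameter, then refine locally around the best coarse candidate. SAD is used as the matching cost purely for illustration.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences as a simple matching cost."""
    return float(np.abs(a - b).sum())

def two_step_search(cur, ref, top_left, block=16, diameter=16, refine=3):
    """Step 1: test displacements at multiples of a microlens diameter.
    Step 2: refine locally around the best coarse candidate.
    A generic block-matching sketch, not the MTSS algorithm itself."""
    y0, x0 = top_left
    blk = cur[y0:y0 + block, x0:x0 + block]
    h, w = ref.shape

    def cost(dy, dx):
        y, x = y0 + dy, x0 + dx
        if 0 <= y <= h - block and 0 <= x <= w - block:
            return sad(blk, ref[y:y + block, x:x + block])
        return float("inf")

    coarse = [(dy * diameter, dx * diameter) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    best = min(coarse, key=lambda d: cost(*d))
    fine = [(best[0] + dy, best[1] + dx)
            for dy in range(-refine, refine + 1) for dx in range(-refine, refine + 1)]
    return min(fine, key=lambda d: cost(*d))

# toy scene: a bright blob that moves by (3, -2) pixels between frames
yy, xx = np.mgrid[0:64, 0:64]
cur = np.exp(-((yy - 24.0) ** 2 + (xx - 24.0) ** 2) / 50.0) * 255
ref = np.roll(cur, shift=(3, -2), axis=(0, 1))
print(two_step_search(cur, ref, top_left=(16, 16)))   # -> (3, -2)
```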
A Content-based Viewport Prediction Framework for 360° Video Using Personalized Federated Learning and Fusion Techniques
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00118
Mehdi Setayesh, V. Wong
{"title":"A Content-based Viewport Prediction Framework for 360° Video Using Personalized Federated Learning and Fusion Techniques","authors":"Mehdi Setayesh, V. Wong","doi":"10.1109/ICME55011.2023.00118","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00118","url":null,"abstract":"Viewport prediction is a key enabler for 360° video streaming over wireless networks. To improve the prediction accuracy, a common approach is to use a content-based viewport prediction model. Saliency detection based on traditional convolutional neural networks (CNNs) suffers from distortion due to equirectangular projection. Also, the viewers may have their own viewing behavior and are not willing to share their historical head movement with others. To address the aforementioned issues, in this paper, we first develop a saliency detection model using a spherical CNN (SPCNN). Then, we train the viewers’ head movement prediction model using personalized federated learning (PFL). Finally, we propose a content-based viewport prediction framework by integrating the video saliency map and the head orientation map of each viewer using fusion techniques. The experimental results show that our proposed framework provides higher average accuracy and precision when compared with three state-of-the-art algorithms from the literature.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127858809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
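The paper's fusion techniques are not detailed in this abstract; the sketch below shows one simple possibility, a normalized weighted combination of a content saliency map and a per-viewer head-orientation map over a tile grid. The grid size and the weighting are assumptions, not the authors' fusion module.

```python
import numpy as np

def fuse_viewport_scores(saliency, head_orientation, w_content=0.6):
    """Combines a content saliency map and a head-orientation map
    (both defined on the same equirectangular tile grid) into a
    distribution over candidate viewport tiles. Illustrative only."""
    s = saliency / (saliency.sum() + 1e-8)
    h = head_orientation / (head_orientation.sum() + 1e-8)
    fused = w_content * s + (1 - w_content) * h
    return fused / fused.sum()

saliency = np.random.rand(6, 12)            # e.g., a 6x12 tile grid of a 360-degree frame
head_map = np.zeros((6, 12))
head_map[2, 5] = 1.0                        # viewer currently looks near this tile
scores = fuse_viewport_scores(saliency, head_map)
print(np.unravel_index(scores.argmax(), scores.shape))   # most likely viewport tile
```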
Multi-Level Feature-Guided Stereoscopic Video Quality Assessment Based on Transformer and Convolutional Neural Network
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00428
Yuan Chen, Sumei Li
{"title":"Multi-Level Feature-Guided Stereoscopic Video Quality Assessment Based on Transformer and Convolutional Neural Network","authors":"Yuan Chen, Sumei Li","doi":"10.1109/ICME55011.2023.00428","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00428","url":null,"abstract":"Stereoscopic video (3D video) has been increasingly applied in industry and entertainment. And the research of stereoscopic video quality assessment (SVQA) has become very important for promoting the development of stereoscopic video system. Many CNN-based models have emerged for SVQA task. However, these methods ignore the significance of the global information of the video frames for quality perception. In this paper, we propose a multi-level feature-fusion model based on Transformer and convolutional neural network (MFFTCNet) to assess the perceptual quality of the stereoscopic video. Firstly, we use global information from Transformer to guide local information from convolutional neural network (CNN). Moreover, we utilize low-level features in the CNN branch to guide high-level features. Besides, considering the binocular rivalry effect in the human vision system (HVS), we use 3D convolution to achieve rivalry fusion of binocular features. The proposed method is tested on two public stereoscopic video quality datasets. The result shows that this method correlates highly with human visual perception and outperforms state-of-the-art (SOTA) methods by a significant margin.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127521844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
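The way MFFTCNet lets global Transformer information guide local CNN features is the authors' design; as a rough sketch of the general idea, the module below uses a global token to channel-wise re-weight local feature maps. Every shape and layer choice here is an assumption.

```python
import torch
import torch.nn as nn

class GlobalGuidedLocal(nn.Module):
    """Uses a global descriptor (e.g., a Transformer [CLS]-style token) to
    channel-wise re-weight local CNN feature maps. Illustrative only."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, local_feat, global_token):
        # local_feat: (B, C, H, W); global_token: (B, C)
        g = self.gate(global_token).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        return local_feat * g                                      # gated local features

m = GlobalGuidedLocal(32)
print(m(torch.randn(2, 32, 14, 14), torch.randn(2, 32)).shape)     # torch.Size([2, 32, 14, 14])
```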
Hidden Follower Detection via Refined Gaze and Walking State Estimation
2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI: 10.1109/ICME55011.2023.00356
Yaxi Chen, Ruimin Hu, Danni Xu, Zheng Wang, Linbo Luo, Dengshi Li
{"title":"Hidden Follower Detection via Refined Gaze and Walking State Estimation","authors":"Yaxi Chen, Ruimin Hu, Danni Xu, Zheng Wang, Linbo Luo, Dengshi Li","doi":"10.1109/ICME55011.2023.00356","DOIUrl":"https://doi.org/10.1109/ICME55011.2023.00356","url":null,"abstract":"Hidden following is following behavior with special intentions, and detecting hidden following behavior can prevent many criminal activities in advance. The previous method uses gaze and spacing behaviors to distinguish hidden followers from normal pedestrians. However, they express gaze behaviors in a coarse-grained way with binary values, making it difficult to accurately depict the gaze state of pedestrians. To this end, we propose the Refined Hidden Follower Detection (RHFD) model by choosing a suitable mapping function based on the principle that the closer the gaze direction is to someone, the more likely it is to gaze at someone, which converts the gaze direction into a continuous estimated gaze state representing the complex and variable gaze behavior of pedestrians. Simultaneously, we introduce variations in the magnitude and direction of pedestrian velocity to refine the representation of pedestrian walking states. Experimental results on the surveillance dataset show that RHFD outperforms state-of-the-art methods.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124115706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
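The abstract states that gaze direction is mapped to a continuous gaze state, higher when the gaze points closer to the target; the exact mapping function is the paper's choice. The sketch below uses a Gaussian of the angular deviation as one plausible mapping (an assumption, not RHFD's function).

```python
import numpy as np

def gaze_state(gaze_dir, follower_pos, target_pos, sigma_deg=20.0):
    """Continuous gaze state in [0, 1]: close to 1 when the follower's gaze
    direction points straight at the target, decaying as the angular
    deviation grows. The Gaussian mapping is an illustrative choice."""
    to_target = np.asarray(target_pos, float) - np.asarray(follower_pos, float)
    g = np.asarray(gaze_dir, float)
    cos = np.dot(g, to_target) / (np.linalg.norm(g) * np.linalg.norm(to_target) + 1e-8)
    angle_deg = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return float(np.exp(-0.5 * (angle_deg / sigma_deg) ** 2))

print(gaze_state(gaze_dir=(1.0, 0.1), follower_pos=(0, 0), target_pos=(5, 0)))  # close to 1
print(gaze_state(gaze_dir=(0.0, 1.0), follower_pos=(0, 0), target_pos=(5, 0)))  # close to 0
```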