2023 IEEE International Conference on Multimedia and Expo (ICME) — Latest Publications

Pose-Motion Video Anomaly Detection via Memory-Augmented Reconstruction and Conditional Variational Prediction
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00464
Weilin Wan, Weizhong Zhang, Cheng Jin
Abstract: Video anomaly detection (VAD) is a challenging computer vision problem. Due to the scarcity of anomalous events in training, the models learned by existing methods would mistakenly fit the ubiquitous non-causal or even spurious correlations, leading to failure in inference. In this paper, we propose a new two-phase Pose-Motion Video Anomaly Detection (PoMo) approach by jointly exploiting informative features, including poses and optical flows, that have rich causal correlations with abnormality. PoMo can effectively prevent non-causal features from leaking in by encoding only the essential information, i.e., the poses and optical flows, with our normalized autoencoder (phase one), and by separately modeling the knowledge learned in phase one using our causal-conditioned autoencoder (phase two). The difference between normal and abnormal events can be amplified through these two phases, so the generalization ability is reinforced. Extensive experimental results demonstrate the superiority of our approach over existing methods, with improvements in AUC-ROC of up to 1.5%.
Citations: 0
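Reconstruction-based VAD approaches like the two-phase method above generally flag frames whose reconstruction error is abnormally high. A minimal, generic sketch of such a score (an illustration of the family of methods, not PoMo's exact formulation):

```python
import numpy as np

def anomaly_score(frame: np.ndarray, reconstruction: np.ndarray) -> float:
    """Generic reconstruction-error anomaly score: mean squared error
    between an input frame (or pose/flow feature) and its reconstruction.
    Higher values suggest the autoencoder has not seen similar (normal)
    data during training."""
    return float(np.mean((frame - reconstruction) ** 2))
```

In practice a threshold calibrated on normal validation data decides whether a score counts as anomalous.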
Sea Surface Object Detection Based on Background Dynamic Perception and Cross-Layer Semantic Interaction
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00021
Songbin Li, Xiangzhi Yang, Jingang Wang
Abstract: Sea surface object detection plays an important role in coastal defense monitoring systems. Existing target detection methods mostly lack adaptive perception of background changes. In addition, these methods fail to further integrate and interact with the multi-layer features extracted from the deep backbone network. To address these two issues, we first propose a Background Dynamic Perception module, which uses environmental information as an auxiliary signal. We train the detector to dynamically capture background changes through a multi-task learning framework. Moreover, we propose a Cross-Layer Semantic Interaction module, which achieves cross-layer interaction and reduces information loss. Based on these modules, we propose a sea surface object detection network. To verify its performance, we collected real sea surface data and built a sea surface object dataset. Experimental results demonstrate that our method achieves 74.4% AP on the dataset, outperforming the latest methods.
Citations: 0
Transferable Waveform-level Adversarial Attack against Speech Anti-spoofing Models
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00395
Bingyuan Huang, Sanshuai Cui, Xiangui Kang, Enping Li
Abstract: Speech anti-spoofing models protect media from malicious fake speech but are vulnerable to adversarial attacks. Studies of adversarial attacks are conducive to developing robust speech anti-spoofing systems. Existing transfer-based attack methods mainly craft adversarial speech examples at the handcrafted-feature level, which limits their attack ability against real-world anti-spoofing systems, as these systems only expose raw-waveform input interfaces. In this work, we propose a waveform-level input data transformation, called the temporal smoothing method, to generate more transferable adversarial speech examples. In the optimization iterations of the adversarial perturbation, we randomly smooth input waveforms to prevent the adversarial examples from overfitting white-box surrogate models. The proposed transformation can be combined with any iterative gradient-based attack method. Extensive experiments demonstrate that our method significantly enhances the transferability of waveform-level adversarial speech examples.
Citations: 0
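The random temporal smoothing described in the abstract can be pictured as a moving-average filter whose kernel length is resampled at each attack iteration. A hedged NumPy sketch; the kernel-length range (`max_kernel`) and the moving-average form are assumptions, since the paper's exact smoothing operator is not given here:

```python
import numpy as np

def random_temporal_smoothing(waveform: np.ndarray, max_kernel: int = 9) -> np.ndarray:
    """Smooth a 1-D waveform with a moving-average kernel of random odd
    length. Resampling the kernel per call means each iteration of an
    iterative gradient attack sees a slightly different smoothed input,
    discouraging overfitting to one surrogate model."""
    k = int(np.random.choice(np.arange(1, max_kernel + 1, 2)))  # odd: 1, 3, 5, ...
    if k == 1:
        return waveform  # identity case: no smoothing this iteration
    kernel = np.ones(k) / k
    return np.convolve(waveform, kernel, mode="same")
```

Such a transformation would be applied to the perturbed waveform before each gradient computation in a PGD-style loop.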
Federated Learning for Personalized Image Aesthetics Assessment
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00065
Zhiwei Xiong, Han Yu, Zhiqi Shen
Abstract: Image aesthetics assessment (IAA) evaluates the generic aesthetic quality of images. Due to the subjectivity of IAA, personalized IAA (PIAA) is essential to offering dedicated image retrieval, editing, and recommendation services to individual users. However, existing PIAA approaches are trained under the centralized machine learning paradigm, which exposes sensitive image and rating data. To enhance PIAA in a privacy-preserving manner, we propose the first-of-its-kind Federated Learning-empowered Personalized Image Aesthetics Assessment (FedPIAA) approach, with a simple yet effective model structure to capture image aesthetic patterns and personalized user aesthetic preferences. Extensive experimental comparison against eight baselines using the real-world dataset FLICKER-AES demonstrates that FedPIAA outperforms FedAvg by 1.56% under the small support set and by 4.86% under the large support set in terms of the Spearman rank-order correlation coefficient between predicted and ground-truth personalized aesthetics scores, while achieving comparable performance with the best non-FL centralized PIAA approaches.
Citations: 1
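FedAvg, the baseline that FedPIAA is compared against, aggregates client models by dataset-size-weighted parameter averaging. A minimal sketch of that standard baseline (not of FedPIAA itself):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: average each named parameter across clients,
    weighted by each client's local dataset size.

    client_weights: list of dicts mapping parameter name -> np.ndarray
                    (names must match across clients).
    client_sizes:   list of local dataset sizes, same order as weights.
    """
    total = sum(client_sizes)
    aggregated = {}
    for name in client_weights[0]:
        aggregated[name] = sum(
            (n / total) * w[name] for w, n in zip(client_weights, client_sizes)
        )
    return aggregated
```

In each federated round the server broadcasts the aggregated parameters back to clients, which then resume local training.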
MBDFNet: Multi-scale Bidirectional Dynamic Feature Fusion Network for Efficient Image Deblurring
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/icme55011.2023.00096
Zhongbao Yang, Jin-shan Pan
Abstract: Existing deep image deblurring models achieve favorable results with growing model complexity. However, these models cannot be applied to low-power devices with resource constraints (e.g., smartphones), as they usually have a large number of network parameters and incur high computational costs. To overcome this problem, we develop a multi-scale bidirectional dynamic feature fusion network (MBDFNet), a lightweight deep deblurring model, for efficient image deblurring. The proposed MBDFNet progressively restores multi-scale latent clear images from blurry input based on a multi-scale framework. To better utilize the features from coarse scales, we propose a bidirectional gated dynamic fusion module so that the most useful information from coarse-scale features is kept to facilitate estimation at finer scales. We solve the proposed MBDFNet in an end-to-end manner and show that it has fewer network parameters and lower FLOPs, where the FLOPs value of the proposed MBDFNet is at least 6× smaller than that of state-of-the-art methods. Both quantitative and qualitative evaluations show that the proposed MBDFNet achieves favorable performance in terms of model complexity while remaining competitive in accuracy against state-of-the-art methods.
Citations: 0
ELAN: Enhancing Temporal Action Detection with Location Awareness
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00179
Guo Chen, Yin-Dong Zheng, Zhe Chen, Jiahao Wang, Tong Lu
Abstract: Current query-based temporal action detection methods lack multiple levels of location awareness, leading to performance degradation. In this paper, we present a novel query-based method called Enhanced Location-Aware Network (ELAN) for temporal action detection. ELAN adopts a lightweight convolution-based encoder, termed the Temporal Location-Aware (TLA) encoder, to model temporally continuous location-aware context. Moreover, ELAN can recover location-related context within and between queries through our proposed Instance Location-Aware (ILA) decoder. As a result, ELAN learns strong positional discrimination of actions and effectively eliminates the ambiguity caused by sparse action decoding, yielding significant improvement in detection performance. ELAN achieves state-of-the-art performance on two temporal action detection benchmarks, THUMOS-14 and ActivityNet-1.3.
Citations: 0
A Simple Stochastic Neural Network for Improving Adversarial Robustness
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00392
Hao Yang, Min Wang, Zhengfei Yu, Yun Zhou
Abstract: The vulnerability of deep learning algorithms to malicious attacks has garnered significant attention from researchers in recent years. To provide more reliable services for safety-sensitive applications, prior studies have introduced Stochastic Neural Networks (SNNs) as a means of improving adversarial robustness. However, existing SNNs are not designed from the perspective of optimizing the adversarial decision boundary and rely on complex and expensive adversarial training. To find an appropriate decision boundary, we propose a simple and effective stochastic neural network that incorporates a regularization term into the objective function. Our approach maximizes the variance of the feature distribution in low-dimensional space and forces the feature direction to align with the eigenvectors of the covariance matrix. Because it requires no adversarial training, our method has lower computational cost and does not sacrifice accuracy on normal examples, making it suitable for use with a variety of models. Extensive experiments against various well-known white- and black-box attacks show that our proposed method outperforms state-of-the-art methods.
Citations: 0
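The regularization idea — maximizing feature variance along the eigen-directions of the covariance matrix — can be illustrated with a toy term. This is a sketch under the assumption that `features` are penultimate-layer activations; it is not the authors' exact objective:

```python
import numpy as np

def variance_regularizer(features: np.ndarray) -> float:
    """Toy regularization term in the spirit of the abstract: reward
    large feature variance along the principal directions of the batch
    covariance matrix (added to the task loss, so we return a negative
    quantity to be minimized).

    features: (batch, dim) array of penultimate-layer activations.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (features.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)   # variances along each eigenvector
    return -float(eigvals.sum())        # minimizing this maximizes total variance
```

Since the eigenvalue sum equals the trace of the covariance matrix, this particular toy term reduces to total feature variance; the paper's full objective additionally constrains feature directions.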
MSAANet: Multi-scale Axial Attention Network for medical image segmentation
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00391
Hao Zeng, Xinxin Shan, Yu Feng, Ying Wen
Abstract: U-Net and its variants have achieved impressive results in medical image segmentation. However, the downsampling operations of such U-shaped networks cause the feature maps to lose a certain degree of spatial information, and because most existing methods apply convolution and transformer sequentially, it is hard to extract a comprehensive feature representation of the image. In this paper, we propose a novel U-shaped segmentation network named Multi-scale Axial Attention Network (MSAANet) to solve these problems. Specifically, we propose a cross-scale interactive attention mechanism, multi-scale axial attention (MSAA), which achieves direction-aware attention across interacting scales, so that the downsampled deep features and the shallow features maintain contextual spatial consistency. In addition, we propose a Convolution-Transformer (CT) block, in which transformer and convolution complement each other to enhance the comprehensive feature representation. We evaluate the proposed method on the public Synapse and ACDC datasets. Experimental results demonstrate that MSAANet effectively improves segmentation accuracy.
Citations: 0
Cross-slice Context Consistency for Semi-supervised 3D Left Atrium Segmentation
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00400
Yongchao Wang, Bin Xiao, Xiuli Bi, Weisheng Li, Xinbo Gao
Abstract: Semi-supervised learning is a promising approach to reducing the need to collect large amounts of dense annotations, especially in medical image segmentation. However, most existing semi-supervised 3D medical image segmentation methods tend to ignore the cross-slice context, which contains extensive structural information. We believe cross-slice context can help the model capture semantic information complementary to the slice context and achieve robust, more accurate segmentation. Therefore, in this paper, we propose a novel cross-slice context consistency framework for 3D left atrium segmentation named CSC2-Net. Our method can effectively utilize unlabeled data by encouraging consistent results between slice segmentation and cross-slice inference segmentation. To achieve this, we design a bidirectional gated context inference module (Bi-GCM) to model cross-slice context and predict slice segmentation without direct slice features. Experiments on the public left atrium (LA) database show that our method achieves higher performance and outperforms state-of-the-art methods by imposing a cross-slice consistency constraint.
Citations: 0
MFAE: Masked frame-level autoencoder with hybrid-supervision for low-resource music transcription
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00194
Yulun Wu, Jiahao Zhao, Yi Yu, Wei Li
Abstract: Automatic Music Transcription (AMT) is an essential topic in music information retrieval (MIR); it aims to transcribe audio recordings into symbolic representations. Recently, large-scale piano datasets with high-quality annotations have been proposed for high-resolution piano transcription, enabling domain-specific AMT models to achieve state-of-the-art results. However, those methods hardly generalize to transcription of other "low-resource" instruments (such as guitar, cello, clarinet, etc.). In this paper, we propose a hybrid-supervised framework, the masked frame-level autoencoder (MFAE), to solve this issue. The proposed MFAE reconstructs frame-level features of low-resource data to learn generic representations of low-resource instruments and improves low-resource transcription performance. Experimental results on several low-resource datasets (MAPS, MusicNet, and Guitarset) show that our framework achieves state-of-the-art performance in note-wise scores (Note F1: 83.4% / 64.1% / 86.7%; Note-with-offset F1: 59.8% / 41.4% / 71.6%). Moreover, our framework generalizes well to various genres of instrument transcription, in both data-plentiful and data-limited scenarios.
Citations: 0