2023 IEEE International Conference on Multimedia and Expo (ICME) — Latest Publications

Pose-Motion Video Anomaly Detection via Memory-Augmented Reconstruction and Conditional Variational Prediction
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00464
Weilin Wan, Weizhong Zhang, Cheng Jin
Abstract: Video anomaly detection (VAD) is a challenging computer vision problem. Due to the scarcity of anomalous events in training, the models learned by existing methods would mistakenly fit the ubiquitous non-causal or even spurious correlations, leading to failure in inference. In this paper, we propose a new two-phase Pose-Motion Video Anomaly Detection (PoMo) approach by jointly exploiting informative features, including poses and optical flows, that have rich causal correlations with abnormality. PoMo can effectively prevent non-causal features from leaking in by encoding only the essential information, i.e., the poses and optical flows, with our normalized autoencoder (phase one), and by separately modeling the knowledge learned in phase one using our causal-conditioned autoencoder (phase two). The difference between normal and abnormal events can be amplified through these two phases, so the generalization ability is reinforced. Extensive experimental results demonstrate the superiority of our approach over existing methods, with improvements in AUC-ROC of up to 1.5%.
Citations: 0
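Reconstruction-based VAD approaches like the two-phase method above generally flag frames whose reconstruction error is abnormally high. A minimal, generic sketch of such a score (an illustration of the family of methods, not PoMo's exact formulation):

```python
import numpy as np

def anomaly_score(frame: np.ndarray, reconstruction: np.ndarray) -> float:
    """Generic reconstruction-error anomaly score: mean squared error
    between an input frame (or pose/flow feature) and its reconstruction.
    Higher values suggest the autoencoder has not seen similar (normal)
    data during training."""
    return float(np.mean((frame - reconstruction) ** 2))
```

In practice a threshold calibrated on normal validation data decides whether a score counts as anomalous.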
Sea Surface Object Detection Based on Background Dynamic Perception and Cross-Layer Semantic Interaction
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00021
Songbin Li, Xiangzhi Yang, Jingang Wang
Abstract: Sea surface object detection plays an important role in coastal defense monitoring systems. Existing target detection methods mostly lack adaptive perception of background changes. In addition, these methods fail to further integrate and interact with the multi-layer features extracted from the deep backbone network. To address these two issues, we first propose a Background Dynamic Perception module, which uses environmental information as an auxiliary signal. We train the detector to dynamically capture background changes through a multi-task learning framework. Moreover, we propose a Cross-Layer Semantic Interaction module, which achieves cross-layer interaction and reduces information loss. Based on these modules, we propose a sea surface object detection network. To verify its performance, we collected real sea surface data and built a sea surface object dataset. Experimental results demonstrate that our method achieves 74.4% AP on the dataset, outperforming the latest methods.
Citations: 0
Transferable Waveform-level Adversarial Attack against Speech Anti-spoofing Models
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00395
Bingyuan Huang, Sanshuai Cui, Xiangui Kang, Enping Li
Abstract: Speech anti-spoofing models protect media from malicious fake speech but are vulnerable to adversarial attacks. Studies of adversarial attacks are conducive to developing robust speech anti-spoofing systems. Existing transfer-based attack methods mainly craft adversarial speech examples at the handcrafted-feature level, which limits their attack ability against real-world anti-spoofing systems, as these systems only expose raw-waveform input interfaces. In this work, we propose a waveform-level input data transformation, called the temporal smoothing method, to generate more transferable adversarial speech examples. In the optimization iterations of the adversarial perturbation, we randomly smooth input waveforms to prevent the adversarial examples from overfitting white-box surrogate models. The proposed transformation can be combined with any iterative gradient-based attack method. Extensive experiments demonstrate that our method significantly enhances the transferability of waveform-level adversarial speech examples.
Citations: 0
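The random temporal smoothing described in the abstract can be pictured as a moving-average filter whose kernel length is resampled at each attack iteration. A hedged NumPy sketch; the kernel-length range (`max_kernel`) and the moving-average form are assumptions, since the paper's exact smoothing operator is not given here:

```python
import numpy as np

def random_temporal_smoothing(waveform: np.ndarray, max_kernel: int = 9) -> np.ndarray:
    """Smooth a 1-D waveform with a moving-average kernel of random odd
    length. Resampling the kernel per call means each iteration of an
    iterative gradient attack sees a slightly different smoothed input,
    discouraging overfitting to one surrogate model."""
    k = int(np.random.choice(np.arange(1, max_kernel + 1, 2)))  # odd: 1, 3, 5, ...
    if k == 1:
        return waveform  # identity case: no smoothing this iteration
    kernel = np.ones(k) / k
    return np.convolve(waveform, kernel, mode="same")
```

Such a transformation would be applied to the perturbed waveform before each gradient computation in a PGD-style loop.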
Federated Learning for Personalized Image Aesthetics Assessment
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00065
Zhiwei Xiong, Han Yu, Zhiqi Shen
Abstract: Image aesthetics assessment (IAA) evaluates the generic aesthetic quality of images. Due to the subjectivity of IAA, personalized IAA (PIAA) is essential to offering dedicated image retrieval, editing, and recommendation services to individual users. However, existing PIAA approaches are trained under the centralized machine learning paradigm, which exposes sensitive image and rating data. To enhance PIAA in a privacy-preserving manner, we propose the first-of-its-kind Federated Learning-empowered Personalized Image Aesthetics Assessment (FedPIAA) approach, with a simple yet effective model structure to capture image aesthetic patterns and personalized user aesthetic preferences. Extensive experimental comparison against eight baselines using the real-world dataset FLICKER-AES demonstrates that FedPIAA outperforms FedAvg by 1.56% under the small support set and by 4.86% under the large support set in terms of the Spearman rank-order correlation coefficient between predicted and ground-truth personalized aesthetics scores, while achieving comparable performance with the best non-FL centralized PIAA approaches.
Citations: 1
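FedAvg, the baseline that FedPIAA is compared against, aggregates client models by dataset-size-weighted parameter averaging. A minimal sketch of that standard baseline (not of FedPIAA itself):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: average each named parameter across clients,
    weighted by each client's local dataset size.

    client_weights: list of dicts mapping parameter name -> np.ndarray
                    (names must match across clients).
    client_sizes:   list of local dataset sizes, same order as weights.
    """
    total = sum(client_sizes)
    aggregated = {}
    for name in client_weights[0]:
        aggregated[name] = sum(
            (n / total) * w[name] for w, n in zip(client_weights, client_sizes)
        )
    return aggregated
```

In each federated round the server broadcasts the aggregated parameters back to clients, which then resume local training.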
MBDFNet: Multi-scale Bidirectional Dynamic Feature Fusion Network for Efficient Image Deblurring
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/icme55011.2023.00096
Zhongbao Yang, Jin-shan Pan
Abstract: Existing deep image deblurring models achieve favorable results with growing model complexity. However, these models cannot be applied to low-power devices with resource constraints (e.g., smartphones), as they usually have a large number of network parameters and incur high computational costs. To overcome this problem, we develop a multi-scale bidirectional dynamic feature fusion network (MBDFNet), a lightweight deep deblurring model, for efficient image deblurring. The proposed MBDFNet progressively restores multi-scale latent clear images from blurry input based on a multi-scale framework. To better utilize the features from coarse scales, we propose a bidirectional gated dynamic fusion module so that the most useful information from coarse-scale features is kept to facilitate estimation at finer scales. We solve the proposed MBDFNet in an end-to-end manner and show that it has fewer network parameters and lower FLOPs, where the FLOPs value of the proposed MBDFNet is at least 6× smaller than that of state-of-the-art methods. Both quantitative and qualitative evaluations show that the proposed MBDFNet achieves favorable performance in terms of model complexity while remaining competitive in accuracy against state-of-the-art methods.
Citations: 0
ELAN: Enhancing Temporal Action Detection with Location Awareness
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00179
Guo Chen, Yin-Dong Zheng, Zhe Chen, Jiahao Wang, Tong Lu
Abstract: Current query-based temporal action detection methods lack multiple levels of location awareness, leading to performance degradation. In this paper, we present a novel query-based method called Enhanced Location-Aware Network (ELAN) for temporal action detection. ELAN adopts a lightweight convolution-based encoder, termed the Temporal Location-Aware (TLA) encoder, to model temporally continuous location-aware context. Moreover, ELAN can recover location-related context within and between queries through our proposed Instance Location-Aware (ILA) decoder. As a result, ELAN learns strong positional discrimination of actions and effectively eliminates the ambiguity caused by sparse action decoding, yielding significant improvement in detection performance. ELAN achieves state-of-the-art performance on two temporal action detection benchmarks, THUMOS-14 and ActivityNet-1.3.
Citations: 0
A Simple Stochastic Neural Network for Improving Adversarial Robustness
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00392
Hao Yang, Min Wang, Zhengfei Yu, Yun Zhou
Abstract: The vulnerability of deep learning algorithms to malicious attacks has garnered significant attention from researchers in recent years. To provide more reliable services for safety-sensitive applications, prior studies have introduced Stochastic Neural Networks (SNNs) as a means of improving adversarial robustness. However, existing SNNs are not designed from the perspective of optimizing the adversarial decision boundary and rely on complex and expensive adversarial training. To find an appropriate decision boundary, we propose a simple and effective stochastic neural network that incorporates a regularization term into the objective function. Our approach maximizes the variance of the feature distribution in low-dimensional space and forces the feature direction to align with the eigenvectors of the covariance matrix. Because it requires no adversarial training, our method has lower computational cost and does not sacrifice accuracy on normal examples, making it suitable for use with a variety of models. Extensive experiments against various well-known white- and black-box attacks show that our proposed method outperforms state-of-the-art methods.
Citations: 0
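The regularization idea — maximizing feature variance along the eigen-directions of the covariance matrix — can be illustrated with a toy term. This is a sketch under the assumption that `features` are penultimate-layer activations; it is not the authors' exact objective:

```python
import numpy as np

def variance_regularizer(features: np.ndarray) -> float:
    """Toy regularization term in the spirit of the abstract: reward
    large feature variance along the principal directions of the batch
    covariance matrix (added to the task loss, so we return a negative
    quantity to be minimized).

    features: (batch, dim) array of penultimate-layer activations.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (features.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)   # variances along each eigenvector
    return -float(eigvals.sum())        # minimizing this maximizes total variance
```

Since the eigenvalue sum equals the trace of the covariance matrix, this particular toy term reduces to total feature variance; the paper's full objective additionally constrains feature directions.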
MSAANet: Multi-scale Axial Attention Network for medical image segmentation
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00391
Hao Zeng, Xinxin Shan, Yu Feng, Ying Wen
Abstract: U-Net and its variants have achieved impressive results in medical image segmentation. However, the downsampling operations of such U-shaped networks cause the feature maps to lose a certain degree of spatial information, and because most existing methods apply convolution and transformer sequentially, it is hard to extract a comprehensive feature representation of the image. In this paper, we propose a novel U-shaped segmentation network named Multi-scale Axial Attention Network (MSAANet) to solve these problems. Specifically, we propose a cross-scale interactive attention mechanism, multi-scale axial attention (MSAA), which achieves direction-aware attention across interacting scales, so that the downsampled deep features and the shallow features maintain contextual spatial consistency. In addition, we propose a Convolution-Transformer (CT) block, in which transformer and convolution complement each other to enhance the comprehensive feature representation. We evaluate the proposed method on the public Synapse and ACDC datasets. Experimental results demonstrate that MSAANet effectively improves segmentation accuracy.
Citations: 0
Cross-slice Context Consistency for Semi-supervised 3D Left Atrium Segmentation
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00400
Yongchao Wang, Bin Xiao, Xiuli Bi, Weisheng Li, Xinbo Gao
Abstract: Semi-supervised learning is a promising approach to reducing the need to collect large amounts of dense annotations, especially in medical image segmentation. However, most existing semi-supervised 3D medical image segmentation methods tend to ignore the cross-slice context, which contains extensive structural information. We believe cross-slice context can help the model capture semantic information complementary to the slice context and achieve robust, more accurate segmentation. Therefore, in this paper, we propose a novel cross-slice context consistency framework for 3D left atrium segmentation named CSC2-Net. Our method can effectively utilize unlabeled data by encouraging consistent results between slice segmentation and cross-slice inference segmentation. To achieve this, we design a bidirectional gated context inference module (Bi-GCM) to model cross-slice context and predict slice segmentation without direct slice features. Experiments on the public left atrium (LA) database show that our method achieves higher performance and outperforms state-of-the-art methods by imposing a cross-slice consistency constraint.
Citations: 0
MFAE: Masked frame-level autoencoder with hybrid-supervision for low-resource music transcription
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00194
Yulun Wu, Jiahao Zhao, Yi Yu, Wei Li
Abstract: Automatic Music Transcription (AMT) is an essential topic in music information retrieval (MIR); it aims to transcribe audio recordings into symbolic representations. Recently, large-scale piano datasets with high-quality annotations have been proposed for high-resolution piano transcription, enabling domain-specific AMT models to achieve state-of-the-art results. However, those methods hardly generalize to transcription of other "low-resource" instruments (such as guitar, cello, clarinet, etc.). In this paper, we propose a hybrid-supervised framework, the masked frame-level autoencoder (MFAE), to solve this issue. The proposed MFAE reconstructs frame-level features of low-resource data to learn generic representations of low-resource instruments and improves low-resource transcription performance. Experimental results on several low-resource datasets (MAPS, MusicNet, and Guitarset) show that our framework achieves state-of-the-art performance in note-wise scores (Note F1: 83.4% / 64.1% / 86.7%; Note-with-offset F1: 59.8% / 41.4% / 71.6%). Moreover, our framework generalizes well to various genres of instrument transcription, in both data-plentiful and data-limited scenarios.
Citations: 0