2023 IEEE International Conference on Multimedia and Expo (ICME): Latest Publications

Weight-based Regularization for Improving Robustness in Image Classification
Hao Yang, Min Wang, Zhengfei Yu, Yun Zhou
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00305

Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks. Recently, Stochastic Neural Networks (SNNs) have been proposed to enhance adversarial robustness by injecting uncertainty into the models. However, existing SNNs are often designed by intuition and rely on adversarial training, which is computationally costly. To address this issue, we propose a novel SNN called the Weight-based Stochastic Neural Network (WB-SNN), which optimizes an upper bound on the adversarial error from the perspective of the weight distribution. To the best of our knowledge, we are the first to propose a theoretically guaranteed weight-based stochastic neural network that does not rely on adversarial training. Compared with standard adversarial training, our method reduces the computation cost by about a factor of three. Extensive experiments on various datasets, networks, and adversarial attacks demonstrate the effectiveness of the proposed method.
Citations: 0
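To make the weight-distribution viewpoint concrete, here is a minimal sketch of a weight-stochastic linear layer, assuming a Gaussian reparameterization over the weights. The `log_sigma` parameterization and the `weight_penalty` surrogate are illustrative assumptions, not the paper's exact error bound.

```python
import torch
import torch.nn as nn

class StochasticLinear(nn.Module):
    """Linear layer whose weights are sampled from a learned Gaussian
    at every forward pass, so the uncertainty lives in the weight
    distribution rather than in the activations."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.log_sigma = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, I)
        eps = torch.randn_like(self.mu)
        w = self.mu + self.log_sigma.exp() * eps
        return torch.nn.functional.linear(x, w, self.bias)

    def weight_penalty(self) -> torch.Tensor:
        # Surrogate weight-distribution regularizer: penalize large means
        # and collapse-to-deterministic variances (an assumption, not the
        # paper's bound).
        return self.mu.pow(2).mean() - self.log_sigma.mean()
```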
Image Layer Modeling for Complex Document Layout Generation
Tianlong Ma, Xingjiao Wu, Xiangcheng Du, Yanlong Wang, Cheng Jin
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00386

Abstract: Document layout analysis (DLA) plays an essential role in information extraction and document understanding. DLA has achieved milestone results, but analyzing non-Manhattan layouts remains challenging because annotated data are limited. In this paper, we propose an image layer modeling method to mitigate this issue. The method generates document images with non-Manhattan layouts by superimposing image layers under pre-defined aesthetic rules. Because no evaluation benchmark exists for non-Manhattan layouts, we also construct a manually labeled fine-grained segmentation dataset; to the best of our knowledge, it is the first of its kind. Extensive experimental results verify that the proposed method better handles fine-grained segmentation of documents with non-Manhattan layouts.
Citations: 0
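The generation step can be pictured as alpha-compositing a non-rectangular image layer onto a page while keeping the layer's mask as a fine-grained segmentation label. A toy sketch with PIL/NumPy follows; the random placement below stands in for the paper's pre-defined aesthetic rules, which are not described in this listing.

```python
import numpy as np
from PIL import Image, ImageDraw

page = Image.new("RGB", (640, 480), "white")         # blank document page
label = np.zeros((480, 640), dtype=np.uint8)          # segmentation ground truth

# A non-Manhattan (elliptical) figure layer with an alpha mask.
layer = Image.new("RGB", (200, 150), "lightgray")
mask = Image.new("L", (200, 150), 0)
ImageDraw.Draw(mask).ellipse([0, 0, 199, 149], fill=255)

# Placement rule stub: a random position (the paper uses aesthetic rules).
x, y = np.random.randint(0, 440), np.random.randint(0, 330)
page.paste(layer, (x, y), mask)                       # superimpose the layer
label[y:y + 150, x:x + 200] = np.array(mask) > 0      # mask becomes the label
```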
MSG-CAM: Multi-scale inputs make a better visual interpretation of CNN networks
Xiaohong Xiang, Fuyuan Zhang, Xin Deng, Ke Hu
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00061

Abstract: The visualization of deep learning models has been widely studied as an effective means of exploring the decision-making processes within these models. However, current visualization methods suffer from several limitations, such as low resolution and poor handling of multiple occurrences of the same class. In this paper, we propose a novel visualization technique called MSG-CAM, which improves on the existing Group-CAM method. Our method enlarges the original input image to multiple scales, fuses the resulting feature maps and gradients from the last convolutional layer, and uses them to create saliency masks. Through both qualitative and quantitative analysis, we demonstrate that the saliency maps generated by our method are more reasonable and more accurately reflect the internal decision-making processes of the neural network.
Citations: 0
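The multi-scale ingredient can be sketched as: run Grad-CAM-style weighting at several input scales, resize each map to the input resolution, and fuse. A hedged sketch follows; Group-CAM's mask grouping and scoring steps are omitted, and fusion by averaging is an assumption.

```python
import torch
import torch.nn.functional as F

def multi_scale_cam(model, feat_layer, image, target_class, scales=(1.0, 1.5, 2.0)):
    """image: (1, 3, H, W); feat_layer: the last conv layer of `model`."""
    maps = []
    for s in scales:
        x = F.interpolate(image, scale_factor=s, mode="bilinear", align_corners=False)
        feats = {}
        hook = feat_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
        logits = model(x)
        hook.remove()
        # Gradient of the target score w.r.t. the last-layer activations.
        grads = torch.autograd.grad(logits[0, target_class], feats["a"])[0]
        weights = grads.mean(dim=(2, 3), keepdim=True)      # GAP over gradients
        cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
        maps.append(F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                                  align_corners=False))
    fused = torch.stack(maps).mean(dim=0)                   # multi-scale fusion
    return fused / (fused.max() + 1e-8)                     # normalize to [0, 1]
```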
Audio-Visual Generalized Zero-Shot Learning Based on Variational Information Bottleneck
Yapeng Li, Yong Luo, Bo Du
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00084

Abstract: Audio-visual generalized zero-shot learning (GZSL) aims to train a model on seen classes that can classify samples from both seen and unseen classes. Because no unseen training samples are available, such a model tends to misclassify unseen-class samples into seen classes. To mitigate this problem, we propose a method based on the variational information bottleneck for audio-visual GZSL. Specifically, we model the joint representation as a product of experts over marginal representations to integrate audio and visual information. In addition, we apply the variational information bottleneck to the learning of the audio-visual joint representation and of the marginal representations of the audio, visual, and text-label modalities. This helps the model reduce the negative impact of information that does not generalize to unseen classes. Experimental results on the UCF-GZSL, VGGSound-GZSL, and ActivityNet-GZSL benchmarks demonstrate the effectiveness and superiority of the proposed model.
Citations: 1
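Both ingredients named in the abstract have standard closed forms: a product of Gaussian experts fuses marginals by precision weighting, and the variational information bottleneck adds a KL penalty toward a prior. A minimal sketch, assuming Gaussian posteriors and a standard-normal prior expert (the paper's exact objective is not reproduced):

```python
import torch

def product_of_experts(mus, logvars):
    # PoE of Gaussians is Gaussian: precision = sum of expert precisions
    # (including a standard-normal prior expert with precision 1, mean 0).
    precisions = [torch.ones_like(mus[0])] + [lv.exp().reciprocal() for lv in logvars]
    weighted = [torch.zeros_like(mus[0])] + [m * p for m, p in zip(mus, precisions[1:])]
    joint_precision = sum(precisions)
    joint_mu = sum(weighted) / joint_precision
    joint_logvar = (1.0 / joint_precision).log()
    return joint_mu, joint_logvar

def vib_kl(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, I) ): the bottleneck penalty per sample.
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1).mean()

# Toy usage with assumed shapes (batch 8, latent dim 64).
audio_mu, audio_lv = torch.randn(8, 64), torch.zeros(8, 64)
vis_mu, vis_lv = torch.randn(8, 64), torch.zeros(8, 64)
mu, lv = product_of_experts([audio_mu, vis_mu], [audio_lv, vis_lv])
loss_bottleneck = vib_kl(mu, lv)
```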
Region-Aware Semantic Consistency for Unsupervised Domain-Adaptive Semantic Segmentation
Jun Xie, Yixuan Zhou, Xing Xu, Guoqing Wang, Fumin Shen, Yang Yang
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00024

Abstract: Because acquiring pixel-wise labels for semantic segmentation is labor-intensive, unsupervised domain adaptation (UDA) techniques aim to transfer knowledge from synthetic data to real-scene data. To overcome the distribution misalignment between the source and target domains, Teacher-Student (TS) methods are widely used and promising. In TS methods, the student relies on one-hot pseudo-labels generated by the teacher. However, these pseudo-labels are unreliable and ignore the semantic correlation among classes. Moreover, at the same position of the same image, the output distributions of the student and the teacher should be consistent. We define this prediction consistency as Region-Aware Semantic Consistency (RASC) and propose an RASC module that aligns the output distributions of the teacher and the student. The RASC module is flexible and easily plugged into state-of-the-art TS methods based on either CNNs or Transformers.
Citations: 0
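One way to read RASC is as a pixel-wise distribution-matching term between teacher and student, in place of training only on one-hot pseudo-labels. A minimal sketch, assuming a KL consistency loss with temperature; the paper's region-aware weighting may differ:

```python
import torch
import torch.nn.functional as F

def semantic_consistency(student_logits, teacher_logits, tau: float = 1.0):
    """logits: (B, C, H, W); softmax over the class dim at every pixel."""
    log_p_student = F.log_softmax(student_logits / tau, dim=1)
    p_teacher = F.softmax(teacher_logits / tau, dim=1).detach()
    # KL(teacher || student), summed over classes and pixels,
    # averaged over the batch.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")
```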
Multi-stream Adaptive Offloading of Joint Compressed Video Streams, Feature Streams, and Semantic Streams in Edge Computing Systems
Dieli Hu, Wen Ji, Zhi Wang
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00175

Abstract: Edge computing (EC) is a promising paradigm for serving latency-sensitive video applications. However, transmitting and analyzing massive compressed video requires considerable bandwidth and computing resources, posing enormous challenges for current multimedia frameworks. Multi-stream frameworks that additionally carry feature streams are more practical: feature streams contain compact video-frame features, have a lower bitrate, and better serve machine vision tasks. Nevertheless, extracting features on devices increases the latency and energy consumption of local computing, so deciding which streams to offload given the video task requirements and system resources is a challenging problem. This paper studies EC-based multi-stream adaptive offloading. We model the multi-stream offloading and computation problem as maximizing system utility by jointly optimizing offloading decisions, computation resource allocation, and video frame sampling rates; frame sampling rates, processing latency, and energy consumption all enter the utility model. The resulting optimization problem is a mixed-integer program (MIP), and we propose an efficient algorithm for it that combines the Hungarian algorithm with an improved greedy Markov approximation. Simulation results validate the superior performance of the proposed algorithm.
Citations: 0
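The assignment sub-step can be illustrated with SciPy's Hungarian solver: given a utility matrix over (stream, offloading choice) pairs, pick the utility-maximizing one-to-one assignment. The matrix below is made up for illustration, and the greedy Markov refinement is not modeled:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# utility[i, j] = assumed system utility of offloading stream i as choice j
# (e.g., compressed video / feature stream / semantic stream on some server).
utility = np.array([[0.8, 0.5, 0.3],
                    [0.4, 0.9, 0.6],
                    [0.7, 0.2, 0.5]])

rows, cols = linear_sum_assignment(utility, maximize=True)
print(list(zip(rows, cols)), utility[rows, cols].sum())  # best assignment, total utility
```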
Collaborative Spatial-Temporal Distillation for Efficient Video Deraining
Yuzhang Hu, Minghao Liu, Wenhan Yang, Jiaying Liu, Zongming Guo
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00332

Abstract: In this paper, we propose a novel knowledge distillation framework to improve the efficiency of deep networks for video deraining. Knowledge is transferred from a large, powerful teacher network to a compact, efficient student network via the proposed collaborative spatial-temporal distillation framework. The framework is equipped with three collaboration schemes of different granularities that exploit spatial-temporal redundancy in complementary ways for better distillation performance. First, a spatial alignment module applies distillation constraints at different spatial scales to achieve better scale invariance in the transferred knowledge. Second, a temporal alignment module tracks the temporal states of the teacher and the student, both separately and collaboratively, to make comprehensive use of inter-frame information. Third, the two alignment modules interact through a spatial-temporal adaptor, so that spatial-temporal knowledge is transferred in a unified framework. Extensive experiments demonstrate the superiority of our distillation framework and the effectiveness of each module. Our code is available at https://github.com/HuYuzhang/Knowledge-Distillation.
Citations: 0
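The spatial alignment idea, sketched under assumptions: compare teacher and student feature maps at several pooled scales so the distilled knowledge is less scale-sensitive. The 1x1 channel projection and average pooling below are stand-ins, not the paper's modules (which are available at the repository above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAlignDistill(nn.Module):
    """Multi-scale feature distillation; assumes teacher and student
    feature maps share the same spatial size."""
    def __init__(self, student_ch: int, teacher_ch: int, scales=(1, 2, 4)):
        super().__init__()
        self.proj = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)
        self.scales = scales

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor):
        f_student = self.proj(f_student)           # match channel widths
        loss = 0.0
        for s in self.scales:
            fs = F.avg_pool2d(f_student, s) if s > 1 else f_student
            ft = F.avg_pool2d(f_teacher, s) if s > 1 else f_teacher
            loss = loss + F.mse_loss(fs, ft.detach())
        return loss / len(self.scales)
```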
DFCP: Few-Shot DeepFake Detection via Contrastive Pretraining
Bojing Zou, Chao Yang, Jiazhi Guan, Chengbin Quan, Youjian Zhao
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00393

Abstract: Abuse of forgery techniques has created a considerable misinformation problem on social media. Although much effort has been devoted to face forgery detection (a.k.a. DeepFake detection) with some success, two issues still hinder practical application: 1) most detectors do not generalize well to unseen datasets, and 2) as supervised methods, most previous works require a considerable amount of manually labeled data. To address these problems, we propose a simple contrastive pretraining framework for DeepFake detection (DFCP), which works in a finetuning-after-pretraining manner and requires only a few labels (5%). Specifically, we design a two-stream framework that simultaneously learns high-frequency texture features and high-level semantic information during pretraining. In addition, we propose a video-based frame sampling strategy that mitigates noisy samples in instance-discriminative contrastive learning for better performance. Experimental results on several downstream datasets show the state-of-the-art performance of DFCP, which operates at frame level (without temporal reasoning) for high efficiency yet outperforms video-level methods.
Citations: 0
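Instance-discriminative contrastive pretraining typically reduces to an InfoNCE objective over two views of each frame. A minimal sketch under that assumption; the two-stream texture/semantic encoders and the video-based sampling strategy of DFCP are not modeled here:

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.07):
    """z1, z2: (B, D) embeddings of two augmented views of the same B frames."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                 # (B, B) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)     # diagonal pairs are positives
```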
E2: Entropy Discrimination and Energy Optimization for Source-free Universal Domain Adaptation
Meng Shen, A. J. Ma, PongChi Yuen
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00460

Abstract: Universal domain adaptation (UniDA) transfers knowledge under both distribution and category shifts. Most UniDA methods require access to source-domain data during model adaptation, which may violate privacy policies and makes source-data transfer inefficient. To address this issue, we propose a novel source-free UniDA method that couples confidence-guided entropy discrimination with likelihood-induced energy optimization. Entropy-based separation of target-domain known and unknown classes is too conservative for known-class prediction, so we derive a confidence-guided entropy by scaling the normalized prediction score with the known-class confidence, so that more known-class samples are correctly predicted. Because the marginal distribution is difficult to estimate without source-domain data, we constrain the target-domain marginal distribution by maximizing the known-class likelihood and minimizing the unknown-class likelihood, which is equivalent to free-energy optimization. Theoretically, the overall optimization amounts to decreasing the internal energy of known classes and increasing that of unknown classes, in the physical sense. Extensive experiments demonstrate the superiority of the proposed method.
Citations: 1
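The two quantities named in the abstract can be sketched directly: an entropy score damped by known-class confidence, and the free energy of the logits. The exact scaling rule and any thresholds are assumptions here; the free-energy form E(x) = -logsumexp(logits) is the standard one:

```python
import math
import torch
import torch.nn.functional as F

def confidence_guided_entropy(logits: torch.Tensor):
    p = F.softmax(logits, dim=1)
    entropy = -(p * p.clamp_min(1e-8).log()).sum(dim=1)
    entropy = entropy / math.log(logits.size(1))   # normalize to [0, 1]
    confidence = p.max(dim=1).values               # known-class confidence
    # Assumed scaling: high confidence pulls the score down, so confident
    # known-class samples are not rejected as "unknown".
    return entropy * (1.0 - confidence)

def free_energy(logits: torch.Tensor):
    # Lower energy <=> more "known-class-like" under the likelihood view.
    return -torch.logsumexp(logits, dim=1)
```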
ABTD-Net: Autonomous Baggage Threat Detection Networks for X-ray Images
Wen Liu, Degang Sun, Yan Wang, Zhongyuan Chen, Xinbo Han, Haitian Yang
2023 IEEE International Conference on Multimedia and Expo (ICME) | Pub Date: 2023-07-01 | DOI: 10.1109/ICME55011.2023.00214

Abstract: Automated security screening plays a significant role in protecting public spaces from security threats by detecting prohibited items in X-ray images. However, X-ray baggage images are noisy because luggage objects are squeezed, occluded, and penetrated, and object hues are monotonous and low-contrast. To solve these problems, we propose an Autonomous Baggage Threat Detection Network (ABTD-Net) for accurate prohibited-item detection. To tackle the difficulty of capturing distinctive visual features, we construct a Feature Adjustment Head (FAH) that refines pyramid features: it first filters noise with Dense Unidirectional Propagation (DUP) and then applies an Attention Module (AM) at several stages. We further design a Feature Fusion Head (FFH) that dynamically fuses hierarchical visual information under object occlusion, combining early fusion and late fusion. Extensive experiments on the security-inspection X-ray datasets OPIXray and HiXray demonstrate the superiority of the proposed method.
Citations: 1
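As a rough illustration of the kind of feature refinement such a head performs, here is a generic squeeze-and-excitation-style channel attention block. The paper's AM, DUP, and FFH designs are not described beyond the abstract, so everything below is an assumption, not ABTD-Net's module:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Re-weight feature channels with a learned gate in (0, 1),
    suppressing noisy channels and emphasizing informative ones."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(x).unsqueeze(-1).unsqueeze(-1)  # per-channel gate
        return x * w
```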