2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)最新文献_第3页

Cross-Level Guided Attention for Human-Object Interaction Detection 人-物交互检测的跨层次引导注意

2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) Pub Date : 2023-07-01 DOI: 10.1109/ICMEW59549.2023.00055

Zongxu Yue, Ge Li, Wei Gao

引用次数: 0

Decomposed Key-Point Detector for Swimming Pool Localization 泳池定位的分解关键点检测器

2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) Pub Date : 2023-07-01 DOI: 10.1109/ICMEW59549.2023.00049

Choongseop Lee, Timothy Woinoski, I. Bajić

引用次数: 0

Copyright Page 版权页

2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) Pub Date : 2023-07-01 DOI: 10.1109/icmew59549.2023.00003

引用次数: 0

$text{VVC} +mathrm{M}$: Plug and Play Scalable Image Coding for Humans and Machines $text{VVC} + mathm {M}$:即插即用的可扩展图像编码

2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) Pub Date : 2023-07-01 DOI: 10.1109/ICMEW59549.2023.00041

Alon Harell, Yalda Foroutan, I. Bajić

引用次数: 0

Predicting Car Accidents with YOLOv7 Object Detection and Object Relationships 基于YOLOv7对象检测和对象关系的车祸预测

2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) Pub Date : 2023-07-01 DOI: 10.1109/ICMEW59549.2023.00021

Ming-Xuan Wu, Chia-Sheng Chang, J. Miao, Chia-Yen Lee

引用次数: 0

Rate-Controllable and Target-Dependent JPEG-Based Image Compression Using Feature Modulation 基于特征调制的速率可控和目标相关的jpeg图像压缩

2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) Pub Date : 2023-07-01 DOI: 10.1109/ICMEW59549.2023.00035

Seongmoon Jeong, K. Jeon, J. Ko

{"title":"Rate-Controllable and Target-Dependent JPEG-Based Image Compression Using Feature Modulation","authors":"Seongmoon Jeong, K. Jeon, J. Ko","doi":"10.1109/ICMEW59549.2023.00035","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00035","url":null,"abstract":"While conventional image compression techniques are optimized for human visual perception, the rise of machine learning techniques has led to the emergence of image compression methods tailored for machine vision tasks. Although a few recent studies explored target-dependent reconfiguration of lightweight codecs such as JPEG, these approaches are limited to specific trained bitrates only. Moreover, existing deep learning-based compression frameworks entail a high computational cost, making them impractical for real-time compression on devices with limited resources. In this paper, we present a novel JPEG compression framework that can adaptively generate an optimal quantization table (QT) depending on both the target bitrate and the target metric (quality or accuracy). To provide fine controllability over a wide range of bitrates, we employ a feature modulation technique to a QT generator and bitrate predictor, which are trained by a novel training method called bitrate range partitioning. Our simulation results show that the proposed framework enhances the performance of standard JPEG by up to 2dB in PSNR and 10% in accuracy at the same bitrate, while incurring minimal computational overhead compared to JPEG.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116783803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Crossing of the Dream Fantasy: AI Technique Application for Visualizing a Fictional Character's Dream 梦境的穿越:人工智能技术在虚拟人物梦境可视化中的应用

2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) Pub Date : 2023-07-01 DOI: 10.1109/ICMEW59549.2023.00064

Jiayang Huang, Yiran Chen, D. Yip

引用次数: 0

Non-Reference Subjective Evaluation Method for Binaural Audio in 6-DOF VR Applications 六自由度虚拟现实中双耳音频的非参考主观评价方法

2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) Pub Date : 2023-07-01 DOI: 10.1109/ICMEW59549.2023.00079

Zhiyu Li, Jing Wang, Hanqi Zhang, S. Hasan, Jingxin Li

引用次数: 0

Stabilizing the Convolution Operations for Neural Network-Based Image and Video Codecs for Machines 基于神经网络的机器图像和视频编解码器的稳定卷积运算

2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) Pub Date : 2023-07-01 DOI: 10.1109/ICMEW59549.2023.00036

Honglei Zhang, N. Le, Francesco Cricri, J. Ahonen, H. R. Tavakoli

{"title":"Stabilizing the Convolution Operations for Neural Network-Based Image and Video Codecs for Machines","authors":"Honglei Zhang, N. Le, Francesco Cricri, J. Ahonen, H. R. Tavakoli","doi":"10.1109/ICMEW59549.2023.00036","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00036","url":null,"abstract":"Deep convolutional neural networks are generally trained in the floating-point number format. However, the convolution operation in the floating-point domain suffers from numerically unstable behavior due to the limitation of the precision and range of the number format. For deep convolutional neural network-based image/video codec, the instability may cause corrupted reconstructions when the decoder works in a different computing environment. This paper proposes a post-training quantization technique where the convolution operations are performed in the integer domain while other operations are in the floating-point domain. We derived the optimal scaling factors and bits allocation strategy for the input tensor and kernel weights. With the derived scaling factors, the codec can use the significant bits of the single-precision floating-point number for the convolution operations, which does not require the system to support integer operations. Experiments on a learned image codec on machine consumption show that the proposed method achieves the similar performance as the floating-point version while achieving stable behavior on different platforms.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124726381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Feature-Guided Machine-Centric Image Coding for Downstream Tasks 面向下游任务的特征引导机器中心图像编码

2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) Pub Date : 2023-07-01 DOI: 10.1109/ICMEW59549.2023.00037

Sangwoon Kwak, J. Yun, Hyon‐Gon Choo, Munchurl Kim

{"title":"Feature-Guided Machine-Centric Image Coding for Downstream Tasks","authors":"Sangwoon Kwak, J. Yun, Hyon‐Gon Choo, Munchurl Kim","doi":"10.1109/ICMEW59549.2023.00037","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00037","url":null,"abstract":"Video coding, a process of compressing and decompressing digital video content, has traditionally been optimized for human visual systems by reducing its size while maintaining the human perceptual quality. However, with the remarkable progress of artificial intelligence (AI) technology, the need for machine-centric coding has rapidly been increasing in recent years. In response to these trends, international standardization organizations such as MPEG are actively working to develop and launch new standards on coding technologies for machines, called video coding for machines (VCM). In this paper, we present a novel feature-guided block-wise image blending method for image compression, which is suitable for machine applications such as object detection and segmentation. For this, we use a gradient map of the feature loss using the pretrained encoder part of a task-specific network as guide for input degradation, so that the degraded input images can be effectively compressed for machine-centric tasks. Our method is simple but effective because additional training is not required by utilizing the pretrained encoder parts of networks for targeted tasks. Experimental results show that BD-rate gains can be obtained by applying our proposed method with averages 11% and 8% for object detection and instance segmentation tasks, respectively, compared to the image anchor results of MPEG-VCM reference software v0.4.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126673969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0