2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) — Latest Publications

Cross-Level Guided Attention for Human-Object Interaction Detection
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-07-01 | DOI: 10.1109/ICMEW59549.2023.00055
Zongxu Yue, Ge Li, Wei Gao
Abstract: Recently, transformer-based methods have achieved advanced performance on the human-object interaction (HOI) detection task. However, most of them directly use the semantically high-level features from the deep layers of a pre-trained backbone to produce the final HOI detections, which we argue limits further performance gains because of the semantic gap between the upstream pre-training task and HOI detection. In this work, we design a Cross-Level Guided Attention Network (CLAN) for HOI detection. The proposed method uses information from the pre-training task's semantically high-level features to generate attention scores over the low-level, primitive features and extract the key signals for the HOI detection task. Experiments show that CLAN achieves competitive results on both the V-COCO and HICO-DET benchmarks.
Citations: 0
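The abstract describes the mechanism only at a high level, so the following PyTorch snippet is a minimal sketch of the stated idea -- a semantically high-level backbone feature attending over a low-level feature to extract task-relevant signals -- not the authors' implementation; the module name, channel sizes, and attention form are assumptions.

```python
import torch
import torch.nn as nn

class CrossLevelGuidedAttention(nn.Module):
    """Sketch: a high-level (deep) feature map guides attention over a
    low-level (shallow) feature map. Channel sizes are illustrative."""

    def __init__(self, low_ch=256, high_ch=2048, out_ch=256):
        super().__init__()
        self.query = nn.Conv2d(high_ch, out_ch, kernel_size=1)  # from high-level feature
        self.key = nn.Conv2d(low_ch, out_ch, kernel_size=1)     # from low-level feature
        self.value = nn.Conv2d(low_ch, out_ch, kernel_size=1)

    def forward(self, low_feat, high_feat):
        # low_feat: (B, low_ch, H, W); high_feat: (B, high_ch, h, w)
        B = low_feat.shape[0]
        q = self.query(high_feat).flatten(2)          # (B, C, h*w)
        k = self.key(low_feat).flatten(2)             # (B, C, H*W)
        v = self.value(low_feat).flatten(2)           # (B, C, H*W)
        # each high-level position attends over all low-level positions
        attn = torch.softmax(q.transpose(1, 2) @ k / k.shape[1] ** 0.5, dim=-1)  # (B, h*w, H*W)
        guided = attn @ v.transpose(1, 2)             # (B, h*w, C)
        h, w = high_feat.shape[-2:]
        return guided.transpose(1, 2).reshape(B, -1, h, w)

# usage with dummy backbone features
low = torch.randn(1, 256, 64, 64)     # shallow, primitive feature
high = torch.randn(1, 2048, 16, 16)   # deep, semantically high-level feature
print(CrossLevelGuidedAttention()(low, high).shape)  # torch.Size([1, 256, 16, 16])
```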
Decomposed Key-Point Detector for Swimming Pool Localization
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-07-01 | DOI: 10.1109/ICMEW59549.2023.00049
Choongseop Lee, Timothy Woinoski, I. Bajić
Abstract: Pool localization is an essential prerequisite for swimmer analysis and performance measurement. Automated analysis of swimming pools has frequently been proposed, driven by the growth of broadcast video and advances in machine learning, yet few studies address the GPU memory usage and latency that matter for practical deployment. This work proposes an efficient swimming-pool key-point detection method based on a U-Net-style decomposed detector that is robust regardless of camera parameters. Experiments show the proposed detector is more accurate than the original model while reducing latency and memory usage by factors of 4.80 and 2.91, respectively.
Citations: 0
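The abstract does not spell out what the decomposition is; one common way to cut the memory of a key-point head is to replace a full 2-D heatmap with two 1-D (row/column) heatmaps per key-point. The sketch below illustrates that generic idea only and may differ from the paper's design; all shapes and names are assumptions.

```python
import torch
import torch.nn as nn

class DecomposedKeypointHead(nn.Module):
    """Generic sketch: instead of a full (K, H, W) heatmap, predict two 1-D
    heatmaps per key-point (over rows and over columns), which is much cheaper
    in memory. This is one common decomposition, not necessarily the paper's."""

    def __init__(self, in_ch=64, num_keypoints=4):
        super().__init__()
        self.row_head = nn.Conv2d(in_ch, num_keypoints, kernel_size=1)
        self.col_head = nn.Conv2d(in_ch, num_keypoints, kernel_size=1)

    def forward(self, feat):
        # feat: (B, C, H, W) from a U-Net-style decoder
        rows = self.row_head(feat).mean(dim=3)   # (B, K, H) row-wise responses
        cols = self.col_head(feat).mean(dim=2)   # (B, K, W) column-wise responses
        ys = rows.argmax(dim=-1)                 # (B, K) most likely row per key-point
        xs = cols.argmax(dim=-1)                 # (B, K) most likely column per key-point
        return torch.stack([xs, ys], dim=-1)     # (B, K, 2) pixel coordinates

feat = torch.randn(1, 64, 135, 240)              # e.g. decoder output at 1/8 of a 1080p frame
print(DecomposedKeypointHead()(feat).shape)      # torch.Size([1, 4, 2])
```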
Copyright Page
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-07-01 | DOI: 10.1109/icmew59549.2023.00003
Citations: 0
VVC+M: Plug and Play Scalable Image Coding for Humans and Machines
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-07-01 | DOI: 10.1109/ICMEW59549.2023.00041
Alon Harell, Yalda Foroutan, I. Bajić
Abstract: Compression for machines is an emerging field in which inputs are encoded while optimizing the performance of downstream automated analysis. In scalable coding for humans and machines, the compressed representation used for machines is further utilized to enable input reconstruction. This is often done by jointly optimizing the compression scheme for both the machine task and human perception, which results in sub-optimal rate-distortion (RD) performance on the machine side. Focusing on images, we propose to utilize the pre-existing residual-coding capabilities of video codecs such as VVC to create a scalable codec from any image compression for machines (ICM) scheme. Using our approach, we improve an existing scalable codec to achieve superior RD performance on the machine task while remaining competitive for human perception. Moreover, our approach can be trained post hoc for any given ICM scheme, without coupling the quality of the machine analysis to human vision.
Citations: 0
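As a rough illustration of the layering described in the abstract, the sketch below encodes a base (machine) layer with an arbitrary ICM codec and an enhancement layer as the residual against the base-layer preview, to be coded by a standard codec such as VVC. The codec callables are placeholders, since the real interfaces depend on the tools used.

```python
import numpy as np

def encode_scalable(image, icm_encode, icm_preview, residual_encode):
    """Sketch of the scalable layering described in the abstract.

    icm_encode(image)    -> base bitstream for the machine task (any ICM codec)
    icm_preview(bits)    -> image-domain preview reconstructed from the base layer
    residual_encode(res) -> enhancement bitstream; in the paper a standard codec
                            such as VVC codes this residual. Left as a placeholder
                            callable because the real interface depends on the encoder.
    """
    base_bits = icm_encode(image)                         # machine-side (base) layer
    preview = icm_preview(base_bits)                      # what the base layer alone can reconstruct
    residual = image.astype(np.int16) - preview.astype(np.int16)
    enh_bits = residual_encode(residual)                  # human-side (enhancement) layer
    return base_bits, enh_bits

def decode_for_human(base_bits, enh_bits, icm_preview, residual_decode):
    """Human-viewable reconstruction = base-layer preview + decoded residual."""
    preview = icm_preview(base_bits)
    residual = residual_decode(enh_bits)
    return np.clip(preview.astype(np.int16) + residual, 0, 255).astype(np.uint8)
```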
Predicting Car Accidents with YOLOv7 Object Detection and Object Relationships
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-07-01 | DOI: 10.1109/ICMEW59549.2023.00021
Ming-Xuan Wu, Chia-Sheng Chang, J. Miao, Chia-Yen Lee
Abstract: In this paper, we propose a method for predicting potential accidents using object detection and object-relationship analysis. We use the YOLOv7 object detector to identify objects on roads and highways and analyze the relationships between them to predict potential accidents. Our model is trained on a large dataset from Kaggle competitions, which includes driving-recorder videos of different vehicle types such as buses, cars, trailers, trucks, and lorries. We analyze the patterns of detected accident-vehicle objects to determine whether an accident occurs across consecutive frames. Experimental results show that our method can reasonably predict whether an accident will occur within the next 20 frames. Object detection allows multiple objects to be identified, which improves the accuracy of accident prediction. Although we did not achieve better performance or accuracy, this approach has the potential to improve the safety of autonomous vehicles and reduce the occurrence of traffic accidents.
Citations: 0
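The abstract does not detail the object-relationship analysis, so here is only a hedged, hand-crafted illustration of the idea: a pairwise check over tracked YOLOv7 detections in consecutive frames that flags vehicle pairs whose boxes overlap while their centers close rapidly. It is a heuristic stand-in, not the paper's trained model; the thresholds and the upstream-tracking assumption are invented for the example.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def center(box):
    return np.array([(box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0])

def risky_pairs(prev_boxes, curr_boxes, iou_thresh=0.1, closing_thresh=5.0):
    """Flag pairs of tracked vehicles whose boxes overlap and whose centers are
    closing faster than `closing_thresh` pixels per frame. `prev_boxes` and
    `curr_boxes` map a track id to its box in consecutive frames (tracking of
    YOLOv7 detections is assumed to be done upstream)."""
    flags = []
    ids = [i for i in curr_boxes if i in prev_boxes]
    for n, i in enumerate(ids):
        for j in ids[n + 1:]:
            d_prev = np.linalg.norm(center(prev_boxes[i]) - center(prev_boxes[j]))
            d_curr = np.linalg.norm(center(curr_boxes[i]) - center(curr_boxes[j]))
            if iou(curr_boxes[i], curr_boxes[j]) > iou_thresh and d_prev - d_curr > closing_thresh:
                flags.append((i, j))
    return flags

prev = {1: (100, 100, 180, 160), 2: (300, 110, 380, 170)}
curr = {1: (140, 100, 220, 160), 2: (250, 110, 330, 170)}
print(risky_pairs(prev, curr))   # [(1, 2)] -- the two vehicles are converging
```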
Rate-Controllable and Target-Dependent JPEG-Based Image Compression Using Feature Modulation
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-07-01 | DOI: 10.1109/ICMEW59549.2023.00035
Seongmoon Jeong, K. Jeon, J. Ko
Abstract: While conventional image compression techniques are optimized for human visual perception, the rise of machine learning has led to compression methods tailored for machine vision tasks. Although a few recent studies explored target-dependent reconfiguration of lightweight codecs such as JPEG, these approaches are limited to specific trained bitrates. Moreover, existing deep learning-based compression frameworks entail a high computational cost, making them impractical for real-time compression on resource-limited devices. In this paper, we present a novel JPEG compression framework that adaptively generates an optimal quantization table (QT) depending on both the target bitrate and the target metric (quality or accuracy). To provide fine controllability over a wide range of bitrates, we apply a feature modulation technique to the QT generator and bitrate predictor, which are trained by a novel method called bitrate range partitioning. Our simulation results show that the proposed framework improves standard JPEG by up to 2 dB in PSNR and 10% in accuracy at the same bitrate, while incurring minimal computational overhead compared to JPEG.
Citations: 0
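As one plausible reading of the abstract, the sketch below conditions a quantization-table generator on the target bitrate with a FiLM-style scale/shift (feature modulation). The network layout, dimensions, and QT value range are assumptions, and the bitrate-range-partitioning training scheme is not reproduced.

```python
import torch
import torch.nn as nn

class ModulatedQTGenerator(nn.Module):
    """Sketch: an image branch produces an embedding that is modulated
    (FiLM-style scale/shift) by the target bitrate before predicting a
    64-entry luma quantization table. Sizes are illustrative."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.img_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )
        self.film = nn.Linear(1, 2 * feat_dim)     # target bitrate -> (scale, shift)
        self.head = nn.Linear(feat_dim, 64)        # one value per 8x8 DCT coefficient

    def forward(self, image, target_bpp):
        h = self.img_branch(image)                        # (B, feat_dim)
        scale, shift = self.film(target_bpp).chunk(2, dim=-1)
        h = h * (1 + scale) + shift                       # feature modulation
        qt = 1 + 254 * torch.sigmoid(self.head(h))        # QT entries kept in [1, 255]
        return qt.view(-1, 8, 8)

img = torch.rand(1, 3, 256, 256)
bpp = torch.tensor([[0.5]])                               # target bits per pixel
print(ModulatedQTGenerator()(img, bpp).shape)             # torch.Size([1, 8, 8])
```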
Crossing of the Dream Fantasy: AI Technique Application for Visualizing a Fictional Character's Dream
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-07-01 | DOI: 10.1109/ICMEW59549.2023.00064
Jiayang Huang, Yiran Chen, D. Yip
Abstract: This research explores the creative potential of artificial intelligence (AI) in artistic practice by testing various AI tools for their usability and capability as mediums of artistic expression. The project focuses on visualizing a dream of Mulan, a classic Chinese female figure, using the predictable features of AI generative models, with the aim of exploring whether such methods can produce striking results. The project employs a collaborative process that combines different AI platforms to generate a range of materials, which are then subjectively integrated into Mulan's fantasy dreamscape. The main conclusion drawn from the project is that artists guide abstract concepts and provide micro-interference, while AI produces concrete components and variations. The findings suggest that AI tools have the potential to transform modes of artistic creation, and that collaborative art creation can result in unique and compelling artworks.
Citations: 0
Non-Reference Subjective Evaluation Method for Binaural Audio in 6-DOF VR Applications
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-07-01 | DOI: 10.1109/ICMEW59549.2023.00079
Zhiyu Li, Jing Wang, Hanqi Zhang, S. Hasan, Jingxin Li
Abstract: The concept of the metaverse has spurred the development of virtual reality (VR) technology. Immersion and interactivity are the key features of VR applications, especially with six-degree-of-freedom (6-DoF) position tracking. This paper proposes a non-reference subjective evaluation method to test the performance of binaural audio rendering techniques for 6-DoF VR applications. The evaluation method consists of two stages: a basic audio test with stereophones, and a rendering-system test with VR equipment. Subjective experiments were designed using different audio renderers, and the impact of certain factors was also investigated. The experimental results demonstrate the effectiveness of the proposed evaluation method.
Citations: 0
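The paper describes a listening-test protocol rather than an algorithm, so only a small generic example follows: aggregating per-listener ratings into mean opinion scores with normal-approximation 95% confidence intervals, which is how such subjective tests are commonly reported. The renderer names and scores are hypothetical and the statistics are not taken from the paper.

```python
import numpy as np

def aggregate_ratings(scores):
    """scores: dict mapping renderer name -> list of subjective ratings (e.g. 1-5).
    Returns mean opinion score and a normal-approximation 95% confidence interval."""
    results = {}
    for name, vals in scores.items():
        vals = np.asarray(vals, dtype=float)
        mean = vals.mean()
        ci = 1.96 * vals.std(ddof=1) / np.sqrt(len(vals))
        results[name] = (mean, ci)
    return results

ratings = {
    "renderer_A": [4, 5, 4, 3, 4, 5, 4],   # hypothetical listener scores
    "renderer_B": [3, 3, 4, 2, 3, 3, 4],
}
for name, (mos, ci) in aggregate_ratings(ratings).items():
    print(f"{name}: MOS {mos:.2f} +/- {ci:.2f}")
```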
Stabilizing the Convolution Operations for Neural Network-Based Image and Video Codecs for Machines
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-07-01 | DOI: 10.1109/ICMEW59549.2023.00036
Honglei Zhang, N. Le, Francesco Cricri, J. Ahonen, H. R. Tavakoli
Abstract: Deep convolutional neural networks are generally trained in floating-point format. However, convolution in the floating-point domain behaves in a numerically unstable way because of the limited precision and range of the number format. For a deep convolutional neural network-based image/video codec, this instability can produce corrupted reconstructions when the decoder runs in a different computing environment. This paper proposes a post-training quantization technique in which the convolution operations are performed in the integer domain while other operations remain in the floating-point domain. We derive the optimal scaling factors and bit-allocation strategy for the input tensor and kernel weights. With the derived scaling factors, the codec can use the significand bits of single-precision floating-point numbers for the convolution operations, so the system does not need native integer-arithmetic support. Experiments on a learned image codec for machine consumption show that the proposed method achieves performance similar to the floating-point version while behaving stably across different platforms.
Citations: 0
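The general recipe behind integer-domain convolution can be sketched as follows: scale the input tensor and kernel weights to integer values, convolve, then rescale by the product of the scaling factors. The max-based scaling below is a simplification; the paper derives optimal scaling factors and a bit-allocation strategy, which this sketch does not reproduce.

```python
import torch
import torch.nn.functional as F

def quantized_conv2d(x, w, bias=None, x_bits=12, w_bits=10):
    """Sketch of integer-domain convolution for stable cross-platform decoding.
    Scaling factors here are simple max-based choices, not the paper's optimal ones."""
    sx = (2 ** (x_bits - 1) - 1) / x.abs().max().clamp(min=1e-12)
    sw = (2 ** (w_bits - 1) - 1) / w.abs().max().clamp(min=1e-12)
    xq = torch.round(x * sx)          # integer-valued tensors (stored as float here;
    wq = torch.round(w * sw)          # exactly representable while they fit in the mantissa)
    yq = F.conv2d(xq, wq, padding=1)  # products and sums of integers: no rounding drift
    y = yq / (sx * sw)                # rescale back to the floating-point domain
    if bias is not None:
        y = y + bias.view(1, -1, 1, 1)
    return y

x = torch.randn(1, 16, 32, 32)
w = torch.randn(8, 16, 3, 3)
ref = F.conv2d(x, w, padding=1)
out = quantized_conv2d(x, w)
print((ref - out).abs().max())        # small quantization error, but reproducible across platforms
```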
Feature-Guided Machine-Centric Image Coding for Downstream Tasks
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-07-01 | DOI: 10.1109/ICMEW59549.2023.00037
Sangwoon Kwak, J. Yun, Hyon‐Gon Choo, Munchurl Kim
Abstract: Video coding, the process of compressing and decompressing digital video content, has traditionally been optimized for the human visual system: reducing data size while maintaining perceptual quality. With the remarkable progress of artificial intelligence (AI), however, the need for machine-centric coding has grown rapidly in recent years. In response, international standardization organizations such as MPEG are actively developing new coding standards for machines, called video coding for machines (VCM). In this paper, we present a novel feature-guided block-wise image blending method for image compression that is suitable for machine applications such as object detection and segmentation. We compute a gradient map of the feature loss using the pretrained encoder part of a task-specific network and use it as a guide for input degradation, so that the degraded input images can be compressed effectively for machine-centric tasks. Our method is simple but effective: no additional training is required, because it reuses the pretrained encoder parts of the networks for the targeted tasks. Experimental results show that the proposed method yields average BD-rate gains of 11% and 8% for object detection and instance segmentation, respectively, compared to the image anchor results of MPEG-VCM reference software v0.4.
Citations: 0
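A minimal sketch of the described pipeline, under assumptions: the gradient of a simple feature-energy loss from a pretrained encoder (a torchvision ResNet-50 stem as a stand-in task network) is pooled block-wise, and only the highest-gradient blocks keep full fidelity while the rest are degraded (blurred here as a stand-in for stronger compression) before being passed to the codec. The loss, blending rule, and block size are not the paper's exact choices.

```python
import torch
import torch.nn.functional as F
import torchvision

def feature_guided_blend(image, block=16, keep_ratio=0.3):
    """image: (1, 3, H, W) in [0, 1], with H and W divisible by `block`.
    A pretrained ResNet-50 stem acts as a stand-in task encoder; the gradient of
    a feature-energy loss w.r.t. the input decides which blocks keep full fidelity."""
    backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")   # downloads weights on first use
    encoder = torch.nn.Sequential(*list(backbone.children())[:6])     # conv1 .. layer2
    encoder.eval()

    x = image.clone().requires_grad_(True)
    feat = encoder(x)
    feat.pow(2).mean().backward()                      # simple feature-energy loss as a proxy
    grad_map = x.grad.abs().mean(dim=1, keepdim=True)  # (1, 1, H, W) importance map

    # block-wise importance and a binary keep/degrade decision
    block_score = F.avg_pool2d(grad_map, block)        # (1, 1, H/block, W/block)
    thresh = torch.quantile(block_score, 1 - keep_ratio)
    keep = (block_score >= thresh).float()
    keep = F.interpolate(keep, scale_factor=block, mode="nearest")

    # degraded version: heavy blur as a stand-in for stronger compression
    degraded = F.avg_pool2d(image, 8)
    degraded = F.interpolate(degraded, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return keep * image + (1 - keep) * degraded        # blended input handed to the codec

img = torch.rand(1, 3, 224, 224)
print(feature_guided_blend(img).shape)                 # torch.Size([1, 3, 224, 224])
```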