2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW): Latest Publications

Cheap-Fake Detection with LLM Using Prompt Engineering
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-06-05 | DOI: 10.1109/ICMEW59549.2023.00025
Guangyang Wu, Weijie Wu, Xiaohong Liu, Kele Xu, Tianjiao Wan, Wenyi Wang
{"title":"Cheap-Fake Detection with LLM Using Prompt Engineering","authors":"Guangyang Wu, Weijie Wu, Xiaohong Liu, Kele Xu, Tianjiao Wan, Wenyi Wang","doi":"10.1109/ICMEW59549.2023.00025","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00025","url":null,"abstract":"The misuse of real photographs with conflicting image captions in news items is an example of the out-of-context (OOC) misuse of media. In order to detect OOC media, individuals must determine the accuracy of the statement and evaluate whether the triplet (i.e., the image and two captions) relates to the same event. This paper presents a novel learnable approach for detecting OOC media in ICME'23 Grand Challenge on Detecting Cheapfakes. The proposed method is based on the COSMOS structure, which assesses the coherence between an image and captions, as well as between two captions. We enhance the baseline algorithm by incorporating a Large Language Model (LLM), GPT3.5, as a feature extractor. Specifically, we propose an innovative approach to feature extraction utilizing prompt engineering to develop a robust and reliable feature extractor with GPT3.5 model. The proposed method captures the correlation between two captions and effectively integrates this module into the COSMOS baseline model, which allows for a deeper understanding of the relationship between captions. By incorporating this module, we demonstrate the potential for significant improvements in cheap-fakes detection performance. The proposed methodology holds promising implications for various applications such as natural language processing, image captioning, and text-to-image synthesis. Docker for submission is available at https://hub.docker.com/repository/docker/mulns/acmmmcheapfakes.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126211756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
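To make the prompt-engineering feature extractor described above concrete, here is a minimal Python sketch. It is an illustration, not the authors' code: `call_llm` is a hypothetical stand-in for a GPT3.5 client, and the 0-10 scoring scale and naive parsing are assumptions.

```python
def build_prompt(caption1: str, caption2: str) -> str:
    """Ask the model for a same-event score in a fixed, parseable format."""
    return (
        "You are a fact-checking assistant.\n"
        f"Caption A: {caption1}\n"
        f"Caption B: {caption2}\n"
        "Do these two captions describe the same event? "
        "Answer with a single integer from 0 (contradictory) to 10 (same event)."
    )

def llm_consistency_feature(caption1, caption2, call_llm) -> float:
    """Turn the LLM reply into a scalar feature in [0, 1] for a downstream model."""
    reply = call_llm(build_prompt(caption1, caption2))
    digits = "".join(ch for ch in reply if ch.isdigit())  # naive parse of the score
    score = int(digits) if digits else 5                  # fall back to neutral
    return min(max(score, 0), 10) / 10.0
```

A scalar feature like this can then be combined with the image-caption coherence scores that the COSMOS-style model already computes.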
Half Title Page
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-05-20 | DOI: 10.1109/icmew59549.2023.00001
{"title":"Half Title Page","authors":"","doi":"10.1109/icmew59549.2023.00001","DOIUrl":"https://doi.org/10.1109/icmew59549.2023.00001","url":null,"abstract":"","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130097309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Semi-Supervised Federated Learning for Keyword Spotting
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-05-09 | DOI: 10.1109/ICMEW59549.2023.00087
Enmao Diao, Eric W. Tramel, Jie Ding, Tao Zhang
{"title":"Semi-Supervised Federated Learning for Keyword Spotting","authors":"Enmao Diao, Eric W. Tramel, Jie Ding, Tao Zhang","doi":"10.1109/ICMEW59549.2023.00087","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00087","url":null,"abstract":"Keyword Spotting (KWS) is a critical aspect of audio-based applications on mobile devices and virtual assistants. Recent developments in Federated Learning (FL) have significantly expanded the ability to train machine learning models by utilizing the computational and private data resources of numerous distributed devices. However, existing FL methods typically require that devices possess accurate ground-truth labels, which can be both expensive and impractical when dealing with local audio data. In this study, we first demonstrate the effectiveness of Semi-Supervised Federated Learning (SSL) and FL for KWS. We then extend our investigation to Semi-Supervised Federated Learning (SSFL) for KWS, where devices possess completely unlabeled data, while the server has access to a small amount of labeled data. We perform numerical analyses using state-of-the-art SSL, FL, and SSFL techniques to demonstrate that the performance of KWS models can be significantly improved by leveraging the abundant unlabeled heterogeneous data available on devices.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127499651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
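The SSFL setting described above can be pictured with a toy sketch of one round: clients pseudo-label their unlabeled audio with the current global model, train locally, and the server averages the resulting weights. The confidence threshold and plain weight averaging are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def federated_average(client_weights: list[dict]) -> dict:
    """Server step: element-wise mean of the clients' parameter dictionaries."""
    keys = client_weights[0].keys()
    return {k: np.mean([w[k] for w in client_weights], axis=0) for k in keys}

def pseudo_label(logits: np.ndarray, threshold: float = 0.95):
    """Client step: keep only the keyword predictions the model is confident about.

    logits: (num_clips, num_keywords) outputs of the current global model.
    """
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)            # row-wise softmax
    conf, labels = probs.max(axis=1), probs.argmax(axis=1)
    mask = conf >= threshold                             # confident clips only
    return labels[mask], mask
```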
Prompt What You Need: Enhancing Segmentation in Rainy Scenes with Anchor-Based Prompting
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-05-06 | DOI: 10.1109/ICMEW59549.2023.00019
Xiaoyuan Guo, Xiang Wei, Q. Su, Hui-Huang Zhao, Shunli Zhan
{"title":"Prompt What You Need: Enhancing Segmentation in Rainy Scenes with Anchor-Based Prompting","authors":"Xiaoyuan Guo, Xiang Wei, Q. Su, Hui-Huang Zhao, Shunli Zhan","doi":"10.1109/ICMEW59549.2023.00019","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00019","url":null,"abstract":"Semantic segmentation in rainy scenes is a challenging task due to the complex environment, class distribution imbalance, and limited annotated data. To address these challenges, we propose a novel framework that utilizes semi-supervised learning and pre-trained segmentation foundation model to achieve superior performance. Specifically, our framework leverages the semi-supervised model as the basis for generating raw semantic segmentation results, while also serving as a guiding force to prompt pre-trained foundation model to compensate for knowledge gaps with entropy-based anchors. In addition, to minimize the impact of irrelevant segmentation masks generated by the pre-trained foundation model, we also propose a mask filtering and fusion mechanism that optimizes raw semantic segmentation results based on the principle of minimum risk. The proposed framework achieves superior segmentation performance on the Rainy WCity dataset and is awarded the first prize in the subtrack of STRAIN in ICME 2023 Grand Challenges.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134099555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
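The entropy-based anchors mentioned above can be sketched as follows: pixels where the semi-supervised model's softmax output has the highest entropy become point prompts for the promptable foundation model. Shapes and the anchor budget are illustrative assumptions.

```python
import numpy as np

def entropy_anchors(prob_map: np.ndarray, num_anchors: int = 10):
    """prob_map: (C, H, W) per-class softmax output. Returns (row, col) prompts."""
    eps = 1e-8
    entropy = -(prob_map * np.log(prob_map + eps)).sum(axis=0)  # (H, W) uncertainty
    idx = np.argsort(entropy.ravel())[-num_anchors:]            # most uncertain pixels
    rows, cols = np.unravel_index(idx, entropy.shape)
    return list(zip(rows.tolist(), cols.tolist()))
```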
Learn How to Prune Pixels for Multi-View Neural Image-Based Synthesis
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-05-05 | DOI: 10.1109/ICMEW59549.2023.00034
Marta Milovanović, Enzo Tartaglione, Marco Cagnazzo, F. Henry
{"title":"Learn How to Prune Pixels for Multi-View Neural Image-Based Synthesis","authors":"Marta Milovanovi'c, Enzo Tartaglione, Marco Cagnazzo, F. Henry","doi":"10.1109/ICMEW59549.2023.00034","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00034","url":null,"abstract":"Image-based rendering techniques stand at the core of an immersive experience for the user, as they generate novel views given a set of multiple input images. Since they have shown good performance in terms of objective and subjective quality, the research community devotes great effort to their improvement. However, the large volume of data necessary to render at the receiver's side hinders applications in limited bandwidth environments or prevents their employment in real-time applications. We present LeHoPP, a method for input pixel pruning, where we examine the importance of each input pixel concerning the rendered view, and we avoid the use of irrelevant pixels. Even without retraining the image-based rendering network, our approach shows a good tradeoff between synthesis quality and pixel rate. When tested in the general neural rendering framework, compared to other pruning baselines, LeHoPP gains between 0.9 dB and 3.6 dB on average.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124872811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
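The pixel-pruning idea reads naturally as a top-k mask over an importance map. The sketch below assumes such a map is already available; how LeHoPP actually scores pixel importance is the paper's contribution.

```python
import numpy as np

def prune_pixels(image: np.ndarray, importance: np.ndarray, keep_ratio: float = 0.5):
    """image: (H, W, 3); importance: (H, W). Zero out the least important pixels."""
    k = max(1, int(importance.size * keep_ratio))        # number of pixels to keep
    thresh = np.partition(importance.ravel(), -k)[-k]    # k-th largest importance
    mask = importance >= thresh
    return image * mask[..., None], mask                 # pruned view and its mask
```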
Conditional and Residual Methods in Scalable Coding for Humans and Machines
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-05-04 | DOI: 10.1109/ICMEW59549.2023.00040
Anderson de Andrade, Alon Harell, Yalda Foroutan, Ivan V. Bajić
{"title":"Conditional and Residual Methods in Scalable Coding for Humans and Machines","authors":"Anderson de Andrade, Alon Harell, Yalda Foroutan, Ivan V. Baji'c","doi":"10.1109/ICMEW59549.2023.00040","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00040","url":null,"abstract":"We present methods for conditional and residual coding in the context of scalable coding for humans and machines. Our focus is on optimizing the rate-distortion performance of the reconstruction task using the information available in the computer vision task. We include an information analysis of both approaches to provide baselines and also propose an entropy model suitable for conditional coding with increased modelling capacity and similar tractability as previous work. We apply these methods to image reconstruction, using, in one instance, representations created for semantic segmentation on the Cityscapes dataset, and in another instance, representations created for object detection on the COCO dataset. In both experiments, we obtain similar performance between the conditional and residual methods, with the resulting ratedistortion curves contained within our baselines.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129528060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
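The difference between the two pipelines can be shown on scalars: residual coding feeds x - y_pred to the codec, while conditional coding feeds x but lets the entropy model see y_pred. The Gaussians below are stand-ins for learned entropy models, and the idealized code lengths are an assumption for illustration; with a simple Gaussian the two coincide, which is at least consistent with the similar performance reported above.

```python
import numpy as np

def gaussian_bits(x: float, mean: float, scale: float) -> float:
    """Idealized code length of x under N(mean, scale^2), in bits."""
    nll_nats = 0.5 * ((x - mean) / scale) ** 2 + np.log(scale * np.sqrt(2.0 * np.pi))
    return nll_nats / np.log(2.0)

x, y_pred = 3.2, 3.0                                     # signal and its prediction
bits_residual = gaussian_bits(x - y_pred, mean=0.0, scale=0.5)
bits_conditional = gaussian_bits(x, mean=y_pred, scale=0.5)
print(bits_residual, bits_conditional)                   # identical in this toy case
```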
Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-04-27 | DOI: 10.1109/ICMEW59549.2023.00031
Zihao Li, Pan Gao, Hui Yuan, Ran Wei, M. Paul
{"title":"Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation","authors":"Zihao Li, Pan Gao, Hui Yuan, Ran Wei, M. Paul","doi":"10.1109/ICMEW59549.2023.00031","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00031","url":null,"abstract":"Discovering inter-point connection for efficient high-dimensional feature extraction from point coordinate is a key challenge in processing point cloud. Most existing methods focus on designing efficient local feature extractors while ignoring global connection, or vice versa. In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations, which considers both local and global attentions. Specifically, considering local spatial coherence, local feature learning is performed through Relative Position Encoding and Attentive Feature Pooling. We incorporate the learned locality into the Transformer module. The local feature affects value component in Transformer to modulate the relationship between channels of each point, which can enhance self-attention mechanism with locality based channel interaction. We demonstrate its superiority experimentally on classification and segmentation tasks. The code is available at: https://github.com/jiamang/IBT","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126506126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
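The idea that the local feature modulates the value component can be pictured with a minimal, untrained attention sketch; the channel-wise gating form below is a guess for illustration, not the IBT formulation.

```python
import numpy as np

def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def locality_gated_attention(x: np.ndarray, local_feat: np.ndarray) -> np.ndarray:
    """x: (N, C) point features; local_feat: (N, C) pooled neighborhood features."""
    q = k = x                                   # learned projections omitted
    v = x * (1.0 + np.tanh(local_feat))         # locality gates the value channels
    attn = softmax(q @ k.T / np.sqrt(x.shape[1]))
    return attn @ v                             # (N, C) globally mixed features
```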
An Order-Complexity Model for Aesthetic Quality Assessment of Homophony Music Performance
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-04-23 | DOI: 10.1109/ICMEW59549.2023.00061
Xin Jin, Wu Zhou, Jinyu Wang, Duo Xu, Yiqing Rong, Jialin Sun
{"title":"An Order-Complexity Model for Aesthetic Quality Assessment of Homophony Music Performance","authors":"Xin Jin, Wu Zhou, Jinyu Wang, Duo Xu, Yiqing Rong, Jialin Sun","doi":"10.1109/ICMEW59549.2023.00061","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00061","url":null,"abstract":"Although computational aesthetics evaluation has made certain achievements in many fields, its research of music performance remains to be explored. At present, subjective evaluation is still a ultimate method of music aesthetics research, but it will consume a lot of human and material resources. In addition, the music performance generated by AI is still mechanical, monotonous and lacking in beauty. In order to guide the generation task of AI music performance, and to improve the performance effect of human performers, this paper uses Birkhoff's aesthetic measure to propose a method of objective measurement of beauty. The main contributions of this paper are as follows: Firstly, we put forward an objective aesthetic evaluation method to measure the music performance aesthetic; Secondly, we propose 10 basic music features and 4 aesthetic music features. Experiments show that our method performs well on performance assessment.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125892193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
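Birkhoff's aesthetic measure, which the paper builds on, is the ratio M = O / C of order to complexity. Which measurable quantities play the roles of O and C for a music performance is exactly what the paper's 10 basic and 4 aesthetic features address; the sketch below shows only the measure itself, with hypothetical inputs.

```python
def birkhoff_measure(order: float, complexity: float) -> float:
    """M = O / C: perceived order per unit of complexity."""
    if complexity <= 0:
        raise ValueError("complexity must be positive")
    return order / complexity

# e.g. a highly regular but texturally simple performance:
print(birkhoff_measure(order=0.8, complexity=0.4))  # 2.0
```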
XGC-VQA: A Unified Video Quality Assessment Model for User, Professionally, and Occupationally-Generated Content
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-03-24 | DOI: 10.1109/ICMEW59549.2023.00081
Xinhui Huang, Chunyi Li, A. Bentaleb, Roger Zimmermann, Guangtao Zhai
{"title":"XGC-VQA: A Unified Video Quality Assessment Model for User, Professionally, and Occupationally-Generated Content","authors":"Xinhui Huang, Chunyi Li, A. Bentaleb, Roger Zimmermann, Guangtao Zhai","doi":"10.1109/ICMEW59549.2023.00081","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00081","url":null,"abstract":"With the rapid growth of Internet video data amounts and types, a unified Video Quality Assessment (VQA) is needed to inspire video communication with perceptual quality. To meet the real-time and universal requirements in providing such inspiration, this study proposes a VQA model from a classification of User Generated Content (UGC), Professionally Generated Content (PGC), and Occupationally Generated Content (OGC). In the time domain, this study utilizes non-uniform sampling, as each content type has varying temporal importance based on its perceptual quality. In the spatial domain, centralized downsampling is performed before the VQA process by utilizing a patch splicing/sampling mechanism to lower complexity for real-time assessment. The experimental results demonstrate that the proposed method achieves a median correlation of 0.7 while limiting the computation time below 5s for three content types, which ensures that the communication experience of UGC, PGC, and OGC can be optimized altogether.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127445568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
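Non-uniform temporal sampling can be sketched as drawing frame indices from a content-dependent importance curve instead of an even grid. The weighting below is invented for illustration; the paper derives temporal importance from the content type.

```python
import numpy as np

def sample_frames(num_frames: int, budget: int, weights) -> np.ndarray:
    """Draw `budget` distinct frame indices with probability proportional to weights."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    idx = np.random.choice(num_frames, size=budget, replace=False, p=p)
    return np.sort(idx)

# e.g. a hypothetical UGC-style curve that emphasizes the start of the clip:
w = np.linspace(2.0, 1.0, 300)
print(sample_frames(300, 8, w))
```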
A Perceptual Quality Assessment Exploration for AIGC Images
2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) | Pub Date: 2023-03-22 | DOI: 10.1109/ICMEW59549.2023.00082
Zicheng Zhang, Chunyi Li, Wei Sun, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai
{"title":"A Perceptual Quality Assessment Exploration for AIGC Images","authors":"Zicheng Zhang, Chunyi Li, Wei Sun, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai","doi":"10.1109/ICMEW59549.2023.00082","DOIUrl":"https://doi.org/10.1109/ICMEW59549.2023.00082","url":null,"abstract":"AI Generated Content (AIGC) has gained widespread attention with the increasing efficiency of deep learning in content creation. AIGC, created with the assistance of artificial intelligence technology, includes various forms of content, among which the AI-generated images (AGIs) have brought significant impact to society and have been applied to various fields such as entertainment, education, social media, etc. However, due to hardware limitations and technical proficiency, the quality of AIGC images (AGIs) varies, necessitating refinement and filtering before practical use. Consequently, there is an urgent need for developing objective models to assess the quality of AGIs. Unfortunately, no research has been carried out to investigate the perceptual quality assessment for AGIs specifically. Therefore, in this paper, we first discuss the major evaluation aspects such as technical issues, AI artifacts, unnaturalness, discrepancy, and aesthetics for AGI quality assessment. Then we present the first perceptual AGI quality assessment database, AGIQA-1K, which consists of 1,080 AGIs generated from diffusion models. A well-organized subjective experiment is followed to collect the quality labels of the AGIs. Finally, we conduct a benchmark experiment to evaluate the performance of current image quality assessment (IQA) models. The database is released on https://github.com/lcysyzxdxc/AGIQA-1k-Database.","PeriodicalId":111482,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125937237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
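A benchmark like the one described above typically scores each IQA model by rank and linear correlation against the subjective labels. The loop below is a generic sketch, not the paper's exact protocol; `model` is any callable mapping an image to a scalar quality score.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def benchmark(model, images, mos):
    """Return (SROCC, PLCC) of a quality predictor against mean opinion scores."""
    preds = np.array([model(img) for img in images])
    srocc = spearmanr(preds, mos).correlation   # monotonic (rank) agreement
    plcc = pearsonr(preds, mos)[0]              # linear agreement
    return srocc, plcc
```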