Out-of-Distribution in Image Semantic Communication: A Solution With Multimodal Large Language Models

IEEE Transactions on Machine Learning in Communications and Networking
Pub Date: 2025-08-05
DOI: 10.1109/TMLCN.2025.3595841

Feifan Zhang;Yuyang Du;Kexin Chen;Yulin Shao;Soung Chang Liew

Volume 3, pages 997-1013. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/11113346/

Citations: 0

Abstract

Out-of-Distribution in Image Semantic Communication: A Solution With Multimodal Large Language Models

Semantic communication is a promising technology for next-generation wireless networks. However, the out-of-distribution (OOD) problem, where a pre-trained machine learning (ML) model is applied to unseen tasks that are outside the distribution of its training data, may compromise the integrity of semantic compression. This paper explores the use of multi-modal large language models (MLLMs) to address the OOD issue in image semantic communication. We propose a novel “Plan A - Plan B” framework that leverages the broad knowledge and strong generalization ability of an MLLM to assist a conventional ML model when the latter encounters an OOD input in the semantic encoding process. Furthermore, we propose a Bayesian optimization scheme that reshapes the probability distribution of the MLLM’s inference process based on the contextual information of the image. The optimization scheme significantly enhances the MLLM’s performance in semantic compression by 1) filtering out irrelevant vocabulary in the original MLLM output; and 2) using contextual similarities between prospective answers of the MLLM and the background information as prior knowledge to modify the MLLM’s probability distribution during inference. Further, at the receiver side of the communication system, we put forth a “generate-criticize” framework that utilizes the cooperation of multiple MLLMs to enhance the reliability of image reconstruction.
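The two-step reweighting described in the abstract (filter out irrelevant vocabulary, then use contextual similarity as prior knowledge) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no formulas, so the sketch assumes the simplest Bayes-style rule, where the posterior for each candidate is proportional to the MLLM's output probability times a similarity-based prior. The function name `reweight_candidates`, its parameters, and all numbers are hypothetical.

```python
from typing import Dict, Set


def reweight_candidates(
    mllm_probs: Dict[str, float],
    context_similarity: Dict[str, float],
    allowed_vocab: Set[str],
) -> Dict[str, float]:
    """Combine MLLM output probabilities with a context-based prior.

    Assumed rule (not taken from the paper): posterior(w) is proportional to
    mllm_probs[w] * context_similarity[w], restricted to allowed_vocab and
    renormalized so the result sums to 1.
    """
    # Step 1: drop vocabulary that is irrelevant to the task.
    kept = {w: p for w, p in mllm_probs.items() if w in allowed_vocab}
    # Step 2: weight each remaining candidate by its prior (similarity score).
    unnormalized = {
        w: p * context_similarity.get(w, 0.0) for w, p in kept.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("no candidate has non-zero posterior mass")
    return {w: v / total for w, v in unnormalized.items()}


# Hypothetical numbers: the raw MLLM output favors "car", but the image
# context (say, a harbor scene) is far more similar to "boat".
mllm_probs = {"car": 0.5, "boat": 0.3, "the": 0.2}
context_similarity = {"car": 0.2, "boat": 0.8}
posterior = reweight_candidates(mllm_probs, context_similarity, {"car", "boat"})
best = max(posterior, key=posterior.get)
```

In this toy case the filler word "the" is removed by the vocabulary filter, and the prior shifts the most likely answer from "car" to "boat". How the paper actually computes the similarity scores and the filtered vocabulary is not stated in the abstract.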

Source journal

IEEE Transactions on Machine Learning in Communications and Networking

Self-citation rate

0.00%

Number of publications