基于多模态模型的大型车辆网络中面向任务的语义通信

IF 9.2 2区 计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Baoxia Du;Hongyang Du;Dusit Niyato;Ruidong Li
{"title":"基于多模态模型的大型车辆网络中面向任务的语义通信","authors":"Baoxia Du;Hongyang Du;Dusit Niyato;Ruidong Li","doi":"10.1109/TMC.2025.3564543","DOIUrl":null,"url":null,"abstract":"Task-oriented semantic communication has emerged as a fundamental approach for enhancing performance in various communication scenarios. While recent advances in Generative Artificial Intelligence (GenAI), such as Large Language Models (LLMs), have been applied to semantic communication designs, the potential of Large Multimodal Models (LMMs) remains largely unexplored. In this paper, we investigate an LMM-based vehicle AI assistant using a Large Language and Vision Assistant (LLaVA) and propose a task-oriented semantic communication framework to facilitate efficient interaction between users and cloud servers. To reduce computational demands and shorten response time, we optimize LLaVA's image slicing to selectively focus on areas of utmost interest to users. Additionally, we assess the importance of image patches by combining objective and subjective user attention, adjusting energy usage for transmitting semantic information. This strategy optimizes resource utilization, ensuring precise transmission of critical information. We construct a Visual Question Answering (VQA) dataset for traffic scenarios to evaluate effectiveness. Experimental results show that our semantic communication framework significantly increases accuracy in answering questions under the same channel conditions, performing particularly well in environments with poor Signal-to-Noise Ratios (SNR). Accuracy can be improved by 13.4% at an SNR of 12 dB and 33.1% at 10 dB, respectively.","PeriodicalId":50389,"journal":{"name":"IEEE Transactions on Mobile Computing","volume":"24 10","pages":"9822-9836"},"PeriodicalIF":9.2000,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Task-Oriented Semantic Communication in Large Multimodal Models-Based Vehicle Networks\",\"authors\":\"Baoxia Du;Hongyang Du;Dusit Niyato;Ruidong Li\",\"doi\":\"10.1109/TMC.2025.3564543\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Task-oriented semantic communication has emerged as a fundamental approach for enhancing performance in various communication scenarios. While recent advances in Generative Artificial Intelligence (GenAI), such as Large Language Models (LLMs), have been applied to semantic communication designs, the potential of Large Multimodal Models (LMMs) remains largely unexplored. In this paper, we investigate an LMM-based vehicle AI assistant using a Large Language and Vision Assistant (LLaVA) and propose a task-oriented semantic communication framework to facilitate efficient interaction between users and cloud servers. To reduce computational demands and shorten response time, we optimize LLaVA's image slicing to selectively focus on areas of utmost interest to users. Additionally, we assess the importance of image patches by combining objective and subjective user attention, adjusting energy usage for transmitting semantic information. This strategy optimizes resource utilization, ensuring precise transmission of critical information. We construct a Visual Question Answering (VQA) dataset for traffic scenarios to evaluate effectiveness. Experimental results show that our semantic communication framework significantly increases accuracy in answering questions under the same channel conditions, performing particularly well in environments with poor Signal-to-Noise Ratios (SNR). Accuracy can be improved by 13.4% at an SNR of 12 dB and 33.1% at 10 dB, respectively.\",\"PeriodicalId\":50389,\"journal\":{\"name\":\"IEEE Transactions on Mobile Computing\",\"volume\":\"24 10\",\"pages\":\"9822-9836\"},\"PeriodicalIF\":9.2000,\"publicationDate\":\"2025-04-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Mobile Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10976624/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Mobile Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10976624/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

面向任务的语义通信已经成为在各种通信场景中提高性能的基本方法。虽然生成式人工智能(GenAI)的最新进展,如大型语言模型(llm),已被应用于语义通信设计,但大型多模态模型(lmm)的潜力在很大程度上仍未被探索。在本文中,我们使用大型语言和视觉助理(LLaVA)研究了基于lmm的车辆人工智能助手,并提出了一个面向任务的语义通信框架,以促进用户和云服务器之间的高效交互。为了减少计算需求和缩短响应时间,我们优化了LLaVA的图像切片,以选择性地关注用户最感兴趣的区域。此外,我们通过结合客观和主观用户注意力,调整传输语义信息的能量使用来评估图像补丁的重要性。这种策略优化了资源利用,确保了关键信息的准确传递。我们构建了一个可视化问答(VQA)数据集来评估交通场景的有效性。实验结果表明,我们的语义通信框架在相同信道条件下显著提高了回答问题的准确性,在信噪比(SNR)较差的环境中表现尤为出色。在信噪比为12 dB和10 dB时,精度分别提高13.4%和33.1%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Task-Oriented Semantic Communication in Large Multimodal Models-Based Vehicle Networks
Task-oriented semantic communication has emerged as a fundamental approach for enhancing performance in various communication scenarios. While recent advances in Generative Artificial Intelligence (GenAI), such as Large Language Models (LLMs), have been applied to semantic communication designs, the potential of Large Multimodal Models (LMMs) remains largely unexplored. In this paper, we investigate an LMM-based vehicle AI assistant using a Large Language and Vision Assistant (LLaVA) and propose a task-oriented semantic communication framework to facilitate efficient interaction between users and cloud servers. To reduce computational demands and shorten response time, we optimize LLaVA's image slicing to selectively focus on areas of utmost interest to users. Additionally, we assess the importance of image patches by combining objective and subjective user attention, adjusting energy usage for transmitting semantic information. This strategy optimizes resource utilization, ensuring precise transmission of critical information. We construct a Visual Question Answering (VQA) dataset for traffic scenarios to evaluate effectiveness. Experimental results show that our semantic communication framework significantly increases accuracy in answering questions under the same channel conditions, performing particularly well in environments with poor Signal-to-Noise Ratios (SNR). Accuracy can be improved by 13.4% at an SNR of 12 dB and 33.1% at 10 dB, respectively.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
IEEE Transactions on Mobile Computing
IEEE Transactions on Mobile Computing 工程技术-电信学
CiteScore
12.90
自引率
2.50%
发文量
403
审稿时长
6.6 months
期刊介绍: IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信