Title: Towards Energy-Accuracy Scalable Multimodal Cognitive Systems
Authors: Soumendu Kumar Ghosh; Arghadip Das; Arnab Raha; Vijay Raghunathan
Journal: IEEE Embedded Systems Letters, vol. 17, no. 3, pp. 156-159
Publication date: 2024-11-13 (Journal Article)
DOI: 10.1109/LES.2024.3497935
URL: https://ieeexplore.ieee.org/document/10752679/
JCR: Q3 (Computer Science, Hardware & Architecture); impact factor 2.0
Citations: 0
Abstract
Transformer-powered multimodal artificial intelligence (MMAI) holds great promise for developing cognitive systems that can analyze and interpret data from various sensory modalities simultaneously. However, deploying MMAI on resource-constrained "edge" platforms poses significant challenges due to the intensive compute and memory requirements of transformer models, communication bandwidth limitations, real-time processing needs, and the intricacies of multimodal data fusion. To overcome these challenges, we introduce collaborative multimodal inference, leveraging the strengths of MMAI, edge computing, and cloud resources. Our solution introduces modality-aware accuracy-efficiency (AE) knobs, extending beyond multimodal sensors to individual subsystems within the edge system. We explore intersubsystem and intermodal interactions, investigating system-level AE tradeoffs in the presence of synergistic optimizations. Building on these insights, we present SysteMMX, the first AE scalable cognitive system for efficient multimodal inference at the edge. In this letter, we present an in-depth case study centered on a multimodal system employing RGB and depth sensors for image segmentation. Our system, SysteMMX, demonstrates significant energy savings (1.8× on the edge device and 1.7× on the edge server) with an imperceptible application-level accuracy loss of less than 0.01%. Furthermore, SysteMMX outperforms single-modality optimizations, achieving 1.2× and 1.8× more energy efficiency on the edge compared to RGB-only and Depth-only approaches, respectively, for similar levels of accuracy loss.
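The abstract's modality-aware accuracy-efficiency (AE) knobs can be illustrated with a minimal sketch. Everything below is hypothetical and for illustration only: the knob labels, the energy and accuracy-loss numbers, and the `pick_knobs` helper are assumptions, not the paper's actual method or measurements. The sketch simply shows the shape of the tradeoff: each modality exposes its own settings, and a system-level search picks the lowest-energy combination that keeps total accuracy loss under a budget.

```python
from itertools import product

# Hypothetical per-modality AE knob settings (NOT from the paper).
# Each entry: (knob label, relative energy cost, accuracy loss in %).
KNOBS = {
    "rgb": [
        ("full-res/fp16", 1.00, 0.000),
        ("half-res/fp16", 0.55, 0.004),
        ("half-res/int8", 0.35, 0.012),
    ],
    "depth": [
        ("full-res/fp16", 1.00, 0.000),
        ("half-res/int8", 0.40, 0.003),
        ("quarter-res/int8", 0.25, 0.015),
    ],
}

def pick_knobs(loss_budget_pct):
    """Exhaustively search per-modality knob combinations and return
    (energy, loss, choices) for the lowest-energy combination whose
    summed accuracy loss stays under loss_budget_pct."""
    modalities = list(KNOBS)
    best = None
    for combo in product(*(KNOBS[m] for m in modalities)):
        energy = sum(setting[1] for setting in combo)
        loss = sum(setting[2] for setting in combo)
        if loss < loss_budget_pct and (best is None or energy < best[0]):
            best = (energy, loss, dict(zip(modalities, (s[0] for s in combo))))
    return best

if __name__ == "__main__":
    energy, loss, choices = pick_knobs(loss_budget_pct=0.01)
    print(f"energy={energy:.2f}, loss={loss:.3f}%, knobs={choices}")
```

With these illustrative numbers, a 0.01% loss budget selects reduced-fidelity settings for both modalities rather than maximally throttling a single one, mirroring the abstract's observation that joint multimodal tuning beats RGB-only or Depth-only optimization at comparable accuracy loss.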
Journal description:
The IEEE Embedded Systems Letters (ESL) provides a forum for rapid dissemination of the latest technical advances in embedded systems and related areas of embedded software. The emphasis is on models, methods, and tools that ensure secure, correct, efficient, and robust design of embedded systems and their applications.