Title: Towards Energy-Accuracy Scalable Multimodal Cognitive Systems
Authors: Soumendu Kumar Ghosh; Arghadip Das; Arnab Raha; Vijay Raghunathan
Journal: IEEE Embedded Systems Letters, vol. 17, no. 3, pp. 156-159
Publication date: 2024-11-13 (Journal Article)
DOI: 10.1109/LES.2024.3497935
URL: https://ieeexplore.ieee.org/document/10752679/
JCR: Q3 (Computer Science, Hardware & Architecture); impact factor 2.0
Citations: 0
Abstract
Transformer-powered multimodal artificial intelligence (MMAI) holds great promise for developing cognitive systems that can analyze and interpret data from various sensory modalities simultaneously. However, deploying MMAI on resource-constrained "edge" platforms poses significant challenges due to the intensive compute and memory requirements of transformer models, communication bandwidth limitations, real-time processing needs, and the intricacies of multimodal data fusion. To overcome these challenges, we introduce collaborative multimodal inference, leveraging the strengths of MMAI, edge computing, and cloud resources. Our solution introduces modality-aware accuracy-efficiency (AE) knobs, extending beyond multimodal sensors to individual subsystems within the edge system. We explore intersubsystem and intermodal interactions, investigating system-level AE tradeoffs in the presence of synergistic optimizations. Building on these insights, we present SysteMMX, the first AE scalable cognitive system for efficient multimodal inference at the edge. In this letter, we present an in-depth case study centered on a multimodal system employing RGB and depth sensors for image segmentation. Our system, SysteMMX, demonstrates significant energy savings (1.8× on the edge device and 1.7× on the edge server) with an imperceptible application-level accuracy loss of less than 0.01%. Furthermore, SysteMMX outperforms single-modality optimizations, achieving 1.2× and 1.8× more energy efficiency on the edge compared to RGB-only and Depth-only approaches, respectively, for similar levels of accuracy loss.
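The abstract's modality-aware accuracy-efficiency (AE) knobs can be illustrated with a minimal sketch. Everything below is hypothetical and for illustration only: the knob labels, the energy and accuracy-loss numbers, and the `pick_knobs` helper are assumptions, not the paper's actual method or measurements. The sketch simply shows the shape of the tradeoff: each modality exposes its own settings, and a system-level search picks the lowest-energy combination that keeps total accuracy loss under a budget.

```python
from itertools import product

# Hypothetical per-modality AE knob settings (NOT from the paper).
# Each entry: (knob label, relative energy cost, accuracy loss in %).
KNOBS = {
    "rgb": [
        ("full-res/fp16", 1.00, 0.000),
        ("half-res/fp16", 0.55, 0.004),
        ("half-res/int8", 0.35, 0.012),
    ],
    "depth": [
        ("full-res/fp16", 1.00, 0.000),
        ("half-res/int8", 0.40, 0.003),
        ("quarter-res/int8", 0.25, 0.015),
    ],
}

def pick_knobs(loss_budget_pct):
    """Exhaustively search per-modality knob combinations and return
    (energy, loss, choices) for the lowest-energy combination whose
    summed accuracy loss stays under loss_budget_pct."""
    modalities = list(KNOBS)
    best = None
    for combo in product(*(KNOBS[m] for m in modalities)):
        energy = sum(setting[1] for setting in combo)
        loss = sum(setting[2] for setting in combo)
        if loss < loss_budget_pct and (best is None or energy < best[0]):
            best = (energy, loss, dict(zip(modalities, (s[0] for s in combo))))
    return best

if __name__ == "__main__":
    energy, loss, choices = pick_knobs(loss_budget_pct=0.01)
    print(f"energy={energy:.2f}, loss={loss:.3f}%, knobs={choices}")
```

With these illustrative numbers, a 0.01% loss budget selects reduced-fidelity settings for both modalities rather than maximally throttling a single one, mirroring the abstract's observation that joint multimodal tuning beats RGB-only or Depth-only optimization at comparable accuracy loss.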
Journal description:
The IEEE Embedded Systems Letters (ESL) provides a forum for rapid dissemination of the latest technical advances in embedded systems and related areas of embedded software. The emphasis is on models, methods, and tools that ensure secure, correct, efficient, and robust design of embedded systems and their applications.