Pengfei Hu;Yuhang Qian;Tianyue Zheng;Ang Li;Zhe Chen;Yue Gao;Xiuzhen Cheng;Jun Luo
{"title":"t-READi:变压器驱动的鲁棒高效多模态自动驾驶推理","authors":"Pengfei Hu;Yuhang Qian;Tianyue Zheng;Ang Li;Zhe Chen;Yue Gao;Xiuzhen Cheng;Jun Luo","doi":"10.1109/TMC.2024.3462437","DOIUrl":null,"url":null,"abstract":"Given the wide adoption of multimodal sensors (e.g., camera, lidar, radar) by \n<italic>autonomous vehicle</i>\ns (AVs), deep analytics to fuse their outputs for a robust perception become imperative. However, existing fusion methods often make two assumptions rarely holding in practice: i) similar data distributions for all inputs and ii) constant availability for all sensors. Because, for example, lidars have various resolutions and failures of radars may occur, such variability often results in significant performance degradation in fusion. To this end, we present t-READi, an adaptive inference system that accommodates the variability of multimodal sensory data and thus enables robust and efficient perception. t-READi identifies variation-sensitive yet \n<italic>structure-specific</i>\n model parameters; it then adapts only these parameters while keeping the rest intact. t-READi also leverages a cross-modality contrastive learning method to compensate for the loss from missing modalities. Both functions are implemented to maintain compatibility with existing multimodal deep fusion methods. The extensive experiments evidently demonstrate that compared with the status quo approaches, t-READi not only improves the average inference accuracy by more than 6% but also reduces the inference latency by almost 15× with the cost of only 5% extra memory overhead in the worst case under realistic data and modal variations.","PeriodicalId":50389,"journal":{"name":"IEEE Transactions on Mobile Computing","volume":"24 1","pages":"135-149"},"PeriodicalIF":7.7000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"t-READi: Transformer-Powered Robust and Efficient Multimodal Inference for Autonomous Driving\",\"authors\":\"Pengfei Hu;Yuhang Qian;Tianyue Zheng;Ang Li;Zhe Chen;Yue Gao;Xiuzhen Cheng;Jun Luo\",\"doi\":\"10.1109/TMC.2024.3462437\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Given the wide adoption of multimodal sensors (e.g., camera, lidar, radar) by \\n<italic>autonomous vehicle</i>\\ns (AVs), deep analytics to fuse their outputs for a robust perception become imperative. However, existing fusion methods often make two assumptions rarely holding in practice: i) similar data distributions for all inputs and ii) constant availability for all sensors. Because, for example, lidars have various resolutions and failures of radars may occur, such variability often results in significant performance degradation in fusion. To this end, we present t-READi, an adaptive inference system that accommodates the variability of multimodal sensory data and thus enables robust and efficient perception. t-READi identifies variation-sensitive yet \\n<italic>structure-specific</i>\\n model parameters; it then adapts only these parameters while keeping the rest intact. t-READi also leverages a cross-modality contrastive learning method to compensate for the loss from missing modalities. Both functions are implemented to maintain compatibility with existing multimodal deep fusion methods. 
The extensive experiments evidently demonstrate that compared with the status quo approaches, t-READi not only improves the average inference accuracy by more than 6% but also reduces the inference latency by almost 15× with the cost of only 5% extra memory overhead in the worst case under realistic data and modal variations.\",\"PeriodicalId\":50389,\"journal\":{\"name\":\"IEEE Transactions on Mobile Computing\",\"volume\":\"24 1\",\"pages\":\"135-149\"},\"PeriodicalIF\":7.7000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Mobile Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10684049/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Mobile Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10684049/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
t-READi: Transformer-Powered Robust and Efficient Multimodal Inference for Autonomous Driving
Given the wide adoption of multimodal sensors (e.g., camera, lidar, and radar) by autonomous vehicles (AVs), deep analytics that fuse their outputs for robust perception have become imperative. However, existing fusion methods often make two assumptions that rarely hold in practice: i) all inputs follow similar data distributions, and ii) all sensors are constantly available. Because, for example, lidars vary in resolution and radars may fail, such variability often degrades fusion performance significantly. To this end, we present t-READi, an adaptive inference system that accommodates the variability of multimodal sensory data and thus enables robust and efficient perception. t-READi identifies variation-sensitive yet structure-specific model parameters; it then adapts only these parameters while keeping the rest intact. t-READi also leverages a cross-modality contrastive learning method to compensate for the loss caused by missing modalities. Both functions are implemented so as to maintain compatibility with existing multimodal deep fusion methods. Extensive experiments demonstrate that, compared with status quo approaches, t-READi not only improves the average inference accuracy by more than 6% but also reduces the inference latency by almost 15×, at the cost of only 5% extra memory overhead in the worst case, under realistic data and modal variations.
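The abstract describes two mechanisms only at a high level. The following minimal sketch (not the authors' code) illustrates the first idea in PyTorch: freeze a fusion backbone and update only a small, pre-identified set of variation-sensitive parameters when the sensor configuration changes. The model, the helper `select_sensitive_params`, and the choice of `SENSITIVE_KEYS` are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of selective parameter adaptation, assuming the
# variation-sensitive subset can be identified by parameter name.
import torch
import torch.nn as nn

def select_sensitive_params(model: nn.Module, sensitive_keys: list[str]):
    """Freeze everything, then re-enable gradients only for parameters whose
    names contain one of the given substrings (e.g. normalization layers)."""
    adapted = []
    for name, param in model.named_parameters():
        param.requires_grad = any(key in name for key in sensitive_keys)
        if param.requires_grad:
            adapted.append(name)
    return adapted

# A toy transformer encoder standing in for a multimodal fusion model.
fusion_model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

# Hypothetical choice: treat normalization parameters as the sensitive subset.
SENSITIVE_KEYS = ["norm1", "norm2"]
adapted_names = select_sensitive_params(fusion_model, SENSITIVE_KEYS)

# Only the selected parameters are handed to the optimizer, so adapting to a
# new lidar resolution (for instance) touches a small fraction of the model.
optimizer = torch.optim.Adam(
    (p for p in fusion_model.parameters() if p.requires_grad), lr=1e-4
)
print(f"adapting {len(adapted_names)} parameter tensors; the rest stay frozen")
```

For the second mechanism, the sketch below shows one common way to realize a cross-modality contrastive objective: pull each single-modality (e.g., camera-only) embedding toward the fused embedding of the same scene, so inference with a missing modality degrades gracefully. This is a generic InfoNCE-style loss, not necessarily the exact formulation used by t-READi.

```python
# A hedged sketch of a cross-modality contrastive loss between camera-only and
# fused (camera+lidar) embeddings of the same batch of scenes.
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(cam_emb: torch.Tensor,
                                 fused_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """cam_emb, fused_emb: (batch, dim) embeddings of the same scenes."""
    cam = F.normalize(cam_emb, dim=-1)
    fused = F.normalize(fused_emb, dim=-1)
    logits = cam @ fused.t() / temperature            # (batch, batch) similarities
    targets = torch.arange(cam.size(0), device=cam.device)
    # Matching (camera, fused) pairs on the diagonal are positives; others negatives.
    return F.cross_entropy(logits, targets)

# Usage with random stand-in embeddings:
cam = torch.randn(8, 64, requires_grad=True)
fused = torch.randn(8, 64, requires_grad=True)
loss = cross_modal_contrastive_loss(cam, fused)
loss.backward()
```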
About the journal:
IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.