Towards Optimal Mixture of Experts System for 3D Object Detection: A Game of Accuracy, Efficiency and Adaptivity.

IF 18.6 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Linshen Liu, Pu Wang, Guanlin Wu, Junyue Jiang, Hao Yang
{"title":"面向三维目标检测的最优混合专家系统:精度、效率和适应性的博弈。","authors":"Linshen Liu,Pu Wang,Guanlin Wu,Junyue Jiang,Hao Yang","doi":"10.1109/tpami.2025.3611795","DOIUrl":null,"url":null,"abstract":"Autonomous vehicles, open-world robots, and other automated systems rely on accurate, efficient perception modules for real-time object detection. Although high-precision models improve reliability, their processing time and computational overhead can hinder real-time performance and raise safety concerns. This paper introduces an Edge-based Mixture-of-Experts Optimal Sensing (EMOS) System that addresses the challenge of co-achieving accuracy, latency and scene adaptivity, further demonstrated in the open-world autonomous driving scenarios. Algorithmically, EMOS fuses multimodal sensor streams via an Adaptive Multimodal Data Bridge and uses a scenario-aware MoE switch to activate only a complementary set of specialized experts as needed. The proposed hierarchical backpropagation and a multiscale pooling layer let model capacity scale with real-world demand complexity. System-wise, an edge-optimized runtime with accelerator-aware scheduling (e.g., ONNX/TensorRT), zero-copy buffering, and overlapped I/O-compute enforces explicit latency/accuracy budgets across diverse driving conditions. Experimental results establish EMOS as the new state of the art: on KITTI, it increases average AP by 3.17% while running $2.6\\times$ faster on Nvidia Jetson. On nuScenes, it improves accuracy by 0.2% mAP and 0.5% NDS, with 34% fewer parameters and a $15.35\\times$ Nvidia Jetson speedup. Leveraging multimodal data and intelligent experts cooperation, EMOS delivers accurate, efficient and edge-adaptive perception system for autonomous vehicles, thereby ensuring robust, timely responses in real-world scenarios.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"87 1","pages":""},"PeriodicalIF":18.6000,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards Optimal Mixture of Experts System for 3D Object Detection: A Game of Accuracy, Efficiency and Adaptivity.\",\"authors\":\"Linshen Liu,Pu Wang,Guanlin Wu,Junyue Jiang,Hao Yang\",\"doi\":\"10.1109/tpami.2025.3611795\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Autonomous vehicles, open-world robots, and other automated systems rely on accurate, efficient perception modules for real-time object detection. Although high-precision models improve reliability, their processing time and computational overhead can hinder real-time performance and raise safety concerns. This paper introduces an Edge-based Mixture-of-Experts Optimal Sensing (EMOS) System that addresses the challenge of co-achieving accuracy, latency and scene adaptivity, further demonstrated in the open-world autonomous driving scenarios. Algorithmically, EMOS fuses multimodal sensor streams via an Adaptive Multimodal Data Bridge and uses a scenario-aware MoE switch to activate only a complementary set of specialized experts as needed. The proposed hierarchical backpropagation and a multiscale pooling layer let model capacity scale with real-world demand complexity. System-wise, an edge-optimized runtime with accelerator-aware scheduling (e.g., ONNX/TensorRT), zero-copy buffering, and overlapped I/O-compute enforces explicit latency/accuracy budgets across diverse driving conditions. 
Experimental results establish EMOS as the new state of the art: on KITTI, it increases average AP by 3.17% while running $2.6\\\\times$ faster on Nvidia Jetson. On nuScenes, it improves accuracy by 0.2% mAP and 0.5% NDS, with 34% fewer parameters and a $15.35\\\\times$ Nvidia Jetson speedup. Leveraging multimodal data and intelligent experts cooperation, EMOS delivers accurate, efficient and edge-adaptive perception system for autonomous vehicles, thereby ensuring robust, timely responses in real-world scenarios.\",\"PeriodicalId\":13426,\"journal\":{\"name\":\"IEEE Transactions on Pattern Analysis and Machine Intelligence\",\"volume\":\"87 1\",\"pages\":\"\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-09-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Pattern Analysis and Machine Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tpami.2025.3611795\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Pattern Analysis and Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tpami.2025.3611795","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Autonomous vehicles, open-world robots, and other automated systems rely on accurate, efficient perception modules for real-time object detection. Although high-precision models improve reliability, their processing time and computational overhead can hinder real-time performance and raise safety concerns. This paper introduces an Edge-based Mixture-of-Experts Optimal Sensing (EMOS) system that addresses the challenge of jointly achieving accuracy, low latency, and scene adaptivity, demonstrated in open-world autonomous driving scenarios. Algorithmically, EMOS fuses multimodal sensor streams via an Adaptive Multimodal Data Bridge and uses a scenario-aware MoE switch to activate only a complementary set of specialized experts as needed. The proposed hierarchical backpropagation and multiscale pooling layer let model capacity scale with real-world demand complexity. System-wise, an edge-optimized runtime with accelerator-aware scheduling (e.g., ONNX/TensorRT), zero-copy buffering, and overlapped I/O and compute enforces explicit latency/accuracy budgets across diverse driving conditions. Experimental results establish EMOS as the new state of the art: on KITTI, it increases average AP by 3.17% while running 2.6× faster on an Nvidia Jetson; on nuScenes, it improves accuracy by 0.2% mAP and 0.5% NDS, with 34% fewer parameters and a 15.35× Nvidia Jetson speedup. Leveraging multimodal data and intelligent expert cooperation, EMOS delivers an accurate, efficient, and edge-adaptive perception system for autonomous vehicles, ensuring robust, timely responses in real-world scenarios.
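The abstract describes the scenario-aware MoE routing only at a high level. As a rough illustration of the general technique, the minimal PyTorch sketch below (all class, function, and parameter names are hypothetical, not taken from the paper) shows how a gating network can score a pool of specialized experts from a scene descriptor and execute only the top-k of them, so that compute scales with scene complexity instead of always running every expert.

# Hypothetical sketch of scenario-aware top-k expert routing (not the authors' implementation).
import torch
import torch.nn as nn

class ScenarioAwareMoE(nn.Module):
    def __init__(self, feat_dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        # Each "expert" is a small MLP standing in for a specialized detection branch.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
             for _ in range(num_experts)]
        )
        # Gating network: maps a scene descriptor to per-expert scores.
        self.gate = nn.Linear(feat_dim, num_experts)
        self.top_k = top_k

    def forward(self, fused_feat: torch.Tensor, scene_desc: torch.Tensor) -> torch.Tensor:
        scores = torch.softmax(self.gate(scene_desc), dim=-1)       # (B, num_experts)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)     # keep only the best-suited experts
        out = torch.zeros_like(fused_feat)
        for b in range(fused_feat.size(0)):                         # per-sample sparse expert execution
            for slot in range(self.top_k):
                e = int(topk_idx[b, slot])
                out[b] = out[b] + topk_scores[b, slot] * self.experts[e](fused_feat[b])
        return out

# Toy usage: fused multimodal features and a scene descriptor of the same width.
moe = ScenarioAwareMoE(feat_dim=256)
fused_feat = torch.randn(2, 256)
scene_desc = torch.randn(2, 256)
detection_feat = moe(fused_feat, scene_desc)                        # shape (2, 256)

In a full system such as EMOS, the fused features would come from the multimodal fusion stage and the selected experts would be specialized detection components; the sketch only demonstrates the sparse-activation pattern, not the paper's architecture or its edge-runtime optimizations.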
Source journal
CiteScore: 28.40
Self-citation rate: 3.00%
Articles published: 885
Review time: 8.5 months
Journal description: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.