ALGPT: Multi-Agent Cooperative Framework for Open-Vocabulary Multi-Modal Auto-Annotating in Autonomous Driving

IF 14.3 | Zone 1 (Engineering & Technology) | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Intelligent Vehicles, vol. 10, no. 5, pp. 3644-3658 | Pub Date: 2024-09-16 | DOI: 10.1109/TIV.2024.3461651 | https://ieeexplore.ieee.org/document/10681241/

Yijie Zhou, Xianhui Cheng, Qiming Zhang, Lei Wang, Wenchao Ding, Xiangyang Xue, Chunbo Luo, Jian Pu


Citations: 0

Abstract



Large Language Models (LLMs) have made impressive progress in decision-making and task automation for intelligent agents. However, complex real-world applications, such as auto-annotation in autonomous driving, require multiple agents to cooperate to complete tasks. The primary challenges are how multiple agents can communicate and collaborate effectively in a multi-modal environment, and how annotation results can be refined automatically to reduce human intervention. These challenges also hinder LLMs from fully evolving into embodied intelligent agents. Motivated by these challenges, we propose ALGPT, a multi-agent cooperative framework for open-vocabulary multi-modal auto-annotation in autonomous driving. ALGPT dynamically assembles teams of agents with different roles, and the agents cooperate to complete annotation tasks according to the given requirements. Chain-of-Thought (CoT) and In-Context Learning (ICL) techniques enhance ALGPT's reasoning capabilities, allowing it to develop suitable plans autonomously without human intervention. Furthermore, drawing on project management standards, we introduce project management documents and Standard Operating Procedures (SOPs), which further align ALGPT's behavior with human expectations and mitigate the impact of GPT hallucinations caused by cascading multiple GPTs.
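The abstract describes the architecture only at a high level. The sketch below illustrates the general pattern it names: a team assembled from roles according to the requirement, a fixed SOP order (plan, annotate, review), a shared project document that every agent reads and appends to, and a bounded refinement loop. It is a minimal illustration under stated assumptions and not ALGPT's published implementation. All names (`ProjectDoc`, `assemble_team`, `run_sop`), the keyword-based role selection, and the reviewer's acceptance rule are hypothetical. The agents are stub functions rather than LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ProjectDoc:
    """Hypothetical shared project document: every agent reads it and appends to it."""
    requirement: str
    entries: List[str] = field(default_factory=list)

    def log(self, role: str, text: str) -> None:
        self.entries.append(f"[{role}] {text}")


# Stand-in agents are plain functions of the shared document. A real system would
# prompt an LLM here (e.g. with CoT instructions and ICL examples); these return fixed text.
Agent = Callable[[ProjectDoc], str]


def planner(doc: ProjectDoc) -> str:
    return f"plan for: {doc.requirement}"


def image_annotator(doc: ProjectDoc) -> str:
    return "2D boxes drawn on camera frames"


def lidar_annotator(doc: ProjectDoc) -> str:
    return "3D boxes fitted to LiDAR points"


def reviewer(doc: ProjectDoc) -> str:
    # Toy acceptance rule (an assumption): approve once any annotator has produced output.
    annotated = any(e.startswith(("[image_annotator]", "[lidar_annotator]")) for e in doc.entries)
    return "approved" if annotated else "rejected"


def assemble_team(requirement: str) -> List[Tuple[str, Agent]]:
    """Pick roles from keywords in the requirement; the SOP order is plan -> annotate -> review."""
    team: List[Tuple[str, Agent]] = [("planner", planner)]
    text = requirement.lower()
    if "image" in text or "camera" in text:
        team.append(("image_annotator", image_annotator))
    if "lidar" in text or "3d" in text:
        team.append(("lidar_annotator", lidar_annotator))
    team.append(("reviewer", reviewer))
    return team


def run_sop(requirement: str, max_rounds: int = 3) -> ProjectDoc:
    """Run the team through the SOP until the reviewer approves or the round budget is spent."""
    doc = ProjectDoc(requirement)
    team = assemble_team(requirement)
    for _ in range(max_rounds):
        for role, agent in team:
            doc.log(role, agent(doc))
        if doc.entries[-1] == "[reviewer] approved":
            break  # reviewer signed off, so no further refinement round is needed
    return doc


if __name__ == "__main__":
    for line in run_sop("Label pedestrians in camera images and LiDAR sweeps").entries:
        print(line)
```

In this sketch the project document is the only channel between agents, so each role works from written, inspectable artifacts rather than free-form chat. That is one plausible reading of how documents and SOPs could constrain cascaded agents. The `max_rounds` bound keeps a reviewer that never approves from looping forever.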


Source journal

IEEE Transactions on Intelligent Vehicles (subject category: Mathematics-Control and Optimization)

CiteScore: 12.10

Self-citation rate: 13.40%

Articles published: 177

About the journal: The IEEE Transactions on Intelligent Vehicles (T-IV) is a premier platform for publishing peer-reviewed articles that present innovative research concepts, application results, significant theoretical findings, and application case studies in the field of intelligent vehicles. With a particular emphasis on automated vehicles within roadway environments, T-IV aims to raise awareness of pressing research and application challenges. Our focus is on providing critical information to the intelligent vehicle community, serving as a dissemination vehicle for IEEE ITS Society members and others interested in learning about state-of-the-art developments and progress in research and applications related to intelligent vehicles. Join us in advancing knowledge and innovation in this dynamic field.