Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots

arXiv - CS - Artificial Intelligence | Pub Date: 2024-09-16 | DOI: arxiv-2409.10277

Hongming Zhang, Xiaoman Pan, Hongwei Wang, Kaixin Ma, Wenhao Yu, Dong Yu



Abstract

We introduce Cognitive Kernel, an open-source agent system built toward the goal of generalist autopilots. Copilot systems rely primarily on users to provide essential state information (e.g., task descriptions) and assist them by answering questions or auto-completing content. Autopilot systems, by contrast, must complete tasks from start to finish independently, which requires the system to actively acquire state information from its environment. To achieve this, an autopilot system should be capable of understanding user intent, actively gathering the necessary information from various real-world sources, and making sound decisions. Cognitive Kernel adopts a model-centric design. In our implementation, the central policy model (a fine-tuned LLM) initiates interactions with the environment using a combination of atomic actions, such as opening files, clicking buttons, saving intermediate results to memory, or calling the LLM itself. This differs from the widely used environment-centric design, in which a task-specific environment with predefined actions is fixed and the policy model is limited to selecting the correct action from a given set of options. Our design facilitates seamless information flow across sources and provides greater flexibility. We evaluate the system in three use cases: real-time information management, private information management, and long-term memory management. The results show that Cognitive Kernel performs better than or comparably to closed-source systems in these scenarios. Cognitive Kernel is fully dockerized, so anyone can deploy it privately and securely. We open-source the system and the backbone model to encourage further research on LLM-driven autopilot systems.
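
The contrast between the environment-centric and model-centric designs can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Runtime` class, the action names `open_file`, `call_llm`, and `save_to_memory`, the `$prev` placeholder, and the stubbed policy functions are hypothetical and are not taken from the Cognitive Kernel codebase. A trivial string function stands in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One plan step: (atomic action name, positional arguments).
Step = Tuple[str, Tuple[str, ...]]


@dataclass
class Runtime:
    """Hypothetical runtime exposing a few atomic actions (illustrative names)."""
    files: Dict[str, str]
    memory: Dict[str, str] = field(default_factory=dict)

    def open_file(self, path: str) -> str:
        return self.files[path]

    def save_to_memory(self, key: str, value: str) -> str:
        self.memory[key] = value
        return value

    def call_llm(self, prompt: str) -> str:
        # Stand-in for a real model call: keep only the first sentence.
        return prompt.split(".")[0].strip() + "."


def environment_centric_policy(options: List[str]) -> str:
    """Environment-centric: the policy may only select from a fixed menu."""
    return options[0]


def model_centric_policy(task: str) -> List[Step]:
    """Model-centric (stubbed): the policy composes atomic actions itself.

    A real policy would generate this plan from `task`; here it is hard-coded.
    "$prev" is a placeholder for the previous step's result.
    """
    return [
        ("open_file", ("notes.txt",)),
        ("call_llm", ("$prev",)),
        ("save_to_memory", ("summary", "$prev")),
    ]


def execute(plan: List[Step], runtime: Runtime) -> str:
    """Run each step in order, feeding each result into the next step."""
    prev = ""
    for name, args in plan:
        resolved = tuple(prev if arg == "$prev" else arg for arg in args)
        prev = getattr(runtime, name)(*resolved)
    return prev


runtime = Runtime(files={"notes.txt": "Meeting moved to Friday. Bring the slides."})
result = execute(model_centric_policy("summarize my notes"), runtime)
picked = environment_centric_policy(["click_submit", "go_back"])
```

In this sketch the model-centric plan chains a file read, a model call, and a memory write, so information moves from one source to the next without the environment having to predefine that particular combination. The environment-centric policy can only return one of the labels it was handed.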



