Heterogeneous foraging swarms can be better.

Frontiers in Robotics and AI (IF 2.9, Q2 in Robotics) · Published: 2025-01-20 · eCollection: 2024-01-01 · DOI: 10.3389/frobt.2024.1426282
Gal A Kaminka, Yinon Douchan
{"title":"异质觅食群可以更好。","authors":"Gal A Kaminka, Yinon Douchan","doi":"10.3389/frobt.2024.1426282","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Inspired by natural phenomena, generations of researchers have been investigating how a swarm of robots can act coherently and purposefully, when individual robots can only sense and communicate with nearby peers, with no means of global communications and coordination. In this paper, we will show that swarms can perform better, when they self-adapt to admit heterogeneous behavior roles.</p><p><strong>Methods: </strong>We model a foraging swarm task as an extensive-form fully-cooperative game, in which the swarm reward is an additive function of individual contributions (the sum of collected items). To maximize the swarm reward, previous work proposed using distributed reinforcement learning, where each robot adapts its own collision-avoidance decisions based on the Effectiveness Index reward (<i>EI</i>). <i>EI</i> uses information about the time between their own collisions (information readily available even to simple physical robots). While promising, the use of <i>EI</i> is brittle (as we show), since robots that selfishly seek to optimize their own <i>EI</i> (minimizing time spent on collisions) can actually cause swarm-wide performance to degrade.</p><p><strong>Results: </strong>To address this, we derive a reward function from a game-theoretic view of swarm foraging as a fully-cooperative, unknown horizon repeating game. We demonstrate analytically that the total coordination overhead of the swarm (total time spent on collision-avoidance, rather than foraging per-se) is directly tied to the total utility of the swarm: less overhead, more items collected. Treating every collision as a stage in the repeating game, the overhead is bounded by the total <i>EI</i> of all robots. We then use a marginal-contribution (difference-reward) formulation to derive individual rewards from the total <i>EI</i>. The resulting Aligned Effective Index <math><mrow><mo>(</mo> <mrow><mi>A</mi> <mi>E</mi> <mi>I</mi></mrow> <mo>)</mo></mrow> </math> reward has the property that each individual can estimate the impact of its decisions on the swarm: individual improvements translate to swarm improvements. We show that <math><mrow><mi>A</mi> <mi>E</mi> <mi>I</mi></mrow> </math> provably generalizes previous work, adding a component that computes the effect of counterfactual robot absence. 
Different assumptions on this counterfactual lead to bounds on <math><mrow><mi>A</mi> <mi>E</mi> <mi>I</mi></mrow> </math> from above and below.</p><p><strong>Discussion: </strong>While the theoretical analysis clarifies both assumptions and gaps with respect to the reality of robots, experiments with real and simulated robots empirically demonstrate the efficacy of the approach in practice, and the importance of behavioral (decision-making) diversity in optimizing swarm goals.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"11 ","pages":"1426282"},"PeriodicalIF":2.9000,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11788533/pdf/","citationCount":"0","resultStr":"{\"title\":\"Heterogeneous foraging swarms can be better.\",\"authors\":\"Gal A Kaminka, Yinon Douchan\",\"doi\":\"10.3389/frobt.2024.1426282\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Introduction: </strong>Inspired by natural phenomena, generations of researchers have been investigating how a swarm of robots can act coherently and purposefully, when individual robots can only sense and communicate with nearby peers, with no means of global communications and coordination. In this paper, we will show that swarms can perform better, when they self-adapt to admit heterogeneous behavior roles.</p><p><strong>Methods: </strong>We model a foraging swarm task as an extensive-form fully-cooperative game, in which the swarm reward is an additive function of individual contributions (the sum of collected items). To maximize the swarm reward, previous work proposed using distributed reinforcement learning, where each robot adapts its own collision-avoidance decisions based on the Effectiveness Index reward (<i>EI</i>). <i>EI</i> uses information about the time between their own collisions (information readily available even to simple physical robots). While promising, the use of <i>EI</i> is brittle (as we show), since robots that selfishly seek to optimize their own <i>EI</i> (minimizing time spent on collisions) can actually cause swarm-wide performance to degrade.</p><p><strong>Results: </strong>To address this, we derive a reward function from a game-theoretic view of swarm foraging as a fully-cooperative, unknown horizon repeating game. We demonstrate analytically that the total coordination overhead of the swarm (total time spent on collision-avoidance, rather than foraging per-se) is directly tied to the total utility of the swarm: less overhead, more items collected. Treating every collision as a stage in the repeating game, the overhead is bounded by the total <i>EI</i> of all robots. We then use a marginal-contribution (difference-reward) formulation to derive individual rewards from the total <i>EI</i>. The resulting Aligned Effective Index <math><mrow><mo>(</mo> <mrow><mi>A</mi> <mi>E</mi> <mi>I</mi></mrow> <mo>)</mo></mrow> </math> reward has the property that each individual can estimate the impact of its decisions on the swarm: individual improvements translate to swarm improvements. We show that <math><mrow><mi>A</mi> <mi>E</mi> <mi>I</mi></mrow> </math> provably generalizes previous work, adding a component that computes the effect of counterfactual robot absence. 
Different assumptions on this counterfactual lead to bounds on <math><mrow><mi>A</mi> <mi>E</mi> <mi>I</mi></mrow> </math> from above and below.</p><p><strong>Discussion: </strong>While the theoretical analysis clarifies both assumptions and gaps with respect to the reality of robots, experiments with real and simulated robots empirically demonstrate the efficacy of the approach in practice, and the importance of behavioral (decision-making) diversity in optimizing swarm goals.</p>\",\"PeriodicalId\":47597,\"journal\":{\"name\":\"Frontiers in Robotics and AI\",\"volume\":\"11 \",\"pages\":\"1426282\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2025-01-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11788533/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Robotics and AI\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/frobt.2024.1426282\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Robotics and AI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frobt.2024.1426282","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0

Abstract


Introduction: Inspired by natural phenomena, generations of researchers have investigated how a swarm of robots can act coherently and purposefully when individual robots can only sense and communicate with nearby peers, with no means of global communication and coordination. In this paper, we show that swarms can perform better when they self-adapt to admit heterogeneous behavior roles.

Methods: We model a foraging swarm task as an extensive-form fully-cooperative game, in which the swarm reward is an additive function of individual contributions (the sum of collected items). To maximize the swarm reward, previous work proposed using distributed reinforcement learning, where each robot adapts its own collision-avoidance decisions based on the Effectiveness Index reward (EI). EI uses information about the time between a robot's own collisions (information readily available even to simple physical robots). While promising, the use of EI is brittle (as we show), since robots that selfishly seek to optimize their own EI (minimizing time spent on collisions) can actually degrade swarm-wide performance.
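The abstract does not spell out the EI formula, but the idea lends itself to a short illustration. The Python sketch below computes a per-interval reward from the time a robot spends on collision avoidance versus foraging; the specific form (overhead as a fraction of the interval, negated as a reward) and all names are illustrative assumptions, not the paper's definition.

```python
from dataclasses import dataclass

@dataclass
class CollisionInterval:
    """One interval between two consecutive collisions of a single robot."""
    task_time: float   # time spent foraging during the interval
    coord_time: float  # time spent on collision avoidance during the interval

def effectiveness_index(iv: CollisionInterval) -> float:
    """Hypothetical per-interval Effectiveness Index: the fraction of the
    interval consumed by coordination. The paper's exact definition may
    differ; this captures only the 'time between own collisions' idea."""
    total = iv.task_time + iv.coord_time
    return iv.coord_time / total if total > 0 else 0.0

def ei_reward(iv: CollisionInterval) -> float:
    """Selfish reward used by the prior approach: each robot minimizes its
    own coordination overhead (higher reward = less time lost to collisions)."""
    return -effectiveness_index(iv)
```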

Results: To address this, we derive a reward function from a game-theoretic view of swarm foraging as a fully-cooperative, unknown-horizon repeated game. We demonstrate analytically that the total coordination overhead of the swarm (total time spent on collision avoidance, rather than on foraging per se) is directly tied to the total utility of the swarm: less overhead, more items collected. Treating every collision as a stage in the repeated game, we bound the overhead by the total EI of all robots. We then use a marginal-contribution (difference-reward) formulation to derive individual rewards from the total EI. The resulting Aligned Effective Index (AEI) reward has the property that each individual can estimate the impact of its decisions on the swarm: individual improvements translate to swarm improvements. We show that AEI provably generalizes previous work, adding a component that computes the effect of counterfactual robot absence. Different assumptions on this counterfactual bound AEI from above and below.
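The marginal-contribution step can be illustrated generically. In the difference-reward pattern, an individual's reward is the global objective with the individual present minus the same objective under a counterfactual in which it is absent; the sketch below applies that pattern to the total EI. The counterfactual models shown (overhead vanishes vs. overhead persists) are hypothetical stand-ins for the assumptions the abstract says yield upper and lower bounds, not the paper's actual derivation.

```python
from typing import Callable, Sequence

def total_ei(eis: Sequence[float]) -> float:
    """Total coordination overhead of the swarm: sum of per-robot EI values."""
    return sum(eis)

def aligned_ei_reward(
    eis: Sequence[float],
    i: int,
    counterfactual: Callable[[Sequence[float], int], float],
) -> float:
    """Difference reward for robot i: total overhead with i present, minus
    total overhead under a counterfactual where i is absent. Because the
    swarm objective is additive, lowering this marginal overhead locally
    also lowers the swarm-wide overhead, keeping incentives aligned."""
    return -(total_ei(eis) - counterfactual(eis, i))

# Two illustrative counterfactual assumptions (hypothetical names and forms):
def overhead_vanishes(eis: Sequence[float], i: int) -> float:
    """Assume robot i's collisions never happen: drop its EI entirely."""
    return total_ei([e for j, e in enumerate(eis) if j != i])

def overhead_persists(eis: Sequence[float], i: int) -> float:
    """Assume robot i's collisions would have burdened its peers anyway:
    total overhead is unchanged by its absence."""
    return total_ei(eis)
```

Under the first counterfactual the reward reduces to a robot's own negated EI, which mirrors the abstract's claim that AEI generalizes the earlier EI reward; under the second it is zero, so the two models bracket the reward between them.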

Discussion: While the theoretical analysis clarifies both the assumptions made and the gaps with respect to real robots, experiments with real and simulated robots empirically demonstrate the efficacy of the approach in practice, and the importance of behavioral (decision-making) diversity in optimizing swarm goals.

Source journal: Frontiers in Robotics and AI
CiteScore: 6.50 · Self-citation rate: 5.90% · Articles published: 355 · Review time: 14 weeks

About the journal: Frontiers in Robotics and AI publishes rigorously peer-reviewed research covering all theory and applications of robotics, technology, and artificial intelligence, from biomedical to space robotics.