{"title":"A critical assessment of reinforcement learning methods for microswimmer navigation in complex flows","authors":"Selim Mecanna, Aurore Loisy, Christophe Eloy","doi":"10.1140/epje/s10189-025-00522-2","DOIUrl":null,"url":null,"abstract":"<p>Navigating in a fluid flow while being carried by it, using only information accessible from on-board sensors, is a problem commonly faced by small planktonic organisms. It is also directly relevant to autonomous robots deployed in the oceans. In the last ten years, the fluid mechanics community has widely adopted reinforcement learning, often in the form of its simplest implementations, to address this challenge. But it is unclear how good are the strategies learned by these algorithms. In this paper, we perform a quantitative assessment of reinforcement learning methods applied to navigation in partially observable flows. We first introduce a well-posed problem of directional navigation for which a quasi-optimal policy is known analytically. We then report on the poor performance and robustness of commonly used algorithms (Q-Learning, Advantage Actor Critic) in flows regularly encountered in the literature: Taylor-Green vortices, Arnold–Beltrami–Childress flow, and two-dimensional turbulence. We show that they are vastly surpassed by PPO (Proximal Policy Optimization), a more advanced algorithm that has established dominance across a wide range of benchmarks in the reinforcement learning community. In particular, our custom implementation of PPO matches the theoretical quasi-optimal performance in turbulent flow and does so in a robust manner. Reaching this result required the use of several additional techniques, such as vectorized environments and generalized advantage estimation, as well as hyperparameter optimization. This study demonstrates the importance of algorithm selection, implementation details, and fine-tuning for discovering truly smart autonomous navigation strategies in complex flows.</p>","PeriodicalId":790,"journal":{"name":"The European Physical Journal E","volume":"48 10-12","pages":""},"PeriodicalIF":2.2000,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The European Physical Journal E","FirstCategoryId":"4","ListUrlMain":"https://link.springer.com/article/10.1140/epje/s10189-025-00522-2","RegionNum":4,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"CHEMISTRY, PHYSICAL","Score":null,"Total":0}
Abstract
Navigating in a fluid flow while being carried by it, using only information accessible from on-board sensors, is a problem commonly faced by small planktonic organisms. It is also directly relevant to autonomous robots deployed in the oceans. In the last ten years, the fluid mechanics community has widely adopted reinforcement learning, often in the form of its simplest implementations, to address this challenge. But it is unclear how good the strategies learned by these algorithms are. In this paper, we perform a quantitative assessment of reinforcement learning methods applied to navigation in partially observable flows. We first introduce a well-posed problem of directional navigation for which a quasi-optimal policy is known analytically. We then report on the poor performance and robustness of commonly used algorithms (Q-Learning, Advantage Actor-Critic) in flows regularly encountered in the literature: Taylor–Green vortices, Arnold–Beltrami–Childress flow, and two-dimensional turbulence. We show that they are vastly surpassed by PPO (Proximal Policy Optimization), a more advanced algorithm that has established dominance across a wide range of benchmarks in the reinforcement learning community. In particular, our custom implementation of PPO matches the theoretical quasi-optimal performance in turbulent flow, and does so in a robust manner. Reaching this result required several additional techniques, such as vectorized environments and generalized advantage estimation, as well as hyperparameter optimization. This study demonstrates the importance of algorithm selection, implementation details, and fine-tuning for discovering truly smart autonomous navigation strategies in complex flows.
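For readers unfamiliar with the techniques the abstract names, the sketch below illustrates the two standard ingredients of a PPO trainer: the generalized advantage estimation (GAE) recursion of Schulman et al. (2016) and the clipped surrogate objective of Schulman et al. (2017). This is a generic, minimal illustration, not the authors' implementation; the function names and the default parameters (gamma, lam, clip_eps) are assumptions chosen to match common practice.

import numpy as np

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    # Generalized advantage estimation (Schulman et al., 2016):
    #   delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    #   A_t     = delta_t + gamma * lam * (1 - done_t) * A_{t+1}
    # `values` has length T+1: a bootstrap value for the final state
    # is appended to the T per-step value estimates.
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
    return advantages

def ppo_clipped_objective(ratio, advantages, clip_eps=0.2):
    # PPO clipped surrogate (Schulman et al., 2017): the probability
    # ratio pi_new(a|s) / pi_old(a|s) is clipped to [1-eps, 1+eps] so
    # a single update cannot move the policy far from the one that
    # collected the data. Shown here with numpy for clarity; a real
    # trainer computes this under an autograd framework to get gradients.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.minimum(ratio * advantages, clipped * advantages).mean()

In an actual training loop, the "vectorized environments" mentioned in the abstract would simply run many independent copies of the flow simulation in parallel to fill the rollout buffer that feeds these two computations.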
About the journal
EPJ E publishes papers describing advances in the understanding of physical aspects of Soft, Liquid and Living Systems.
Soft matter is a generic term for a large group of condensed, often heterogeneous systems -- often also called complex fluids -- that display a large response to weak external perturbations and that possess properties governed by slow internal dynamics.
Flowing matter refers to all systems that can actually flow, from simple to multiphase liquids, from foams to granular matter.
Living matter concerns the new physics that emerges from novel insights into the properties and behaviours of living systems. Furthermore, it aims at developing new concepts and quantitative approaches for the study of biological phenomena. Approaches from soft matter physics and statistical physics play a key role in this research.
The journal includes reports of experimental, computational and theoretical studies and appeals to broad interdisciplinary communities, including physics, chemistry, biology, mathematics and materials science.