Long-Term Human Trajectory Prediction Using 3D Dynamic Scene Graphs

IF 5.3 · CAS Zone 2 (Computer Science) · JCR Q2 (Robotics)

IEEE Robotics and Automation Letters Pub Date : 2024-10-16 DOI:10.1109/LRA.2024.3482169

Nicolas Gorlo;Lukas Schmid;Luca Carlone

Volume 9, Issue 12, pages 10978-10985. https://ieeexplore.ieee.org/document/10720207/


Abstract

We present a novel approach for long-term human trajectory prediction in indoor human-centric environments, which is essential for long-horizon robot planning in these environments. State-of-the-art human trajectory prediction methods are limited by their focus on collision avoidance and short-term planning, and their inability to model complex interactions of humans with the environment. In contrast, our approach overcomes these limitations by predicting sequences of human interactions with the environment and using this information to guide trajectory predictions over a horizon of up to $60\,\mathrm{s}$. We leverage Large Language Models (LLMs) to predict interactions with the environment by conditioning the LLM prediction on rich contextual information about the scene. This information is given as a 3D Dynamic Scene Graph that encodes the geometry, semantics, and traversability of the environment into a hierarchical representation. We then ground these interaction sequences into multi-modal spatio-temporal distributions over human positions using a probabilistic approach based on continuous-time Markov Chains. To evaluate our approach, we introduce a new semi-synthetic dataset of long-term human trajectories in complex indoor environments, which also includes annotations of human-object interactions. We show in thorough experimental evaluations that our approach achieves a 54% lower average negative log-likelihood and a 26.5% lower Best-of-20 displacement error compared to the best non-privileged (i.e., evaluated in a zero-shot fashion on the dataset) baselines for a time horizon of $60\,\mathrm{s}$.
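The abstract states that interaction sequences are grounded into distributions over time using continuous-time Markov chains, but gives no details. As a rough illustration of the general technique only (not the authors' implementation), the sketch below computes the state distribution of a small continuous-time Markov chain at a given time using uniformization. The function name `ctmc_transient`, the three states, and all rates are hypothetical values chosen for the example.

```python
import math


def ctmc_transient(p0, Q, t, tol=1e-12, max_terms=10000):
    """Return the state distribution at time t of a continuous-time Markov chain.

    p0: initial distribution (list of floats summing to 1).
    Q:  generator matrix (list of rows); off-diagonals are transition rates,
        each row sums to zero.
    Uses uniformization: p(t) = sum_k Poisson(k; lam*t) * p0 * P^k,
    with P = I + Q / lam and lam = the largest exit rate.
    """
    n = len(p0)
    lam = max(-Q[i][i] for i in range(n))
    if lam == 0.0 or t == 0.0:
        return list(p0)
    # Transition matrix of the equivalent discrete-time (uniformized) chain.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    weight = math.exp(-lam * t)      # Poisson probability of k = 0 jumps
    total = weight                   # accumulated Poisson mass so far
    v = list(p0)                     # p0 * P^k, starting at k = 0
    result = [weight * x for x in v]
    k = 0
    while 1.0 - total > tol and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k
        total += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result


# Hypothetical example: a person at a desk who walks to the kitchen and stays.
states = ["at_desk", "walking", "at_kitchen"]
Q = [
    [-0.05, 0.05, 0.0],   # leaves the desk after 20 s on average (assumed)
    [0.0, -0.10, 0.10],   # walk takes 10 s on average (assumed)
    [0.0, 0.0, 0.0],      # absorbing for the purpose of this example
]
for horizon in (10.0, 30.0, 60.0):
    dist = ctmc_transient([1.0, 0.0, 0.0], Q, horizon)
    print(horizon, {s: round(p, 3) for s, p in zip(states, dist)})
```

Uniformization is used here because it needs only the standard library and stays numerically stable for small chains; a matrix-exponential routine from a numerical library would be an equally valid choice.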


Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Articles published: 1428

Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.