Robots reading recipes: large language models as translators between humans and machines
Oliver Wang, Grant Cheng, Luc Caspar, Akira Yokota, Mahdi Khosravy, Olaf Witkowski
Artificial Life and Robotics, vol. 30, no. 3, pp. 407–416 (published 2025-06-13)
DOI: 10.1007/s10015-025-01031-3
https://link.springer.com/article/10.1007/s10015-025-01031-3
Abstract
Large Language Models (LLMs) are machine learning models trained on vast amounts of natural language that have demonstrated novel capabilities in tasks such as text prediction and generation. These capabilities make LLMs remarkably well suited to understanding the semantics of natural language, which in turn enables applications such as planning real-world tasks, writing code for computers, and translating between human languages. Even though LLMs can provide more flexibility in interpreting user requests and have been shown to possess some commonsense knowledge, their capability to translate natural language instructions into code that controls robot actions is only starting to be explored. More specifically, in this paper we are interested in the control of robots tasked with preparing cocktails. Within this context, it is assumed that the LLM has access to a repository of well-formatted recipes: each recipe lists its ingredients and then describes how to prepare and mix the various items. Moreover, a set of low-level modules responsible for robot manipulation and vision-related tasks is also provided to the LLM in the form of an application programming interface (API). Consequently, the main focus of the LLM is on generating a sequence of calls to the API, along with the right parameters, to produce the cocktail requested by users in natural language. Here, we show that it is feasible for LLMs to perform this type of translation over a small number of custom modules, and that certain techniques measurably improve the accuracy and consistency of this task without fine-tuning. In particular, we found that an ensemble-voting strategy, in which multiple trials are run and the most common answer is selected, increases accuracy to a certain extent. In addition, there is moderate support for the use of natural language parsing to adjust the LLM's prompt prior to translation. Lastly, building on previous knowledge, we also provide a set of guidelines for designing prompts that improve the accuracy of the resulting sequence of actions. In general, these results suggest that while LLMs can be used as translators of robot instructions, they are best applied in conjunction with these other strategies. These findings could influence future robotics development, as they provide directions for implementing LLMs more effectively and for broadening the accessibility of robotic control to users without an extensive software background.
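
To make the setup concrete, the following is a minimal sketch of the kind of recipe-to-API translation described in the abstract, written in Python. The module names (grab_bottle, pour, shake, serve), the prompt wording, and the JSON plan format are illustrative assumptions rather than the interface used in the paper, and the actual LLM call is omitted; a hard-coded example plan stands in for the model's output.

```python
from typing import Callable

# Hypothetical low-level robot modules; stand-ins for the paper's actual API.
def grab_bottle(name: str) -> None:
    print(f"[robot] grabbing bottle: {name}")

def pour(name: str, amount_ml: int) -> None:
    print(f"[robot] pouring {amount_ml} ml of {name}")

def shake(seconds: int) -> None:
    print(f"[robot] shaking for {seconds} s")

def serve() -> None:
    print("[robot] serving the drink")

# Registry exposed to the model as the "API": only these calls are allowed.
API: dict[str, Callable] = {
    "grab_bottle": grab_bottle,
    "pour": pour,
    "shake": shake,
    "serve": serve,
}

API_DOC = """\
- grab_bottle(name)
- pour(name, amount_ml)
- shake(seconds)
- serve()"""

def build_prompt(recipe: str) -> str:
    """Compose the prompt: API description first, then the recipe to translate."""
    return (
        "You control a bartending robot through the following functions:\n"
        f"{API_DOC}\n\n"
        "Translate this recipe into a JSON list of calls, "
        'formatted as [{"call": <name>, "args": {...}}, ...]:\n'
        f"{recipe}"
    )

def execute(plan: list[dict]) -> None:
    """Run a plan produced by the model, rejecting calls outside the API."""
    for step in plan:
        fn = API.get(step["call"])
        if fn is None:
            raise ValueError(f"unknown call: {step['call']}")
        fn(**step.get("args", {}))

# Example: the kind of plan an LLM might return for a simple shaken drink.
example_plan = [
    {"call": "grab_bottle", "args": {"name": "gin"}},
    {"call": "pour", "args": {"name": "gin", "amount_ml": 50}},
    {"call": "shake", "args": {"seconds": 10}},
    {"call": "serve"},
]
execute(example_plan)
```

Restricting execution to a whitelisted registry keeps the generated plan confined to the capabilities the robot actually exposes, which is one simple way to keep model errors from producing arbitrary code.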
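The ensemble-voting strategy can be sketched just as briefly. In this hedged example, sample_plan is a placeholder for a single LLM trial (it is not part of the paper's code); plans are serialized to canonical JSON so that identical answers can be counted and the most frequent one returned.

```python
import json
import random
from collections import Counter

def sample_plan(prompt: str) -> list[dict]:
    """Placeholder for one LLM trial; returns one of a few candidate plans."""
    candidates = [
        [{"call": "pour", "args": {"name": "gin", "amount_ml": 50}}],
        [{"call": "pour", "args": {"name": "gin", "amount_ml": 50}}],
        [{"call": "pour", "args": {"name": "vodka", "amount_ml": 50}}],
    ]
    return random.choice(candidates)

def vote(prompt: str, trials: int = 5) -> list[dict]:
    """Repeat the translation `trials` times and keep the most common plan."""
    # Serialize each plan to a canonical JSON string so duplicates can be counted.
    counts = Counter(
        json.dumps(sample_plan(prompt), sort_keys=True) for _ in range(trials)
    )
    winner, _ = counts.most_common(1)[0]
    return json.loads(winner)

print(vote("Make a dry martini", trials=7))
```

Majority voting of this kind trades extra inference cost for consistency: occasional aberrant translations are outvoted as long as the model produces the correct plan more often than any single incorrect one.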