Agent Workflow Memory
Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, Graham Neubig
arXiv - CS - Computation and Language, 2024-09-11
DOI: arxiv-2409.07429
Citations: 0
Abstract
Despite the potential of language model-based agents to solve real-world
tasks such as web navigation, current methods still struggle with long-horizon
tasks with complex action trajectories. In contrast, humans can flexibly solve
complex tasks by learning reusable task workflows from past experiences and
using them to guide future actions. To build agents that can similarly benefit
from this process, we introduce Agent Workflow Memory (AWM), a method for
inducing commonly reused routines, i.e., workflows, and selectively providing
workflows to the agent to guide subsequent generations. AWM flexibly applies to
both offline and online scenarios, where agents induce workflows from training
examples beforehand or from test queries on the fly. We experiment on two major
web navigation benchmarks -- Mind2Web and WebArena -- that collectively cover
1000+ tasks from 200+ domains across travel, shopping, and social media, among
others. AWM substantially improves the baseline results by 24.6% and 51.1%
relative success rate on Mind2Web and WebArena while reducing the number of
steps taken to solve WebArena tasks successfully. Furthermore, online AWM
robustly generalizes in cross-task, website, and domain evaluations, surpassing
baselines by 8.9 to 14.0 absolute points as train-test task distribution gaps
widen.
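The loop the abstract describes — induce reusable workflows from successful past trajectories, then retrieve relevant ones to guide a new task — can be sketched minimally as below. This is an illustrative toy, not the authors' implementation: the names (`Workflow`, `WorkflowMemory`, `induce`, `retrieve`) and the word-overlap relevance score are assumptions for the sketch; AWM itself uses an LM to induce and abstract workflows.

```python
# Toy sketch of a workflow-memory loop, loosely following the abstract's
# description of AWM. All class/method names here are illustrative
# assumptions, not the paper's API.
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """A reusable routine: a task description plus its action steps."""
    description: str
    steps: list


@dataclass
class WorkflowMemory:
    workflows: list = field(default_factory=list)

    def induce(self, trajectories):
        """Turn past experiences into workflows.

        Stand-in for AWM's LM-based induction: here, every *successful*
        trajectory simply becomes one workflow.
        """
        for task, actions, success in trajectories:
            if success:
                self.workflows.append(Workflow(task, actions))

    def retrieve(self, query, k=2):
        """Select the k most relevant workflows for a new task query,
        scored by naive word overlap with the workflow description."""
        def overlap(wf):
            return len(set(wf.description.lower().split())
                       & set(query.lower().split()))
        return sorted(self.workflows, key=overlap, reverse=True)[:k]


# "Online" use: induce from experiences on the fly, then let retrieved
# workflows guide the next task (e.g., by prepending them to the prompt).
memory = WorkflowMemory()
memory.induce([
    ("search for a flight", ["open site", "enter dates", "click search"], True),
    ("book a hotel", ["open site", "set city", "click book"], True),
    ("failed attempt", ["open site"], False),  # not stored: unsuccessful
])
relevant = memory.retrieve("search for a hotel")
```

In the real method, retrieved workflows are provided to the agent as in-context guidance for subsequent action generation; the memory grows as new tasks are solved, which is what lets online AWM adapt as the train-test distribution gap widens.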