ProcessTBench: An LLM Plan Generation Dataset for Process Mining
Andrei Cosmin Redis, Mohammadreza Fani Sani, Bahram Zarrin, Andrea Burattin
arXiv - CS - Emerging Technologies, 2024-09-13
DOI: arxiv-2409.09191 (https://doi.org/arxiv-2409.09191)
Abstract
Large Language Models (LLMs) have shown significant promise in plan
generation. Yet, existing datasets often lack the complexity needed for
advanced tool-use scenarios, such as handling paraphrased query statements,
supporting multiple languages, and managing actions that can be done in
parallel. These scenarios are crucial for evaluating the evolving capabilities
of LLMs in real-world applications. Moreover, current datasets do not enable the
study of LLMs from a process perspective, particularly in scenarios where
understanding typical behaviors and challenges in executing the same process
under different conditions or formulations is crucial. To address these gaps,
we present the ProcessTBench dataset, an extension of the TaskBench dataset
specifically designed to evaluate LLMs within a process mining framework.