Multi-LLM-based augmentation and synthetic data generation of construction schedules and task descriptions with SLM-as-a-judge assessment

IF 9.9 | Region 1, Engineering Technology | JCR Q1, Computer Science, Artificial Intelligence

Advanced Engineering Informatics | Pub Date: 2025-09-09 | DOI: 10.1016/j.aei.2025.103825

Akarsth Kumar Singh, Shang-Hsien Hsieh

Volume 69, Article 103825. Open access: no. Full text: https://www.sciencedirect.com/science/article/pii/S1474034625007189

Citations: 0

Abstract

The fragmented structure, semantic inconsistency, and limited availability of construction schedule data significantly hinder the development of intelligent planning tools in the architecture, engineering, and construction (AEC) domain. In particular, the absence of high-quality, hierarchically structured Work Breakdown Structure with Task Dependency (WBS-TD) datasets restricts the training and evaluation of AI-based models for automated construction workflows. This study investigates whether Large Language Models (LLMs) can be systematically applied to enhance and generate construction schedule and task description data, and whether lightweight, locally deployed Small Language Models (SLMs) can effectively evaluate these outputs using domain-specific rubrics in a scalable and privacy-preserving manner. To address this, an integrated methodology is proposed, consisting of three components: (1) Role-Guided Modular Prompt Chaining (RGPC), which transforms inconsistent WBS-TD inputs into logically ordered and semantically enriched outputs; (2) synthetic data generation via a multi-LLM pipeline using structured prompt strategies to produce diverse, realistic construction schedules and descriptions; and (3) SLM-as-a-Judge, a rubric-based evaluation approach that uses lightweight, locally deployed SLMs to assess output quality across structural, logical, and domain-specific dimensions without requiring sensitive data to leave secure environments. Experimental results show that Claude-3.5-Sonnet achieved 77% quality in augmented schedule generation, Gemini-2.0-Flash reached 92% in synthetic schedule generation, and DeepSeek-R1 provided the best balance of quality and diversity in synthetic construction task description generation, demonstrating strong domain alignment across tasks. The framework generates reusable, machine-readable knowledge graph datasets supporting downstream applications such as AI-assisted planning, progress monitoring, and risk analysis. This study delivers a scalable, model-agnostic pipeline that advances automation and evaluation in construction informatics.
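To make the WBS-TD structure and the idea of rubric-based scoring concrete, the following is a minimal sketch and not the authors' implementation. It swaps the SLM judge for deterministic checks on two of the dimensions named in the abstract (structural and logical); the `Task` fields, the specific checks, the equal weights, and the example tasks are all assumptions made for illustration. The domain-specific dimension is omitted because it would need a language model.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class Task:
    """Hypothetical WBS-TD entry: a hierarchical WBS code plus predecessor codes."""
    wbs: str                                   # e.g. "1.2"; its parent would be "1"
    name: str
    predecessors: list[str] = field(default_factory=list)


def check_schedule(tasks: list[Task]) -> dict[str, float]:
    """Return illustrative rubric scores in [0, 1] for two dimensions."""
    codes = {t.wbs for t in tasks}

    # Structural: every non-root task should have its parent code present.
    children = [t for t in tasks if "." in t.wbs]
    with_parent = [t for t in children if t.wbs.rsplit(".", 1)[0] in codes]
    structural = len(with_parent) / len(children) if children else 1.0

    # Logical: predecessors should exist and dependencies must be acyclic.
    deps = [(t.wbs, p) for t in tasks for p in t.predecessors]
    resolved = [d for d in deps if d[1] in codes]
    logical = len(resolved) / len(deps) if deps else 1.0
    graph = {t.wbs: {p for p in t.predecessors if p in codes} for t in tasks}
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError:
        logical = 0.0  # a dependency cycle means no valid task order exists

    return {"structural": structural, "logical": logical}


def weighted_quality(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-dimension scores into a single percentage."""
    total = sum(weights.values())
    return 100.0 * sum(scores[k] * w for k, w in weights.items()) / total


# Made-up example: "2.1" has no parent "2" and references a missing task "9.9".
schedule = [
    Task("1", "Substructure"),
    Task("1.1", "Excavation"),
    Task("1.2", "Foundation concrete", predecessors=["1.1"]),
    Task("2.1", "Column formwork", predecessors=["1.2", "9.9"]),
]
scores = check_schedule(schedule)
quality = weighted_quality(scores, {"structural": 0.5, "logical": 0.5})
print(scores, round(quality, 1))
```

In an SLM-as-a-Judge setting, each dimension score would presumably come from a locally run model prompted with a rubric rather than from hard-coded rules; the aggregation step could stay the same.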




Source journal: Advanced Engineering Informatics (Engineering Technology; Engineering, Multidisciplinary)

CiteScore: 12.40

Self-citation rate: 18.20%

Articles published: 292

Review time: 45 days

About the journal: Advanced Engineering Informatics is an international journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitatively and quantitatively. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus, and INSPEC.