Multi-LLM-based augmentation and synthetic data generation of construction schedules and task descriptions with SLM-as-a-judge assessment
Authors: Akarsth Kumar Singh, Shang-Hsien Hsieh
Journal: Advanced Engineering Informatics, vol. 69, Article 103825 (Impact Factor 9.9; JCR Q1, Computer Science, Artificial Intelligence)
Published: 2025-09-09
DOI: 10.1016/j.aei.2025.103825
URL: https://www.sciencedirect.com/science/article/pii/S1474034625007189
Citations: 0
Abstract
The fragmented structure, semantic inconsistency, and limited availability of construction schedule data significantly hinder the development of intelligent planning tools in the architecture, engineering, and construction (AEC) domain. In particular, the absence of high-quality, hierarchically structured Work Breakdown Structure with Task Dependency (WBS-TD) datasets restricts the training and evaluation of AI-based models for automated construction workflows. This study investigates whether Large Language Models (LLMs) can be systematically applied to enhance and generate construction schedule and task description data, and whether lightweight, locally deployed Small Language Models (SLMs) can effectively evaluate these outputs using domain-specific rubrics in a scalable and privacy-preserving manner. To address this, an integrated methodology is proposed, consisting of three components: (1) Role-Guided Modular Prompt Chaining (RGPC), which transforms inconsistent WBS-TD inputs into logically ordered and semantically enriched outputs; (2) synthetic data generation via a multi-LLM pipeline using structured prompt strategies to produce diverse, realistic construction schedules and descriptions; and (3) SLM-as-a-Judge, a rubric-based evaluation approach that uses lightweight, locally deployed SLMs to assess output quality across structural, logical, and domain-specific dimensions without requiring sensitive data to leave secure environments. Experimental results show that Claude-3.5-Sonnet achieved 77% quality in augmented schedule generation, Gemini-2.0-Flash reached 92% in synthetic schedule generation, and DeepSeek-R1 provided the best balance of quality and diversity in synthetic construction task description generation, demonstrating strong domain alignment across tasks. The framework generates reusable, machine-readable knowledge graph datasets supporting downstream applications such as AI-assisted planning, progress monitoring, and risk analysis. This study delivers a scalable, model-agnostic pipeline that advances automation and evaluation in construction informatics.
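To make the structural and logical rubric dimensions mentioned in the abstract more concrete, the following is a minimal sketch of what such checks could look like on a WBS-TD record. It is not the paper's SLM judge: it is a deterministic stand-in, and the `Task` fields, the dotted WBS-code convention, the scoring formulas, and the weights are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class Task:
    """Hypothetical WBS-TD entry: dotted WBS code, name, predecessor codes."""
    wbs: str
    name: str
    predecessors: list[str] = field(default_factory=list)


def check_schedule(tasks: list[Task]) -> dict[str, float]:
    """Return illustrative structural and logical scores in [0, 1]."""
    ids = {t.wbs for t in tasks}

    # Structural (assumed rule): every nested code such as "1.2" has its
    # parent code "1" present in the schedule.
    children = [t for t in tasks if "." in t.wbs]
    with_parent = [t for t in children if t.wbs.rsplit(".", 1)[0] in ids]
    structural = len(with_parent) / len(children) if children else 1.0

    # Logical (assumed rule): predecessors must refer to known tasks ...
    deps = [(t.wbs, p) for t in tasks for p in t.predecessors]
    known = [d for d in deps if d[1] in ids]
    logical = len(known) / len(deps) if deps else 1.0

    # ... and the dependency graph must be acyclic, otherwise score 0.
    graph = {t.wbs: [p for p in t.predecessors if p in ids] for t in tasks}
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError:
        logical = 0.0

    return {"structural": structural, "logical": logical}


def overall(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean over rubric dimensions; the weights are hypothetical."""
    total = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total


schedule = [
    Task("1", "Substructure"),
    Task("1.1", "Excavation"),
    Task("1.2", "Foundation", ["1.1"]),
    Task("2", "Superstructure", ["1.2"]),
]
scores = check_schedule(schedule)
print(scores, overall(scores, {"structural": 1.0, "logical": 1.0}))
```

In a rubric-based judge setup, checks like these would more plausibly complement than replace the model's assessment, since domain-specific quality (for example, whether a task sequence is realistic on site) is not captured by graph validity alone.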
About the journal:
Advanced Engineering Informatics is an international Journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The Journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitatively and quantitatively. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus and INSPEC.