Controllable diffusion models for hazardous construction site scene generation
Authors: XunLong Wang, JiangTao Ren
Journal: Applied Soft Computing, volume 181, Article 113446
Publication date: 2025-06-24
DOI: 10.1016/j.asoc.2025.113446
URL: https://www.sciencedirect.com/science/article/pii/S1568494625007574
Impact factor: 6.6 (JCR Q1, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Hazardous scene recognition is critical for construction site safety, but such scenes occur rarely in real environments, which leaves too little training data and limits model development. Hazardous scene generation helps address this scarcity, but the scenes involve complex background-object relationships and large differences in object size, which make precise layout control difficult for general-purpose generation models. To address these challenges, we propose a novel construction site hazardous scene generation framework based on large language and diffusion models, consisting of a two-stage generation process. In the first stage, we fine-tune a large language model (LLM) through context learning to serve as a text-based layout generator. In the second stage, we introduce a novel text-to-image diffusion model to guide the image generation process, ensuring that the generated image adheres to the scene layout produced in the first stage. Additionally, we propose two key modules, the Layout Enhancement Module and the Scale Fusion Module, to improve image quality and layout adherence. Comparative experiments show that our method generates superior scenes with stronger controllability and higher quality. Testing on a real-world dataset achieved an mAP score of 38.1%, improving model accuracy by 20.4% AP, 17.5% AP50, and 17.0% AR compared to models trained on real data, demonstrating our method's effectiveness in hazardous construction site scene generation.
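The detection metrics reported in the abstract are built on box overlap: AP at an IoU threshold of 0.5 (commonly written AP50) counts a predicted box as correct when its intersection over union with a ground-truth box is at least 0.5. The Python sketch below shows only that overlap test. The `Box` type and the function names are illustrative assumptions and are not taken from the paper, which does not describe its evaluation code here.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical layout format)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = inter_w * inter_h
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_50(reference: Box, predicted: Box) -> bool:
    """True when the two boxes overlap enough to count as a match at IoU 0.5."""
    return iou(reference, predicted) >= 0.5
```

The same test could plausibly be reused to ask whether an object in a generated image landed where a layout box requested it, although that use is an assumption rather than something the abstract states.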
About the journal:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in application and convergence of the areas of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.