Optimizing large vision-language models for context-aware construction safety assessment
Taegeon Kim, Seokhwan Kim, Wei-Chih Chern, Somin Park, Daeho Kim, Hongjo Kim
Automation in Construction, vol. 180, Article 106510. Published 2025-09-09. DOI: 10.1016/j.autcon.2025.106510
Impact factor: 11.5. JCR: Q1 (Construction & Building Technology).
https://www.sciencedirect.com/science/article/pii/S0926580525005503
Abstract
This paper presents a context-aware large vision-language model (LVLM) for automated construction site safety assessment, addressing the limitations of existing models in domain-specific hazard recognition. It introduces a framework that combines domain-specific image-text data generation, vision encoder fine-tuning for improved object recognition, and Low-Rank Adaptation (LoRA)-based model adjustment for context-aware safety reasoning. The model was evaluated on 400 images from 10 hazardous situations, demonstrating superior performance in the image captioning task (average ROUGE-L: 0.3852, SPICE: 0.3615, SBERT-based similarity: 0.7484). For safety assessment, the fine-tuned model achieved 94.25% accuracy in predicting safety status, significantly outperforming GPT-4V (53.25%) and LLaVA 1.5 (48%). The quality of textual justifications was assessed using both GPT-4V-based and expert-based evaluations of relevance and preference. In both settings, the fine-tuned model received the highest scores, demonstrating robust and context-aware safety reasoning. These findings confirm that domain-specific fine-tuning enhances safety classification and hazard interpretation, advancing construction site monitoring.
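For readers unfamiliar with the ROUGE-L captioning metric cited in the abstract, the following is a minimal sketch of how such a score can be computed from the longest common subsequence (LCS) of two token sequences. It is not the authors' evaluation code: the whitespace tokenization, the lowercasing, the balanced F1 weighting, the function names `lcs_length` and `rouge_l_f1`, and the example captions are all assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as a balanced F-measure (assumption: beta = 1)."""
    # Assumption: simple lowercase whitespace tokenization.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


# Hypothetical captions, not taken from the paper's dataset.
generated = "worker without helmet near excavator"
ground_truth = "a worker without a helmet stands near an excavator"
score = rouge_l_f1(generated, ground_truth)
print(f"ROUGE-L F1: {score:.4f}")
```

Because the LCS only requires tokens to appear in the same order, not contiguously, a short caption that preserves the reference's key terms in sequence can still score well. Published implementations may differ in tokenization, stemming, and the recall weighting, so scores from this sketch are not directly comparable to the values reported above.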
About the journal:
Automation in Construction is an international journal that focuses on publishing original research papers related to the use of Information Technologies in various aspects of the construction industry. The journal covers topics such as design, engineering, construction technologies, and the maintenance and management of constructed facilities.
The scope of Automation in Construction is extensive and covers all stages of the construction life cycle. This includes initial planning and design, construction of the facility, operation and maintenance, as well as the eventual dismantling and recycling of buildings and engineering structures.