Ontology-based prompting with large language models for inferring construction activities from construction images

IF 9.9 · CAS Tier 1 (Engineering & Technology) · Q1 in Computer Science, Artificial Intelligence
Cheng Zeng, Timo Hartmann, Leyuan Ma
{"title":"Ontology-based prompting with large language models for inferring construction activities from construction images","authors":"Cheng Zeng ,&nbsp;Timo Hartmann ,&nbsp;Leyuan Ma","doi":"10.1016/j.aei.2025.103869","DOIUrl":null,"url":null,"abstract":"<div><div>Recognizing construction activities from images enhances decision-making by providing context-aware insights into project progress, resource allocation, and productivity. However, conventional approaches, such as supervised learning and knowledge-based approach, struggle to generalize to the dynamic nature of construction sites due to limited annotated data and rigid knowledge patterns. To address these limitations, we propose a novel method that integrates Large Language Models (LLMs) with structured domain knowledge via ontology-based prompting. In our approach, visual features such as entities, spatial arrangements, and actions are mapped to predefined concepts in a construction-specific ontology, resulting in symbolic scene representations. In-context learning is employed by constructing prompts that include multiple structured examples, each describing a scenario with its associated activities. By analyzing these ontology-grounded examples, the LLM learns patterns that connect symbolic representations to construction activity labels, enabling generalization to new, unseen scenes. We evaluated the method using GPT-based models on a dataset covering 29 construction activity types. The model achieved an activity recognition accuracy of 73.68 %, and 50.00 % when jointly identifying the activity and its associated entities. Ablation studies confirmed the positive effects of including Chain-of-Thought reasoning, diverse visual concepts, and richer context examples. These results demonstrate the potential of ontology-informed prompting to support scalable and adaptive visual understanding in construction domains.</div></div>","PeriodicalId":50941,"journal":{"name":"Advanced Engineering Informatics","volume":"69 ","pages":"Article 103869"},"PeriodicalIF":9.9000,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advanced Engineering Informatics","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1474034625007621","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Recognizing construction activities from images enhances decision-making by providing context-aware insights into project progress, resource allocation, and productivity. However, conventional approaches, such as supervised learning and knowledge-based approaches, struggle to generalize to the dynamic nature of construction sites due to limited annotated data and rigid knowledge patterns. To address these limitations, we propose a novel method that integrates Large Language Models (LLMs) with structured domain knowledge via ontology-based prompting. In our approach, visual features such as entities, spatial arrangements, and actions are mapped to predefined concepts in a construction-specific ontology, resulting in symbolic scene representations. In-context learning is employed by constructing prompts that include multiple structured examples, each describing a scenario with its associated activities. By analyzing these ontology-grounded examples, the LLM learns patterns that connect symbolic representations to construction activity labels, enabling generalization to new, unseen scenes. We evaluated the method using GPT-based models on a dataset covering 29 construction activity types. The model achieved an accuracy of 73.68% for activity recognition, and 50.00% when jointly identifying the activity and its associated entities. Ablation studies confirmed the positive effects of including Chain-of-Thought reasoning, diverse visual concepts, and richer context examples. These results demonstrate the potential of ontology-informed prompting to support scalable and adaptive visual understanding in construction domains.
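The abstract outlines the pipeline only at a high level: visual concepts are mapped onto ontology terms to form a symbolic scene representation, and several such representations, paired with activity labels and short rationales, are packed into an in-context prompt. The following is a minimal Python sketch of what such a prompt builder might look like; every name here (ACTIVITY_ONTOLOGY, SceneGraph, build_prompt) and the toy ontology entries are hypothetical illustrations, not the paper's actual implementation.

```python
# Illustrative sketch only: the paper does not publish its implementation, so
# all names below and the toy ontology content are hypothetical.
from dataclasses import dataclass

# A toy slice of a construction ontology: activity -> expected concepts.
ACTIVITY_ONTOLOGY = {
    "Excavation": {"entities": ["Excavator", "Soil"], "action": "digging"},
    "Rebar installation": {"entities": ["Worker", "Rebar"], "action": "tying"},
}

@dataclass
class SceneGraph:
    """Symbolic scene representation: image concepts mapped onto ontology terms."""
    entities: list[str]   # e.g. ["Excavator", "Soil"]
    relations: list[str]  # e.g. ["Excavator near Soil"]
    actions: list[str]    # e.g. ["digging"]

    def to_text(self) -> str:
        return (f"Entities: {', '.join(self.entities)}. "
                f"Spatial relations: {'; '.join(self.relations)}. "
                f"Actions: {', '.join(self.actions)}.")

# Ontology-grounded in-context examples: scene -> rationale -> label.
# The short rationale mimics the Chain-of-Thought step the ablations credit.
EXAMPLES = [
    (SceneGraph(["Excavator", "Soil"], ["Excavator near Soil"], ["digging"]),
     "An excavator performing a digging action on soil matches Excavation.",
     "Excavation"),
]

def build_prompt(query: SceneGraph) -> str:
    """Assemble a few-shot prompt from symbolic scene descriptions."""
    candidates = ", ".join(ACTIVITY_ONTOLOGY)
    parts = [
        "You are given symbolic descriptions of construction scenes. "
        f"Candidate activities: {candidates}. "
        "Infer the activity and its associated entities."
    ]
    for scene, rationale, label in EXAMPLES:
        parts.append(f"Scene: {scene.to_text()}\n"
                     f"Reasoning: {rationale}\n"
                     f"Activity: {label}")
    parts.append(f"Scene: {query.to_text()}\nReasoning:")  # model completes this
    return "\n\n".join(parts)

# The resulting string would be sent to a GPT-style chat completion endpoint.
query = SceneGraph(["Worker", "Rebar"], ["Worker holding Rebar"], ["tying"])
print(build_prompt(query))
```

Keeping the scene description symbolic rather than pixel-based is what lets the same few-shot examples transfer to unseen scenes: the LLM matches ontology terms rather than raw image features.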
Source journal
Advanced Engineering Informatics (Engineering & Technology, Engineering: Multidisciplinary)
CiteScore: 12.40
Self-citation rate: 18.20%
Annual publications: 292
Review time: 45 days
Journal description: Advanced Engineering Informatics is an international journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The Journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitatively and quantitatively. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus and INSPEC.