{"title":"基于本体的提示,使用大型语言模型从构造图像中推断构造活动","authors":"Cheng Zeng , Timo Hartmann , Leyuan Ma","doi":"10.1016/j.aei.2025.103869","DOIUrl":null,"url":null,"abstract":"<div><div>Recognizing construction activities from images enhances decision-making by providing context-aware insights into project progress, resource allocation, and productivity. However, conventional approaches, such as supervised learning and knowledge-based approach, struggle to generalize to the dynamic nature of construction sites due to limited annotated data and rigid knowledge patterns. To address these limitations, we propose a novel method that integrates Large Language Models (LLMs) with structured domain knowledge via ontology-based prompting. In our approach, visual features such as entities, spatial arrangements, and actions are mapped to predefined concepts in a construction-specific ontology, resulting in symbolic scene representations. In-context learning is employed by constructing prompts that include multiple structured examples, each describing a scenario with its associated activities. By analyzing these ontology-grounded examples, the LLM learns patterns that connect symbolic representations to construction activity labels, enabling generalization to new, unseen scenes. We evaluated the method using GPT-based models on a dataset covering 29 construction activity types. The model achieved an activity recognition accuracy of 73.68 %, and 50.00 % when jointly identifying the activity and its associated entities. Ablation studies confirmed the positive effects of including Chain-of-Thought reasoning, diverse visual concepts, and richer context examples. These results demonstrate the potential of ontology-informed prompting to support scalable and adaptive visual understanding in construction domains.</div></div>","PeriodicalId":50941,"journal":{"name":"Advanced Engineering Informatics","volume":"69 ","pages":"Article 103869"},"PeriodicalIF":9.9000,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Ontology-based prompting with large language models for inferring construction activities from construction images\",\"authors\":\"Cheng Zeng , Timo Hartmann , Leyuan Ma\",\"doi\":\"10.1016/j.aei.2025.103869\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Recognizing construction activities from images enhances decision-making by providing context-aware insights into project progress, resource allocation, and productivity. However, conventional approaches, such as supervised learning and knowledge-based approach, struggle to generalize to the dynamic nature of construction sites due to limited annotated data and rigid knowledge patterns. To address these limitations, we propose a novel method that integrates Large Language Models (LLMs) with structured domain knowledge via ontology-based prompting. In our approach, visual features such as entities, spatial arrangements, and actions are mapped to predefined concepts in a construction-specific ontology, resulting in symbolic scene representations. In-context learning is employed by constructing prompts that include multiple structured examples, each describing a scenario with its associated activities. By analyzing these ontology-grounded examples, the LLM learns patterns that connect symbolic representations to construction activity labels, enabling generalization to new, unseen scenes. 
We evaluated the method using GPT-based models on a dataset covering 29 construction activity types. The model achieved an activity recognition accuracy of 73.68 %, and 50.00 % when jointly identifying the activity and its associated entities. Ablation studies confirmed the positive effects of including Chain-of-Thought reasoning, diverse visual concepts, and richer context examples. These results demonstrate the potential of ontology-informed prompting to support scalable and adaptive visual understanding in construction domains.</div></div>\",\"PeriodicalId\":50941,\"journal\":{\"name\":\"Advanced Engineering Informatics\",\"volume\":\"69 \",\"pages\":\"Article 103869\"},\"PeriodicalIF\":9.9000,\"publicationDate\":\"2025-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Advanced Engineering Informatics\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1474034625007621\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advanced Engineering Informatics","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1474034625007621","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Ontology-based prompting with large language models for inferring construction activities from construction images
Recognizing construction activities from images enhances decision-making by providing context-aware insights into project progress, resource allocation, and productivity. However, conventional approaches, such as supervised learning and knowledge-based methods, struggle to generalize to the dynamic conditions of construction sites due to limited annotated data and rigid knowledge patterns. To address these limitations, we propose a novel method that integrates Large Language Models (LLMs) with structured domain knowledge via ontology-based prompting. In our approach, visual features such as entities, spatial arrangements, and actions are mapped to predefined concepts in a construction-specific ontology, resulting in symbolic scene representations. In-context learning is employed by constructing prompts that include multiple structured examples, each describing a scenario together with its associated activities. By analyzing these ontology-grounded examples, the LLM learns patterns that connect symbolic representations to construction activity labels, enabling generalization to new, unseen scenes. We evaluated the method using GPT-based models on a dataset covering 29 construction activity types. The model achieved an activity recognition accuracy of 73.68%, which dropped to 50.00% when jointly identifying the activity and its associated entities. Ablation studies confirmed the positive effects of including Chain-of-Thought reasoning, diverse visual concepts, and richer context examples. These results demonstrate the potential of ontology-informed prompting to support scalable and adaptive visual understanding in construction domains.
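To make the prompting scheme described in the abstract concrete, the sketch below shows how ontology-grounded symbolic scene representations might be assembled into an in-context learning prompt. This is a minimal illustration under assumed names: the `SceneRepresentation` structure, the concept labels (e.g., `Excavator`, `RebarInstallation`), and the prompt template are hypothetical and are not taken from the paper itself.

```python
# A minimal sketch of ontology-based prompting for construction activity
# inference. All concept names, example scenes, and the prompt wording are
# hypothetical illustrations; the paper's actual ontology and prompts differ.
from dataclasses import dataclass


@dataclass
class SceneRepresentation:
    """Symbolic scene: visual features mapped to ontology concepts."""
    entities: list[str]           # e.g., ontology classes such as "Excavator"
    spatial_relations: list[str]  # e.g., "Worker adjacentTo Rebar"
    actions: list[str]            # e.g., "digging", "pouring"


def scene_to_text(scene: SceneRepresentation) -> str:
    """Serialize a symbolic scene into a textual block for the prompt."""
    return (f"Entities: {', '.join(scene.entities)}\n"
            f"Spatial relations: {', '.join(scene.spatial_relations)}\n"
            f"Actions: {', '.join(scene.actions)}")


def build_prompt(examples: list[tuple[SceneRepresentation, str]],
                 query: SceneRepresentation) -> str:
    """Assemble an in-context learning prompt from ontology-grounded examples,
    asking the model to reason step by step (Chain-of-Thought) before answering."""
    parts = ["You are given symbolic descriptions of construction scenes, "
             "grounded in a construction ontology. Infer the construction "
             "activity for the final scene. Reason step by step, then state "
             "the activity label."]
    for scene, activity in examples:
        parts.append(scene_to_text(scene) + f"\nActivity: {activity}")
    parts.append(scene_to_text(query) + "\nActivity:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    examples = [
        (SceneRepresentation(["Excavator", "SoilPile"],
                             ["Excavator near SoilPile"], ["digging"]),
         "Excavation"),
        (SceneRepresentation(["Worker", "Rebar", "TyingTool"],
                             ["Worker adjacentTo Rebar"], ["tying"]),
         "RebarInstallation"),
    ]
    query = SceneRepresentation(["ConcretePump", "Formwork", "Worker"],
                                ["ConcretePump above Formwork"], ["pouring"])
    print(build_prompt(examples, query))
```

In the full pipeline, a prompt assembled this way would be sent to a GPT-based model, and the returned label would be compared against the 29 activity types in the evaluation set; the sketch only covers the prompt-construction step.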
Journal Introduction:
Advanced Engineering Informatics is an international Journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The Journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitatively and quantitatively. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus and INSPEC.