{"title":"土木工程中用于自动图像描述的预训练视觉语言人工智能模型的语义和词汇分析","authors":"Pedram Bazrafshan, Kris Melag, Arvin Ebrahimkhanlou","doi":"10.1007/s43503-025-00063-9","DOIUrl":null,"url":null,"abstract":"<div><p>This paper investigates the application of pre-trained Vision-Language Models (VLMs) for describing images from civil engineering materials and construction sites, with a focus on construction components, structural elements, and materials. The novelty of this study lies in the investigation of VLMs for this specialized domain, which has not been previously addressed. As a case study, the paper evaluates ChatGPT-4v’s ability to serve as a descriptor tool by comparing its performance with three human descriptions (a civil engineer and two engineering interns). The contributions of this work include adapting a pre-trained VLM to civil engineering applications without additional fine-tuning and benchmarking its performance using both semantic similarity analysis (SentenceTransformers) and lexical similarity methods. Utilizing two datasets—one from a publicly available online repository and another manually collected by the authors—the study employs whole-text and sentence pair-wise similarity analyses to assess the model’s alignment with human descriptions. Results demonstrate that the best-performing model achieved an average similarity of 76% (4% standard deviation) when compared to human-generated descriptions. The analysis also reveals better performance on the publicly available dataset.</p></div>","PeriodicalId":72138,"journal":{"name":"AI in civil engineering","volume":"4 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s43503-025-00063-9.pdf","citationCount":"0","resultStr":"{\"title\":\"Semantic and lexical analysis of pre-trained vision language artificial intelligence models for automated image descriptions in civil engineering\",\"authors\":\"Pedram Bazrafshan, Kris Melag, Arvin Ebrahimkhanlou\",\"doi\":\"10.1007/s43503-025-00063-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>This paper investigates the application of pre-trained Vision-Language Models (VLMs) for describing images from civil engineering materials and construction sites, with a focus on construction components, structural elements, and materials. The novelty of this study lies in the investigation of VLMs for this specialized domain, which has not been previously addressed. As a case study, the paper evaluates ChatGPT-4v’s ability to serve as a descriptor tool by comparing its performance with three human descriptions (a civil engineer and two engineering interns). The contributions of this work include adapting a pre-trained VLM to civil engineering applications without additional fine-tuning and benchmarking its performance using both semantic similarity analysis (SentenceTransformers) and lexical similarity methods. Utilizing two datasets—one from a publicly available online repository and another manually collected by the authors—the study employs whole-text and sentence pair-wise similarity analyses to assess the model’s alignment with human descriptions. Results demonstrate that the best-performing model achieved an average similarity of 76% (4% standard deviation) when compared to human-generated descriptions. 
The analysis also reveals better performance on the publicly available dataset.</p></div>\",\"PeriodicalId\":72138,\"journal\":{\"name\":\"AI in civil engineering\",\"volume\":\"4 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://link.springer.com/content/pdf/10.1007/s43503-025-00063-9.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AI in civil engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s43503-025-00063-9\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AI in civil engineering","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1007/s43503-025-00063-9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Semantic and lexical analysis of pre-trained vision language artificial intelligence models for automated image descriptions in civil engineering
This paper investigates the application of pre-trained Vision-Language Models (VLMs) for describing images of civil engineering materials and construction sites, with a focus on construction components, structural elements, and materials. The novelty of this study lies in the investigation of VLMs for this specialized domain, which has not been previously addressed. As a case study, the paper evaluates ChatGPT-4v's ability to serve as a descriptor tool by comparing its outputs with descriptions produced by three human annotators (a civil engineer and two engineering interns). The contributions of this work include adapting a pre-trained VLM to civil engineering applications without additional fine-tuning and benchmarking its performance using both semantic similarity analysis (SentenceTransformers) and lexical similarity methods. Utilizing two datasets, one from a publicly available online repository and the other manually collected by the authors, the study employs whole-text and sentence pair-wise similarity analyses to assess the model's alignment with human descriptions. Results demonstrate that the best-performing model achieved an average similarity of 76% (4% standard deviation) when compared to human-generated descriptions. The analysis also reveals better performance on the publicly available dataset.
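The abstract names SentenceTransformers for semantic similarity alongside lexical similarity methods applied to caption pairs. The sketch below is a minimal illustration of how such a pairwise comparison could be set up, not the authors' exact pipeline: the embedding model "all-MiniLM-L6-v2", the example captions, and the character-ratio and Jaccard lexical metrics are all assumptions introduced here for demonstration.

```python
# Hypothetical sketch: compare one VLM-generated caption with one human description
# using (1) semantic similarity from SentenceTransformers embeddings and
# (2) two simple lexical similarity scores. Model choice and metrics are assumptions.
from difflib import SequenceMatcher
from sentence_transformers import SentenceTransformer, util

vlm_caption = "A reinforced concrete column with visible spalling near its base."
human_caption = "Concrete column showing spalling and exposed rebar at the bottom."

# Semantic similarity: cosine similarity between sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode([vlm_caption, human_caption], convert_to_tensor=True)
semantic_score = util.cos_sim(embeddings[0], embeddings[1]).item()

# Lexical similarity: character-level sequence ratio and word-level Jaccard overlap.
char_ratio = SequenceMatcher(None, vlm_caption.lower(), human_caption.lower()).ratio()
tokens_a = set(vlm_caption.lower().split())
tokens_b = set(human_caption.lower().split())
jaccard = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

print(f"semantic={semantic_score:.2f}, char_ratio={char_ratio:.2f}, jaccard={jaccard:.2f}")
```

In a whole-text analysis this comparison would be run on full descriptions, while in a sentence pair-wise analysis each model sentence would be scored against each human sentence and the results aggregated; the specific aggregation used in the paper is not stated in the abstract.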