Semantic and lexical analysis of pre-trained vision language artificial intelligence models for automated image descriptions in civil engineering

Pedram Bazrafshan, Kris Melag, Arvin Ebrahimkhanlou
AI in Civil Engineering, vol. 4, no. 1. Published 2025-08-01. DOI: 10.1007/s43503-025-00063-9
https://link.springer.com/article/10.1007/s43503-025-00063-9

Abstract

This paper investigates the use of pre-trained Vision-Language Models (VLMs) to describe images of civil engineering materials and construction sites, focusing on construction components, structural elements, and materials. The novelty of the study lies in applying VLMs to this specialized domain, which has not been addressed previously. As a case study, the paper evaluates ChatGPT-4v as a description tool by comparing its output with descriptions written by three people (a civil engineer and two engineering interns). The contributions include applying a pre-trained VLM to civil engineering tasks without additional fine-tuning and benchmarking it with both semantic similarity analysis (SentenceTransformers) and lexical similarity methods. Using two datasets, one from a publicly available online repository and one collected manually by the authors, the study applies whole-text and sentence pairwise similarity analyses to assess how closely the model's descriptions align with the human ones. The best-performing model achieved an average similarity of 76% (standard deviation 4%) with the human-generated descriptions. Performance was better on the publicly available dataset.
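
The whole-text and sentence pairwise comparisons described above can be illustrated with a minimal sketch. The abstract does not state which lexical measure or aggregation rule the authors used, so everything here is an assumption: Jaccard token overlap stands in for the lexical similarity, the pairwise score averages each model sentence's best match among the human sentences, and the example texts are invented rather than taken from the paper's datasets. The helper names (`jaccard`, `split_sentences`, `sentence_pairwise`) are hypothetical.

```python
import re
from statistics import mean, pstdev


def tokens(text):
    """Lowercase alphanumeric word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard overlap of the two texts' token sets, in [0, 1] (assumed measure)."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def split_sentences(text):
    """Naive sentence split on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_pairwise(candidate, reference, sim=jaccard):
    """Mean, over candidate sentences, of the best-matching reference sentence."""
    cand, ref = split_sentences(candidate), split_sentences(reference)
    if not cand or not ref:
        return 0.0
    return mean([max(sim(c, r) for r in ref) for c in cand])


# Invented example texts for illustration only (not from the paper).
model_text = "A steel beam rests on a concrete column. Rebar is visible at the joint."
human_texts = [
    "A steel beam sits on a concrete column. Exposed rebar is visible at the joint.",
    "Concrete column supporting a steel beam.",
    "Steel beam and concrete column with rebar at the connection.",
]

# Score the model description against each human description, then summarize.
whole = [jaccard(model_text, h) for h in human_texts]
pairwise = [sentence_pairwise(model_text, h) for h in human_texts]
print(f"whole-text: mean={mean(whole):.2f}, std={pstdev(whole):.2f}")
print(f"pairwise:   mean={mean(pairwise):.2f}, std={pstdev(pairwise):.2f}")
```

A semantic variant would keep the same structure and pass a different `sim` function, for example cosine similarity between sentence embeddings; that part is omitted here because the abstract does not name the embedding models that were compared.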
