From concept to manufacturing: evaluating vision-language models for engineering design

IF 13.9 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review Pub Date : 2025-07-01 DOI:10.1007/s10462-025-11290-y

Cyril Picard, Kristen M. Edwards, Anna C. Doris, Brandon Man, Giorgio Giannone, Md Ferdous Alam, Faez Ahmed

{"title":"From concept to manufacturing: evaluating vision-language models for engineering design","authors":"Cyril Picard, Kristen M. Edwards, Anna C. Doris, Brandon Man, Giorgio Giannone, Md Ferdous Alam, Faez Ahmed","doi":"10.1007/s10462-025-11290-y","DOIUrl":null,"url":null,"abstract":"<div><p>Engineering design is undergoing a transformative shift with the advent of AI, marking a new era in how we approach product, system, and service planning. Large language models have demonstrated impressive capabilities in enabling this shift. Yet, with text as their only input modality, they cannot leverage the large body of visual artifacts that engineers have used for centuries and are accustomed to. This gap is addressed with the release of multimodal vision-language models (VLMs), such as GPT-4V, enabling AI to impact many more types of tasks. Our work presents a comprehensive evaluation of VLMs across a spectrum of engineering design tasks, categorized into four main areas: Conceptual Design, System-Level and Detailed Design, Manufacturing and Inspection, and Engineering Education Tasks. Specifically in this paper, we assess the capabilities of two VLMs, GPT-4V and LLaVA 1.6 34B, in design tasks such as sketch similarity analysis, CAD generation, topology optimization, manufacturability assessment, and engineering textbook problems. Through this structured evaluation, we not only explore VLMs’ proficiency in handling complex design challenges but also identify their limitations in complex engineering design applications. Our research establishes a foundation for future assessments of vision language models. It also contributes a set of benchmark testing datasets, with more than 1000 queries, for ongoing advancements and applications in this field.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"58 9","pages":""},"PeriodicalIF":13.9000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-025-11290-y.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence Review","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10462-025-11290-y","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Engineering design is undergoing a transformative shift with the advent of AI, marking a new era in how we approach product, system, and service planning. Large language models have demonstrated impressive capabilities in enabling this shift. Yet, with text as their only input modality, they cannot leverage the large body of visual artifacts that engineers have used for centuries and are accustomed to. This gap is addressed with the release of multimodal vision-language models (VLMs), such as GPT-4V, enabling AI to impact many more types of tasks. Our work presents a comprehensive evaluation of VLMs across a spectrum of engineering design tasks, categorized into four main areas: Conceptual Design, System-Level and Detailed Design, Manufacturing and Inspection, and Engineering Education Tasks. Specifically in this paper, we assess the capabilities of two VLMs, GPT-4V and LLaVA 1.6 34B, in design tasks such as sketch similarity analysis, CAD generation, topology optimization, manufacturability assessment, and engineering textbook problems. Through this structured evaluation, we not only explore VLMs’ proficiency in handling complex design challenges but also identify their limitations in complex engineering design applications. Our research establishes a foundation for future assessments of vision language models. It also contributes a set of benchmark testing datasets, with more than 1000 queries, for ongoing advancements and applications in this field.

查看原文本刊更多论文

从概念到制造：评估工程设计的视觉语言模型

随着人工智能的出现，工程设计正在经历一场变革，标志着我们如何处理产品、系统和服务规划的新时代。大型语言模型已经展示了支持这种转变的令人印象深刻的能力。然而，由于文本是他们唯一的输入方式，他们无法利用工程师已经使用了几个世纪并且已经习惯的大量视觉工件。随着多模态视觉语言模型（vlm）的发布，如GPT-4V，这一差距得到了解决，使人工智能能够影响更多类型的任务。我们的工作展示了跨工程设计任务范围的vlm的综合评估，分为四个主要领域：概念设计，系统级和详细设计，制造和检查以及工程教育任务。具体而言，在本文中，我们评估了两种VLMs （GPT-4V和LLaVA 1.6 34B）在草图相似性分析、CAD生成、拓扑优化、可制造性评估和工程教科书问题等设计任务中的能力。通过这种结构化的评估，我们不仅探索了VLMs在处理复杂设计挑战方面的熟练程度，而且还确定了它们在复杂工程设计应用中的局限性。我们的研究为未来视觉语言模型的评估奠定了基础。它还提供了一组基准测试数据集，包含1000多个查询，用于该领域的持续进步和应用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Artificial Intelligence Review 工程技术-计算机：人工智能

CiteScore

22.00

自引率

3.30%

发文量

194

审稿时长

5.3 months

期刊介绍： Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.