What’s in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models Through User-Provided Names in Computer Aided Design Files

IF 2.6 | Zone 3, Engineering & Technology | JCR Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Computing and Information Science in Engineering Pub Date : 2023-06-23 DOI:10.1115/1.4062454

Peter Meltzer, Joseph Lambourne, Daniele Grandi

Citations: 0

Abstract

Semantic knowledge of part-part and part-whole relationships in assemblies is useful for a variety of tasks, from searching design repositories to constructing engineering knowledge bases. In this work, we propose that the natural language names designers use in computer-aided design (CAD) software are a valuable source of such knowledge, and that large language models (LLMs) contain useful domain-specific information for working with this data as well as with other CAD and engineering-related tasks. In particular, we extract and clean a large corpus of natural language part, feature, and document names and use it to demonstrate quantitatively that a pre-trained language model can outperform numerous benchmarks on three self-supervised tasks without ever having seen this data before. Moreover, we show that fine-tuning on the text data corpus further boosts performance on all tasks, demonstrating the value of text data that until now has been largely ignored. We also identify key limitations of using LLMs with text data alone, and our findings provide strong motivation for further work on multi-modal text-geometry models. To aid and encourage further work in this area, we make all our data and code publicly available.

Source journal

Journal of Computing and Information Science in Engineering (Engineering & Technology - Engineering: Manufacturing)

CiteScore: 6.30

Self-citation rate: 12.90%

Articles published: 100

Review time: 6 months

About the journal: The ASME Journal of Computing and Information Science in Engineering (JCISE) publishes articles related to Algorithms, Computational Methods, Computing Infrastructure, Computer-Interpretable Representations, Human-Computer Interfaces, Information Science, and/or System Architectures that aim to improve some aspect of the product and system lifecycle (e.g., design, manufacturing, operation, maintenance, disposal, recycling, etc.). Applications considered in JCISE manuscripts should be relevant to the mechanical engineering discipline. Papers can be focused on fundamental research leading to new methods, or on the adaptation of existing methods for new applications.

Scope: Advanced Computing Infrastructure; Artificial Intelligence; Big Data and Analytics; Collaborative Design; Computer Aided Design; Computer Aided Engineering; Computer Aided Manufacturing; Computational Foundations for Additive Manufacturing; Computational Foundations for Engineering Optimization; Computational Geometry; Computational Metrology; Computational Synthesis; Conceptual Design; Cybermanufacturing; Cyber Physical Security for Factories; Cyber Physical System Design and Operation; Data-Driven Engineering Applications; Engineering Informatics; Geometric Reasoning; GPU Computing for Design and Manufacturing; Human Computer Interfaces/Interactions; Industrial Internet of Things; Knowledge Engineering; Information Management; Inverse Methods for Engineering Applications; Machine Learning for Engineering Applications; Manufacturing Planning; Manufacturing Automation; Model-based Systems Engineering; Multiphysics Modeling and Simulation; Multiscale Modeling and Simulation; Multidisciplinary Optimization; Physics-Based Simulations; Process Modeling for Engineering Applications; Qualification, Verification and Validation of Computational Models; Symbolic Computing for Engineering Applications; Tolerance Modeling; Topology and Shape Optimization; Virtual and Augmented Reality Environments; Virtual Prototyping