Towards a self-cognitive complex product design system: A fine-grained multi-modal feature recognition and semantic understanding approach using large language models in mechanical engineering

IF 8 | Zone 1 (Engineering Technology) | Q1 Computer Science, Artificial Intelligence

Advanced Engineering Informatics | Pub Date: 2025-03-22 | DOI: 10.1016/j.aei.2025.103265

Xinxin Liang, Zuoxu Wang, Jihong Liu

Volume 65, Article 103265 | https://www.sciencedirect.com/science/article/pii/S1474034625001582

Citations: 0

Abstract

As human-artificial intelligence (AI) collaborative product design gains momentum, fine-grained, multi-modal mechanical part recognition and semantic understanding have become a basic task for achieving a self-cognitive product design system. However, traditional semantic understanding approaches for mechanical parts can handle only single-modal data, either text or images, which leads to two limitations: 1) insufficient mining of fine-grained functional, behavioral, and structural part information, and 2) ineffective alignment of multi-modal part information. These limitations restrict the intelligence level of earlier product design assistants. To mitigate these challenges, this paper proposes a fine-grained multi-modal reasoning approach for mechanical part semantic understanding. The proposed approach uses a pre-trained Convolutional Neural Network (CNN) for visual feature extraction, the large language model (LLM) LLaMA3 for advanced textual analysis, and a Unified Feature Fusion Module (UFFM) to facilitate robust cross-modal interactions. A positive and negative sample generation mechanism is implemented to refine the model’s ability to discern subtle variations in complex components. Experimental evaluations on the Industrial Part Multimodal Dataset (IPMD) demonstrate a significant improvement in classification accuracy, providing a more precise and intelligent solution for semantic understanding in complex product design systems.
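
The abstract does not specify how positive and negative samples enter training, so the following is only an illustrative sketch of one common way such pairs are used: an InfoNCE-style contrastive loss that pulls an image embedding toward its matching text embedding and pushes it away from non-matching ones. The loss form, the `temperature` value, the function names, and the toy vectors (standing in for CNN and LLM outputs) are all assumptions made for illustration and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(image_vec, positive_text, negative_texts, temperature=0.1):
    """InfoNCE-style loss (an assumed choice, not the paper's stated loss).

    The value is small when the image embedding is closer to its positive
    text embedding than to any of the negative text embeddings.
    """
    logits = [cosine(image_vec, positive_text) / temperature]
    logits += [cosine(image_vec, neg) / temperature for neg in negative_texts]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[0]  # negative log-softmax of the positive pair

# Hypothetical toy embeddings standing in for CNN (image) and LLM (text) features.
image_vec = [0.9, 0.1, 0.0]
matching_text = [1.0, 0.0, 0.0]
other_texts = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Loss when the correct description is treated as the positive sample.
aligned = contrastive_loss(image_vec, matching_text, other_texts)
# Loss when a wrong description is treated as the positive sample.
misaligned = contrastive_loss(image_vec, other_texts[0],
                              [matching_text, other_texts[1]])
```

In this toy setup the correctly paired description yields a much lower loss than the mismatched one, which is the signal a training loop would use to sharpen distinctions between visually similar parts.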

Source journal

Advanced Engineering Informatics (Engineering Technology: Engineering, Multidisciplinary)

CiteScore: 12.40

Self-citation rate: 18.20%

Articles published: 292

Review time: 45 days

About the journal: Advanced Engineering Informatics is an international journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitatively and quantitatively. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus, and INSPEC.