Dachuan Shi , Jianzhang Li , Olga Meyer , Thomas Bauernhansl
Computers in Industry, Volume 171, Article 104330. Published 2025-06-26. DOI: 10.1016/j.compind.2025.104330
https://www.sciencedirect.com/science/article/pii/S0166361525000958
Enhancing retrieval-augmented generation for interoperable industrial knowledge representation and inference toward cognitive digital twins
The escalating volume and complexity of digital data within the manufacturing sector highlight an urgent need for an efficient knowledge representation and inference solution. Traditional approaches, which often rely on ontologies, knowledge graphs, or digital twins (DTs) for knowledge representation, and rule-based algorithms for inference, are becoming insufficient. The emergence of generative AI, particularly large language models (LLMs) and retrieval-augmented generation (RAG), offers a more efficient and intelligent alternative. However, the performance of a RAG system is heavily dependent on the quality of retrieval results, which can be compromised by domain-specific knowledge and retrieval distractors. To address this challenge, we propose to enhance RAG systems tailored for the manufacturing industry in two aspects. First, we utilize the Asset Administration Shell (AAS), which represents the German industrial perspective on cognitive DTs, to create a representation of assets and knowledge in standardized information models. This establishes a robust foundation for the retrieval sources. Second, we propose a contrastive selection loss (CSL) to fine-tune an open-source LLM to refine the retrieval results. Fine-tuned LLMs possess higher efficiency and accuracy on task- and domain-specific datasets, while the CSL further enhances the model's ability to distinguish true positives from similar distractors. The enhanced RAG system is demonstrated in a robotic work cell integration use case and evaluated through a novel evaluation protocol. Additionally, the retrieval effectiveness of the RAG system, specifically the LLM fine-tuned with CSL, is extensively validated through statistical experiments. The results confirm its superior performance over state-of-the-art methods, including GPT-4 with in-context learning prompts and other fine-tuned models.
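The abstract does not state the formula of the contrastive selection loss, so the following is only a generic illustration of the idea of separating one true positive from similar distractors. It is a minimal sketch of a softmax cross-entropy (InfoNCE-style) loss over candidate scores, not the authors' CSL. The function name, the `temperature` parameter, and the example scores are all assumptions introduced for illustration.

```python
import math


def contrastive_selection_loss(scores, positive_index, temperature=1.0):
    """Generic contrastive loss over retrieval candidates (illustrative only).

    scores: relevance scores, one per candidate (hypothetical values).
    positive_index: position of the true positive among the candidates.
    Returns -log softmax(scores / temperature)[positive_index].
    """
    if not 0 <= positive_index < len(scores):
        raise IndexError("positive_index out of range")
    scaled = [s / temperature for s in scores]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_sum_exp - scaled[positive_index]


# The true positive is candidate 0; the other two act as distractors.
loss_when_ranked_first = contrastive_selection_loss([4.0, 1.0, 0.5], 0)
loss_when_outscored = contrastive_selection_loss([1.0, 4.0, 0.5], 0)
print(loss_when_ranked_first < loss_when_outscored)
```

The loss shrinks as the true positive's score rises above the distractors' scores, which is the behavior a fine-tuning objective of this kind would reward.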
About the journal:
The objective of Computers in Industry is to present original, high-quality, application-oriented research papers that:
• Illuminate emerging trends and possibilities in the utilization of Information and Communication Technology in industry;
• Establish connections or integrations across various technology domains within the expansive realm of computer applications for industry;
• Foster connections or integrations across diverse application areas of ICT in industry.