{"title":"利用工程设计知识进行检索增强生成","authors":"L. Siddharth , Jianxi Luo","doi":"10.1016/j.knosys.2024.112410","DOIUrl":null,"url":null,"abstract":"<div><p>Aiming to support Retrieval Augmented Generation (RAG) in the design process, we present a method to identify explicit, engineering design facts – {<em>head entity:: relationship:: tail entity}</em> from patented artefact descriptions. Given a sentence with a pair of entities (selected from noun phrases) marked in a unique manner, our method extracts their relationship that is explicitly communicated in the sentence. For this task, we create a dataset of 375,084 examples and fine-tune language models for relation identification (token classification task) and relation elicitation (sequence-to-sequence task). The token classification approach achieves up to 99.7% accuracy. Upon applying the method to a domain of 4,870 fan system patents, we populate a knowledge base of over 2.93 million facts. Using this knowledge base, we demonstrate how Large Language Models (LLMs) are guided by explicit facts to synthesise knowledge and generate technical and cohesive responses when sought out for knowledge retrieval tasks in the design process.</p></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"303 ","pages":"Article 112410"},"PeriodicalIF":7.2000,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Retrieval augmented generation using engineering design knowledge\",\"authors\":\"L. Siddharth , Jianxi Luo\",\"doi\":\"10.1016/j.knosys.2024.112410\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Aiming to support Retrieval Augmented Generation (RAG) in the design process, we present a method to identify explicit, engineering design facts – {<em>head entity:: relationship:: tail entity}</em> from patented artefact descriptions. Given a sentence with a pair of entities (selected from noun phrases) marked in a unique manner, our method extracts their relationship that is explicitly communicated in the sentence. For this task, we create a dataset of 375,084 examples and fine-tune language models for relation identification (token classification task) and relation elicitation (sequence-to-sequence task). The token classification approach achieves up to 99.7% accuracy. Upon applying the method to a domain of 4,870 fan system patents, we populate a knowledge base of over 2.93 million facts. Using this knowledge base, we demonstrate how Large Language Models (LLMs) are guided by explicit facts to synthesise knowledge and generate technical and cohesive responses when sought out for knowledge retrieval tasks in the design process.</p></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"303 \",\"pages\":\"Article 112410\"},\"PeriodicalIF\":7.2000,\"publicationDate\":\"2024-08-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S095070512401044X\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S095070512401044X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Retrieval augmented generation using engineering design knowledge
Aiming to support Retrieval Augmented Generation (RAG) in the design process, we present a method to identify explicit, engineering design facts – {head entity:: relationship:: tail entity} from patented artefact descriptions. Given a sentence with a pair of entities (selected from noun phrases) marked in a unique manner, our method extracts their relationship that is explicitly communicated in the sentence. For this task, we create a dataset of 375,084 examples and fine-tune language models for relation identification (token classification task) and relation elicitation (sequence-to-sequence task). The token classification approach achieves up to 99.7% accuracy. Upon applying the method to a domain of 4,870 fan system patents, we populate a knowledge base of over 2.93 million facts. Using this knowledge base, we demonstrate how Large Language Models (LLMs) are guided by explicit facts to synthesise knowledge and generate technical and cohesive responses when sought out for knowledge retrieval tasks in the design process.
期刊介绍:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.