利用工程设计知识进行检索增强生成

IF 7.2 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems Pub Date : 2024-08-24 DOI:10.1016/j.knosys.2024.112410

L. Siddharth , Jianxi Luo

{"title":"利用工程设计知识进行检索增强生成","authors":"L. Siddharth , Jianxi Luo","doi":"10.1016/j.knosys.2024.112410","DOIUrl":null,"url":null,"abstract":"<div><p>Aiming to support Retrieval Augmented Generation (RAG) in the design process, we present a method to identify explicit, engineering design facts – {<em>head entity:: relationship:: tail entity}</em> from patented artefact descriptions. Given a sentence with a pair of entities (selected from noun phrases) marked in a unique manner, our method extracts their relationship that is explicitly communicated in the sentence. For this task, we create a dataset of 375,084 examples and fine-tune language models for relation identification (token classification task) and relation elicitation (sequence-to-sequence task). The token classification approach achieves up to 99.7% accuracy. Upon applying the method to a domain of 4,870 fan system patents, we populate a knowledge base of over 2.93 million facts. Using this knowledge base, we demonstrate how Large Language Models (LLMs) are guided by explicit facts to synthesise knowledge and generate technical and cohesive responses when sought out for knowledge retrieval tasks in the design process.</p></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"303 ","pages":"Article 112410"},"PeriodicalIF":7.2000,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Retrieval augmented generation using engineering design knowledge\",\"authors\":\"L. Siddharth , Jianxi Luo\",\"doi\":\"10.1016/j.knosys.2024.112410\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Aiming to support Retrieval Augmented Generation (RAG) in the design process, we present a method to identify explicit, engineering design facts – {<em>head entity:: relationship:: tail entity}</em> from patented artefact descriptions. Given a sentence with a pair of entities (selected from noun phrases) marked in a unique manner, our method extracts their relationship that is explicitly communicated in the sentence. For this task, we create a dataset of 375,084 examples and fine-tune language models for relation identification (token classification task) and relation elicitation (sequence-to-sequence task). The token classification approach achieves up to 99.7% accuracy. Upon applying the method to a domain of 4,870 fan system patents, we populate a knowledge base of over 2.93 million facts. Using this knowledge base, we demonstrate how Large Language Models (LLMs) are guided by explicit facts to synthesise knowledge and generate technical and cohesive responses when sought out for knowledge retrieval tasks in the design process.</p></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"303 \",\"pages\":\"Article 112410\"},\"PeriodicalIF\":7.2000,\"publicationDate\":\"2024-08-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S095070512401044X\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S095070512401044X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

为了在设计过程中支持检索增强生成（RAG），我们提出了一种从专利人工制品描述中识别明确的工程设计事实--{头部实体：：关系：：尾部实体}的方法。给定一个句子，其中有一对实体（从名词短语中选取）以独特的方式标记，我们的方法可以提取出句子中明确传达的它们之间的关系。为此，我们创建了一个包含 375,084 个实例的数据集，并对关系识别（标记分类任务）和关系提取（序列到序列任务）的语言模型进行了微调。标记分类方法的准确率高达 99.7%。将该方法应用于包含 4,870 项风扇系统专利的领域后，我们填充了一个包含超过 293 万个事实的知识库。利用该知识库，我们展示了大型语言模型（LLM）如何在显性事实的指导下综合知识，并在设计过程中的知识检索任务中生成技术性和连贯的响应。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Retrieval augmented generation using engineering design knowledge

Aiming to support Retrieval Augmented Generation (RAG) in the design process, we present a method to identify explicit, engineering design facts – {head entity:: relationship:: tail entity} from patented artefact descriptions. Given a sentence with a pair of entities (selected from noun phrases) marked in a unique manner, our method extracts their relationship that is explicitly communicated in the sentence. For this task, we create a dataset of 375,084 examples and fine-tune language models for relation identification (token classification task) and relation elicitation (sequence-to-sequence task). The token classification approach achieves up to 99.7% accuracy. Upon applying the method to a domain of 4,870 fan system patents, we populate a knowledge base of over 2.93 million facts. Using this knowledge base, we demonstrate how Large Language Models (LLMs) are guided by explicit facts to synthesise knowledge and generate technical and cohesive responses when sought out for knowledge retrieval tasks in the design process.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Knowledge-Based Systems 工程技术-计算机：人工智能

CiteScore

14.80

自引率

12.50%

发文量

1245

审稿时长

7.8 months

期刊介绍： Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.