利用工程设计知识进行检索增强生成

IF 4.4 2区 化学 Q2 MATERIALS SCIENCE, MULTIDISCIPLINARY
L. Siddharth , Jianxi Luo
{"title":"利用工程设计知识进行检索增强生成","authors":"L. Siddharth ,&nbsp;Jianxi Luo","doi":"10.1016/j.knosys.2024.112410","DOIUrl":null,"url":null,"abstract":"<div><p>Aiming to support Retrieval Augmented Generation (RAG) in the design process, we present a method to identify explicit, engineering design facts – {<em>head entity:: relationship:: tail entity}</em> from patented artefact descriptions. Given a sentence with a pair of entities (selected from noun phrases) marked in a unique manner, our method extracts their relationship that is explicitly communicated in the sentence. For this task, we create a dataset of 375,084 examples and fine-tune language models for relation identification (token classification task) and relation elicitation (sequence-to-sequence task). The token classification approach achieves up to 99.7% accuracy. Upon applying the method to a domain of 4,870 fan system patents, we populate a knowledge base of over 2.93 million facts. Using this knowledge base, we demonstrate how Large Language Models (LLMs) are guided by explicit facts to synthesise knowledge and generate technical and cohesive responses when sought out for knowledge retrieval tasks in the design process.</p></div>","PeriodicalId":7,"journal":{"name":"ACS Applied Polymer Materials","volume":null,"pages":null},"PeriodicalIF":4.4000,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Retrieval augmented generation using engineering design knowledge\",\"authors\":\"L. Siddharth ,&nbsp;Jianxi Luo\",\"doi\":\"10.1016/j.knosys.2024.112410\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Aiming to support Retrieval Augmented Generation (RAG) in the design process, we present a method to identify explicit, engineering design facts – {<em>head entity:: relationship:: tail entity}</em> from patented artefact descriptions. Given a sentence with a pair of entities (selected from noun phrases) marked in a unique manner, our method extracts their relationship that is explicitly communicated in the sentence. For this task, we create a dataset of 375,084 examples and fine-tune language models for relation identification (token classification task) and relation elicitation (sequence-to-sequence task). The token classification approach achieves up to 99.7% accuracy. Upon applying the method to a domain of 4,870 fan system patents, we populate a knowledge base of over 2.93 million facts. Using this knowledge base, we demonstrate how Large Language Models (LLMs) are guided by explicit facts to synthesise knowledge and generate technical and cohesive responses when sought out for knowledge retrieval tasks in the design process.</p></div>\",\"PeriodicalId\":7,\"journal\":{\"name\":\"ACS Applied Polymer Materials\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.4000,\"publicationDate\":\"2024-08-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Polymer Materials\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S095070512401044X\",\"RegionNum\":2,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MATERIALS SCIENCE, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Polymer Materials","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S095070512401044X","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATERIALS SCIENCE, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

摘要

为了在设计过程中支持检索增强生成(RAG),我们提出了一种从专利人工制品描述中识别明确的工程设计事实--{头部实体::关系::尾部实体}的方法。给定一个句子,其中有一对实体(从名词短语中选取)以独特的方式标记,我们的方法可以提取出句子中明确传达的它们之间的关系。为此,我们创建了一个包含 375,084 个实例的数据集,并对关系识别(标记分类任务)和关系提取(序列到序列任务)的语言模型进行了微调。标记分类方法的准确率高达 99.7%。将该方法应用于包含 4,870 项风扇系统专利的领域后,我们填充了一个包含超过 293 万个事实的知识库。利用该知识库,我们展示了大型语言模型(LLM)如何在显性事实的指导下综合知识,并在设计过程中的知识检索任务中生成技术性和连贯的响应。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Retrieval augmented generation using engineering design knowledge

Aiming to support Retrieval Augmented Generation (RAG) in the design process, we present a method to identify explicit, engineering design facts – {head entity:: relationship:: tail entity} from patented artefact descriptions. Given a sentence with a pair of entities (selected from noun phrases) marked in a unique manner, our method extracts their relationship that is explicitly communicated in the sentence. For this task, we create a dataset of 375,084 examples and fine-tune language models for relation identification (token classification task) and relation elicitation (sequence-to-sequence task). The token classification approach achieves up to 99.7% accuracy. Upon applying the method to a domain of 4,870 fan system patents, we populate a knowledge base of over 2.93 million facts. Using this knowledge base, we demonstrate how Large Language Models (LLMs) are guided by explicit facts to synthesise knowledge and generate technical and cohesive responses when sought out for knowledge retrieval tasks in the design process.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
7.20
自引率
6.00%
发文量
810
期刊介绍: ACS Applied Polymer Materials is an interdisciplinary journal publishing original research covering all aspects of engineering, chemistry, physics, and biology relevant to applications of polymers. The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrates fundamental knowledge in the areas of materials, engineering, physics, bioscience, polymer science and chemistry into important polymer applications. The journal is specifically interested in work that addresses relationships among structure, processing, morphology, chemistry, properties, and function as well as work that provide insights into mechanisms critical to the performance of the polymer for applications.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信