语言模型的内部知识库抓取

Roi Cohen, Mor Geva, Jonathan Berant, A. Globerson
{"title":"语言模型的内部知识库抓取","authors":"Roi Cohen, Mor Geva, Jonathan Berant, A. Globerson","doi":"10.48550/arXiv.2301.12810","DOIUrl":null,"url":null,"abstract":"Language models are trained on large volumes of text, and as a result their parameters might contain a significant body of factual knowledge. Any downstream task performed by these models implicitly builds on these facts, and thus it is highly desirable to have means for representing this body of knowledge in an interpretable way. However, there is currently no mechanism for such a representation.Here, we propose to address this goal by extracting a knowledge-graph of facts from a given language model. We describe a procedure for “crawling” the internal knowledge-base of a language model. Specifically, given a seed entity, we expand a knowledge-graph around it. The crawling procedure is decomposed into sub-tasks, realized through specially designed prompts that control for both precision (i.e., that no wrong facts are generated) and recall (i.e., the number of facts generated). We evaluate our approach on graphs crawled starting from dozens of seed entities, and show it yields high precision graphs (82-92%), while emitting a reasonable number of facts per entity.","PeriodicalId":73025,"journal":{"name":"Findings (Sydney (N.S.W.)","volume":"1 1","pages":"1811-1824"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"Crawling The Internal Knowledge-Base of Language Models\",\"authors\":\"Roi Cohen, Mor Geva, Jonathan Berant, A. Globerson\",\"doi\":\"10.48550/arXiv.2301.12810\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Language models are trained on large volumes of text, and as a result their parameters might contain a significant body of factual knowledge. Any downstream task performed by these models implicitly builds on these facts, and thus it is highly desirable to have means for representing this body of knowledge in an interpretable way. However, there is currently no mechanism for such a representation.Here, we propose to address this goal by extracting a knowledge-graph of facts from a given language model. We describe a procedure for “crawling” the internal knowledge-base of a language model. Specifically, given a seed entity, we expand a knowledge-graph around it. The crawling procedure is decomposed into sub-tasks, realized through specially designed prompts that control for both precision (i.e., that no wrong facts are generated) and recall (i.e., the number of facts generated). We evaluate our approach on graphs crawled starting from dozens of seed entities, and show it yields high precision graphs (82-92%), while emitting a reasonable number of facts per entity.\",\"PeriodicalId\":73025,\"journal\":{\"name\":\"Findings (Sydney (N.S.W.)\",\"volume\":\"1 1\",\"pages\":\"1811-1824\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Findings (Sydney (N.S.W.)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2301.12810\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Findings (Sydney (N.S.W.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2301.12810","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 29

摘要

语言模型是在大量文本上训练的,因此它们的参数可能包含大量的事实知识。由这些模型执行的任何下游任务都隐含地建立在这些事实之上,因此非常希望能够以一种可解释的方式表示这些知识体系。然而,目前还没有这样的表示机制。在这里,我们建议通过从给定的语言模型中提取事实知识图来实现这一目标。我们描述了一个“爬行”语言模型的内部知识库的过程。具体来说,给定一个种子实体,我们围绕它展开一个知识图谱。爬行过程被分解为子任务,通过特别设计的提示来实现,这些提示既控制精度(即没有生成错误的事实),也控制召回率(即生成的事实数量)。我们在从数十个种子实体开始抓取的图上评估了我们的方法,并表明它产生了高精度的图(82-92%),同时每个实体发出了合理数量的事实。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Crawling The Internal Knowledge-Base of Language Models
Language models are trained on large volumes of text, and as a result their parameters might contain a significant body of factual knowledge. Any downstream task performed by these models implicitly builds on these facts, and thus it is highly desirable to have means for representing this body of knowledge in an interpretable way. However, there is currently no mechanism for such a representation.Here, we propose to address this goal by extracting a knowledge-graph of facts from a given language model. We describe a procedure for “crawling” the internal knowledge-base of a language model. Specifically, given a seed entity, we expand a knowledge-graph around it. The crawling procedure is decomposed into sub-tasks, realized through specially designed prompts that control for both precision (i.e., that no wrong facts are generated) and recall (i.e., the number of facts generated). We evaluate our approach on graphs crawled starting from dozens of seed entities, and show it yields high precision graphs (82-92%), while emitting a reasonable number of facts per entity.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
审稿时长
4 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信