Triplet extraction leveraging sentence transformers and dependency parsing

IF 2.3 Q2 COMPUTER SCIENCE, THEORY & METHODS
Array Pub Date : 2023-12-27 DOI:10.1016/j.array.2023.100334
Stuart Gallina Ottersen, Flávio Pinheiro, Fernando Bação
Array, volume 21, Article 100334. Full text: https://www.sciencedirect.com/science/article/pii/S2590005623000590
Citations: 0

Abstract

Knowledge graphs structure information as (entity, relation, entity) triples. One way to construct a knowledge graph is to extract triples from unstructured text, with the aim of maximising the number of useful triples while minimising those that carry little or no information. Most previous work in this field uses supervised learning techniques, which can be expensive both computationally and because they require labelled data. Existing unsupervised methods, meanwhile, often produce an excessive number of low-value triples, rely on empirical rules when extracting triples, or struggle with the order of the entities relative to the relation. To address these issues, this paper proposes a new model, Unsupervised Dependency parsing Aided Semantic Triple Extraction (UDASTE), which leverages sentence structure and allows restrictive triple relation types to be defined, generating high-quality triples while removing the need to map extracted triples to relation schemas. This is done by leveraging pre-trained language models. UDASTE is compared with two baseline models on three datasets and outperforms the baselines on all three. Its limitations and possible further work are discussed, along with the implementation of the model in a computational intelligence context.
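To make the triple terminology concrete, here is a minimal sketch of how (entity, relation, entity) triples might be stored in a graph while keeping only a restricted set of relation types. It is an illustration only and not the UDASTE implementation: the names `Triple` and `build_graph`, the example sentences, and the relation labels are all hypothetical, and exact label matching stands in for the pre-trained language models the paper relies on.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """A directed (head, relation, tail) triple; the order of entities matters."""
    head: str
    relation: str
    tail: str


def build_graph(triples, allowed_relations):
    """Index triples by head entity, keeping only the allowed relation types.

    Assumption: relation types are matched by exact label. A real system
    would need some semantic comparison instead of string equality.
    """
    graph = defaultdict(list)
    for triple in triples:
        if triple.relation in allowed_relations:
            graph[triple.head].append((triple.relation, triple.tail))
    return dict(graph)


# Hypothetical candidate triples, including one that carries no useful information.
candidates = [
    Triple("Marie Curie", "born_in", "Warsaw"),
    Triple("Marie Curie", "won", "Nobel Prize"),
    Triple("Warsaw", "is", "it"),
]

graph = build_graph(candidates, allowed_relations={"born_in", "won"})
print(graph)
```

Restricting the relation types up front is what discards the low-value candidate here, and storing each triple as (head, relation, tail) keeps the direction of the relation explicit.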

Source journal: Array (Computer Science, General Computer Science). CiteScore: 4.40. Self-citation rate: 0.00%. Articles published: 93. Review time: 45 days.