Triplet extraction leveraging sentence transformers and dependency parsing

Impact factor: 2.3 · JCR Q2, Computer Science, Theory & Methods

Array · Pub Date: 2023-12-27 · DOI: 10.1016/j.array.2023.100334

Stuart Gallina Ottersen, Flávio Pinheiro, Fernando Bação

Array, volume 21, Article 100334. Full text: https://www.sciencedirect.com/science/article/pii/S2590005623000590

Citation count: 0

Abstract

Knowledge graphs are a tool for structuring (entity, relation, entity) triples. One way to construct them is to extract triples from unstructured text, with the aim of maximising the number of useful triples while minimising those that carry little or no information. Most previous work in this field uses supervised learning techniques, which can be expensive both computationally and because they require labelled data. Existing unsupervised methods, meanwhile, often produce an excessive number of low-value triples, rely on empirical rules when extracting triples, or struggle with the order of the entities relative to the relation. To address these issues, this paper proposes a new model, Unsupervised Dependency parsing Aided Semantic Triple Extraction (UDASTE), which leverages sentence structure and allows restrictive triple relation types to be defined, generating high-quality triples while removing the need to map extracted triples to relation schemas. This is done by leveraging pre-trained language models. UDASTE is compared with two baseline models on three datasets and outperforms the baselines on all three. Its limitations and possible further work are discussed, along with the implementation of the model in a computational intelligence context.
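To make the vocabulary of the abstract concrete, the following is a minimal sketch of (entity, relation, entity) triples, a restrictive set of allowed relation types, and the resulting knowledge graph. It is not the UDASTE implementation: the names `Triple`, `token_overlap`, `restrict_relations`, and `build_graph` are hypothetical, the candidate triples are hand-written rather than produced by a dependency parser, and a simple token-overlap score stands in for the similarity that a pre-trained sentence-embedding model would provide.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One (entity, relation, entity) triple; order matters (head -> tail)."""
    head: str
    relation: str
    tail: str


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens.

    Assumption: a toy stand-in for embedding similarity, not the paper's measure.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def restrict_relations(candidates, allowed, threshold=0.5):
    """Keep candidates whose relation phrase matches an allowed relation type.

    Matching triples are relabelled with the allowed type, so no separate
    mapping to a relation schema is needed afterwards; the rest are dropped.
    """
    kept = []
    for t in candidates:
        best = max(allowed, key=lambda r: token_overlap(t.relation, r))
        if token_overlap(t.relation, best) >= threshold:
            kept.append(Triple(t.head, best, t.tail))
    return kept


def build_graph(triples):
    """Store triples as an adjacency list: head -> [(relation, tail), ...]."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.head].append((t.relation, t.tail))
    return dict(graph)


# Hand-written candidate triples (illustrative only).
candidates = [
    Triple("Marie Curie", "was born in", "Warsaw"),
    Triple("Marie Curie", "discovered", "polonium"),
    Triple("it", "is", "interesting"),  # low-value triple, should be filtered out
]
allowed = ["born in", "discovered"]

graph = build_graph(restrict_relations(candidates, allowed))
print(graph)
```

In this sketch the restrictive relation list does two jobs at once: it discards triples whose relation matches nothing useful, and it normalises the surviving relation phrases to a fixed vocabulary. The threshold of 0.5 is an arbitrary choice for the example.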
