Knowledge-Based Intelligent Text Simplification for Biological Relation Extraction

IF 3.4 | Q2 | Computer Science, Interdisciplinary Applications
Jaskaran Gill, Madhu Chetty, Suryani Lim, Jennifer Hallinan
{"title":"基于知识的智能文本简化用于生物关系提取","authors":"Jaskaran Gill, Madhu Chetty, Suryani Lim, Jennifer Hallinan","doi":"10.3390/informatics10040089","DOIUrl":null,"url":null,"abstract":"Relation extraction from biological publications plays a pivotal role in accelerating scientific discovery and advancing medical research. While vast amounts of this knowledge is stored within the published literature, extracting it manually from this continually growing volume of documents is becoming increasingly arduous. Recently, attention has been focused towards automatically extracting such knowledge using pre-trained Large Language Models (LLM) and deep-learning algorithms for automated relation extraction. However, the complex syntactic structure of biological sentences, with nested entities and domain-specific terminology, and insufficient annotated training corpora, poses major challenges in accurately capturing entity relationships from the unstructured data. To address these issues, in this paper, we propose a Knowledge-based Intelligent Text Simplification (KITS) approach focused on the accurate extraction of biological relations. KITS is able to precisely and accurately capture the relational context among various binary relations within the sentence, alongside preventing any potential changes in meaning for those sentences being simplified by KITS. The experiments show that the proposed technique, using well-known performance metrics, resulted in a 21% increase in precision, with only 25% of sentences simplified in the Learning Language in Logic (LLL) dataset. Combining the proposed method with BioBERT, the popular pre-trained LLM was able to outperform other state-of-the-art methods.","PeriodicalId":37100,"journal":{"name":"Informatics","volume":"31 5","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Knowledge-Based Intelligent Text Simplification for Biological Relation Extraction\",\"authors\":\"Jaskaran Gill, Madhu Chetty, Suryani Lim, Jennifer Hallinan\",\"doi\":\"10.3390/informatics10040089\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Relation extraction from biological publications plays a pivotal role in accelerating scientific discovery and advancing medical research. While vast amounts of this knowledge is stored within the published literature, extracting it manually from this continually growing volume of documents is becoming increasingly arduous. Recently, attention has been focused towards automatically extracting such knowledge using pre-trained Large Language Models (LLM) and deep-learning algorithms for automated relation extraction. However, the complex syntactic structure of biological sentences, with nested entities and domain-specific terminology, and insufficient annotated training corpora, poses major challenges in accurately capturing entity relationships from the unstructured data. To address these issues, in this paper, we propose a Knowledge-based Intelligent Text Simplification (KITS) approach focused on the accurate extraction of biological relations. KITS is able to precisely and accurately capture the relational context among various binary relations within the sentence, alongside preventing any potential changes in meaning for those sentences being simplified by KITS. 
The experiments show that the proposed technique, using well-known performance metrics, resulted in a 21% increase in precision, with only 25% of sentences simplified in the Learning Language in Logic (LLL) dataset. Combining the proposed method with BioBERT, the popular pre-trained LLM was able to outperform other state-of-the-art methods.\",\"PeriodicalId\":37100,\"journal\":{\"name\":\"Informatics\",\"volume\":\"31 5\",\"pages\":\"\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2023-12-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/informatics10040089\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/informatics10040089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Relation extraction from biological publications plays a pivotal role in accelerating scientific discovery and advancing medical research. While vast amounts of this knowledge is stored within the published literature, extracting it manually from this continually growing volume of documents is becoming increasingly arduous. Recently, attention has been focused towards automatically extracting such knowledge using pre-trained Large Language Models (LLM) and deep-learning algorithms for automated relation extraction. However, the complex syntactic structure of biological sentences, with nested entities and domain-specific terminology, and insufficient annotated training corpora, poses major challenges in accurately capturing entity relationships from the unstructured data. To address these issues, in this paper, we propose a Knowledge-based Intelligent Text Simplification (KITS) approach focused on the accurate extraction of biological relations. KITS is able to precisely and accurately capture the relational context among various binary relations within the sentence, alongside preventing any potential changes in meaning for those sentences being simplified by KITS. The experiments show that the proposed technique, using well-known performance metrics, resulted in a 21% increase in precision, with only 25% of sentences simplified in the Learning Language in Logic (LLL) dataset. Combining the proposed method with BioBERT, the popular pre-trained LLM was able to outperform other state-of-the-art methods.
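To make the simplify-then-extract pattern described above concrete, the following is a minimal, hypothetical sketch, not the authors' KITS implementation: a long biological sentence is first split into shorter fragments by a naive rule-based splitter, and each fragment is then scored by a BioBERT-based binary relation classifier. The regex splitter, the entity-marker scheme, the two-label setup, and the example sentence are illustrative assumptions; the public dmis-lab/biobert-base-cased-v1.1 checkpoint is loaded with a freshly initialised classification head, which would need fine-tuning on an annotated corpus such as LLL before its scores are meaningful.

```python
# Hypothetical sketch of a simplify-then-classify pipeline; it is NOT the
# KITS code from the paper. The splitter and entity markers are assumptions.
import re
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Public BioBERT checkpoint; the sequence-classification head is randomly
# initialised here and must be fine-tuned (e.g. on LLL) to give real scores.
MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def simplify(sentence: str) -> list[str]:
    """Naive clause splitter: break a long sentence at coordinating
    conjunctions and semicolons so each fragment carries fewer entities."""
    parts = re.split(r",\s*(?:and|but|while|whereas)\s+|;\s*", sentence)
    return [p.strip() for p in parts if p.strip()]

def score_relation(fragment: str, head: str, tail: str) -> float:
    """Return the model's probability that `head` and `tail` are related
    in `fragment`, marking the entities inline (a common RE formulation)."""
    marked = (fragment.replace(head, f"[E1] {head} [/E1]")
                      .replace(tail, f"[E2] {tail} [/E2]"))
    inputs = tokenizer(marked, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Illustrative sentence in the style of the LLL corpus (B. subtilis sporulation).
sentence = ("Sigma factor SigK activates transcription of gerE, and GerE in "
            "turn represses sigK expression during sporulation.")
for fragment in simplify(sentence):
    print(f"{fragment!r} -> {score_relation(fragment, 'GerE', 'sigK'):.3f}")
```

In this framing, the benefit of simplification is that each fragment handed to the classifier carries at most one candidate relation, which matches the intuition the abstract gives for the reported precision gains.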
Source journal: Informatics (Social Sciences-Communication)
CiteScore: 6.60
Self-citation rate: 6.50%
Articles published: 88
Review time: 6 weeks