Relation Extraction in Biomedical Texts Based on Multi-Head Attention Model With Syntactic Dependency Feature: Modeling Study.

IF 3.8 3区 医学 Q2 MEDICAL INFORMATICS
Yongbin Li, Linhu Hui, Liping Zou, Huyang Li, Luo Xu, Xiaohua Wang, Stephanie Chua
{"title":"Relation Extraction in Biomedical Texts Based on Multi-Head Attention Model With Syntactic Dependency Feature: Modeling Study.","authors":"Yongbin Li,&nbsp;Linhu Hui,&nbsp;Liping Zou,&nbsp;Huyang Li,&nbsp;Luo Xu,&nbsp;Xiaohua Wang,&nbsp;Stephanie Chua","doi":"10.2196/41136","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>With the rapid expansion of biomedical literature, biomedical information extraction has attracted increasing attention from researchers. In particular, relation extraction between 2 entities is a long-term research topic.</p><p><strong>Objective: </strong>This study aimed to perform 2 multiclass relation extraction tasks of Biomedical Natural Language Processing Workshop 2019 Open Shared Tasks: relation extraction of Bacteria-Biotope (BB-rel) task and binary relation extraction of plant seed development (SeeDev-binary) task. In essence, these 2 tasks are aimed at extracting the relation between annotated entity pairs from biomedical texts, which is a challenging problem.</p><p><strong>Methods: </strong>Traditional research methods adopted feature- or kernel-based methods and achieved good performance. For these tasks, we propose a deep learning model based on a combination of several distributed features, such as domain-specific word embedding, part-of-speech embedding, entity-type embedding, distance embedding, and position embedding. The multi-head attention mechanism is used to extract the global semantic features of an entire sentence. Meanwhile, we introduced a dependency-type feature and the shortest dependency path connecting 2 candidate entities in the syntactic dependency graph to enrich the feature representation.</p><p><strong>Results: </strong>Experiments show that our proposed model has excellent performance in biomedical relation extraction, achieving F<sub>1</sub> scores of 65.56% and 38.04% on the test sets of the BB-rel and SeeDev-binary tasks. Especially in the SeeDev-binary task, the F<sub>1</sub> score of our model is superior to that of other existing models and achieves state-of-the-art performance.</p><p><strong>Conclusions: </strong>We demonstrated that the multi-head attention mechanism can learn relevant syntactic and semantic features in different representation subspaces and different positions to extract comprehensive feature representation. Moreover, syntactic dependency features can improve the performance of the model by learning dependency relation between the entities in biomedical texts.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":" ","pages":"e41136"},"PeriodicalIF":3.8000,"publicationDate":"2022-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9634522/pdf/","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Medical Informatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2196/41136","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
引用次数: 3

Abstract

Background: With the rapid expansion of biomedical literature, biomedical information extraction has attracted increasing attention from researchers. In particular, relation extraction between 2 entities is a long-term research topic.

Objective: This study aimed to perform 2 multiclass relation extraction tasks of Biomedical Natural Language Processing Workshop 2019 Open Shared Tasks: relation extraction of Bacteria-Biotope (BB-rel) task and binary relation extraction of plant seed development (SeeDev-binary) task. In essence, these 2 tasks are aimed at extracting the relation between annotated entity pairs from biomedical texts, which is a challenging problem.

Methods: Traditional research methods adopted feature- or kernel-based methods and achieved good performance. For these tasks, we propose a deep learning model based on a combination of several distributed features, such as domain-specific word embedding, part-of-speech embedding, entity-type embedding, distance embedding, and position embedding. The multi-head attention mechanism is used to extract the global semantic features of an entire sentence. Meanwhile, we introduced a dependency-type feature and the shortest dependency path connecting 2 candidate entities in the syntactic dependency graph to enrich the feature representation.

Results: Experiments show that our proposed model has excellent performance in biomedical relation extraction, achieving F1 scores of 65.56% and 38.04% on the test sets of the BB-rel and SeeDev-binary tasks. Especially in the SeeDev-binary task, the F1 score of our model is superior to that of other existing models and achieves state-of-the-art performance.

Conclusions: We demonstrated that the multi-head attention mechanism can learn relevant syntactic and semantic features in different representation subspaces and different positions to extract comprehensive feature representation. Moreover, syntactic dependency features can improve the performance of the model by learning dependency relation between the entities in biomedical texts.

Abstract Image

Abstract Image

Abstract Image

基于具有句法依赖特征的多头注意模型的生物医学文本关系提取:建模研究。
背景:随着生物医学文献的迅速膨胀,生物医学信息提取越来越受到研究者的关注。特别是两个实体之间的关系提取是一个长期的研究课题。目的:本研究旨在完成生物医学自然语言处理研讨会2019开放共享任务的2个多类关系提取任务:细菌-生物群落关系提取(BB-rel)任务和植物种子发育二元关系提取(SeeDev-binary)任务。从本质上讲,这两个任务都是为了从生物医学文本中提取标注实体对之间的关系,这是一个具有挑战性的问题。方法:传统的研究方法采用基于特征或核的方法,取得了较好的效果。针对这些任务,我们提出了一种基于多个分布式特征的深度学习模型,如特定领域的词嵌入、词性嵌入、实体类型嵌入、距离嵌入和位置嵌入。采用多头注意机制提取整个句子的整体语义特征。同时,我们在句法依赖图中引入了依赖类型特征和连接2个候选实体的最短依赖路径,丰富了特征表示。结果:实验表明,我们提出的模型在生物医学关系提取方面具有优异的性能,在BB-rel和seedev -二元任务的测试集上F1得分分别为65.56%和38.04%。特别是在SeeDev-binary任务中,我们模型的F1分数优于其他现有模型,达到了最先进的性能。结论:多头注意机制可以在不同的表征子空间和不同的位置学习相关的句法和语义特征,提取综合的特征表征。此外,句法依赖特征可以通过学习生物医学文本中实体之间的依赖关系来提高模型的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
JMIR Medical Informatics
JMIR Medical Informatics Medicine-Health Informatics
CiteScore
7.90
自引率
3.10%
发文量
173
审稿时长
12 weeks
期刊介绍: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal which focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, ehealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (emphasizing more on applications for clinicians and health professionals rather than consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers which are more technical or more formative than what would be published in the Journal of Medical Internet Research.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信