Automatic extraction of transcriptional regulatory interactions of bacteria from biomedical literature using a BERT-based approach.

IF 4.3 3区 材料科学 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC
Alfredo Varela-Vega, Ali-Berenice Posada-Reyes, Carlos-Francisco Méndez-Cruz
{"title":"Automatic extraction of transcriptional regulatory interactions of bacteria from biomedical literature using a BERT-based approach.","authors":"Alfredo Varela-Vega, Ali-Berenice Posada-Reyes, Carlos-Francisco Méndez-Cruz","doi":"10.1093/database/baae094","DOIUrl":null,"url":null,"abstract":"<p><p>Transcriptional regulatory networks (TRNs) give a global view of the regulatory mechanisms of bacteria to respond to environmental signals. These networks are published in biological databases as a valuable resource for experimental and bioinformatics researchers. Despite the efforts to publish TRNs of diverse bacteria, many of them still lack one and many of the existing TRNs are incomplete. In addition, the manual extraction of information from biomedical literature (\"literature curation\") has been the traditional way to extract these networks, despite this being demanding and time-consuming. Recently, language models based on pretrained transformers have been used to extract relevant knowledge from biomedical literature. Moreover, the benefit of fine-tuning a large pretrained model with new limited data for a specific task (\"transfer learning\") opens roads to address new problems of biomedical information extraction. Here, to alleviate this lack of knowledge and assist literature curation, we present a new approach based on the Bidirectional Transformer for Language Understanding (BERT) architecture to classify transcriptional regulatory interactions of bacteria as a first step to extract TRNs from literature. The approach achieved a significant performance in a test dataset of sentences of Escherichia coli (F1-Score: 0.8685, Matthew's correlation coefficient: 0.8163). The examination of model predictions revealed that the model learned different ways to express the regulatory interaction. The approach was evaluated to extract a TRN of Salmonella using 264 complete articles. The evaluation showed that the approach was able to accurately extract 82% of the network and that it was able to extract interactions absent in curation data. To the best of our knowledge, the present study is the first effort to obtain a BERT-based approach to extract this specific kind of interaction. This approach is a starting point to address the limitations of reconstructing TRNs of bacteria and diseases of biological interest. Database URL: https://github.com/laigen-unam/BERT-trn-extraction.</p>","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11363960/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/database/baae094","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

Transcriptional regulatory networks (TRNs) give a global view of the regulatory mechanisms of bacteria to respond to environmental signals. These networks are published in biological databases as a valuable resource for experimental and bioinformatics researchers. Despite the efforts to publish TRNs of diverse bacteria, many of them still lack one and many of the existing TRNs are incomplete. In addition, the manual extraction of information from biomedical literature ("literature curation") has been the traditional way to extract these networks, despite this being demanding and time-consuming. Recently, language models based on pretrained transformers have been used to extract relevant knowledge from biomedical literature. Moreover, the benefit of fine-tuning a large pretrained model with new limited data for a specific task ("transfer learning") opens roads to address new problems of biomedical information extraction. Here, to alleviate this lack of knowledge and assist literature curation, we present a new approach based on the Bidirectional Transformer for Language Understanding (BERT) architecture to classify transcriptional regulatory interactions of bacteria as a first step to extract TRNs from literature. The approach achieved a significant performance in a test dataset of sentences of Escherichia coli (F1-Score: 0.8685, Matthew's correlation coefficient: 0.8163). The examination of model predictions revealed that the model learned different ways to express the regulatory interaction. The approach was evaluated to extract a TRN of Salmonella using 264 complete articles. The evaluation showed that the approach was able to accurately extract 82% of the network and that it was able to extract interactions absent in curation data. To the best of our knowledge, the present study is the first effort to obtain a BERT-based approach to extract this specific kind of interaction. This approach is a starting point to address the limitations of reconstructing TRNs of bacteria and diseases of biological interest. Database URL: https://github.com/laigen-unam/BERT-trn-extraction.

使用基于 BERT 的方法从生物医学文献中自动提取细菌的转录调控相互作用。
转录调控网络(TRN)提供了细菌响应环境信号的调控机制的全局视图。这些网络发布在生物数据库中,是实验和生物信息学研究人员的宝贵资源。尽管人们努力发布各种细菌的调控网络,但许多细菌仍然缺乏调控网络,而且许多现有的调控网络并不完整。此外,从生物医学文献中手动提取信息("文献整理")一直是提取这些网络的传统方法,尽管这种方法要求高且耗时。最近,基于预训练转换器的语言模型被用于从生物医学文献中提取相关知识。此外,针对特定任务使用新的有限数据对大型预训练模型进行微调("迁移学习")的好处为解决生物医学信息提取的新问题开辟了道路。在此,为了缓解这种知识匮乏并协助文献整理,我们提出了一种基于双向语言理解转换器(BERT)架构的新方法,对细菌的转录调控相互作用进行分类,作为从文献中提取 TRN 的第一步。该方法在大肠杆菌句子测试数据集中取得了显著的性能(F1-分数:0.8685,马修相关系数:0.8163)。对模型预测的检查表明,该模型学会了表达调控相互作用的不同方式。利用 264 篇完整文章对该方法进行了评估,以提取沙门氏菌的 TRN。评估结果表明,该方法能够准确提取网络中 82% 的内容,而且还能提取出馆藏数据中不存在的相互作用。据我们所知,本研究是首次采用基于 BERT 的方法来提取这种特定的交互作用。这种方法是解决重建细菌和生物疾病 TRN 的局限性的一个起点。数据库网址:https://github.com/laigen-unam/BERT-trn-extraction.
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
7.20
自引率
4.30%
发文量
567
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信