Extraction of gene-disease association from literature using BioBERT

Chuan Deng, Jiahui Zou, Jingwen Deng, M. Bai
Published in: The 2nd International Conference on Computing and Data Science, 2021-01-28
DOI: 10.1145/3448734.3450772 (https://doi.org/10.1145/3448734.3450772)
Citation count: 4

Abstract

With the rapid growth of the biomedical literature, a large amount of biomedical text is available to be exploited. These texts contain a wealth of knowledge about gene-disease associations, which is important for studies such as drug-target discovery and can even support personalized medical treatment tailored to a patient's genome. BioBERT, a BERT model pre-trained on large-scale biomedical corpora, has been shown to outperform other pre-trained language models on biomedical datasets. To make use of this large body of text, in this paper we present a practical approach that uses BioBERT to extract gene-disease associations from biomedical text; it achieves an overall F-score of 79.98%. We hope this work will inspire researchers in biomedical natural language processing and enable applications in related fields that solve problems encountered in research.
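The abstract does not describe the preprocessing or the exact evaluation protocol, so the following is only a minimal sketch under stated assumptions. It assumes gene-disease extraction is framed as sentence-level classification in which each gene and disease mention is replaced by a placeholder (the `@GENE$` / `@DISEASE$` convention used in BioBERT's published relation-extraction setup), and that the reported F-score is the balanced F1 over positive associations. The names `Mention`, `mask_entities`, and `f_score` are hypothetical helpers introduced for illustration; the example sentence and counts are made up and are not the paper's data.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    kind: str   # "GENE" or "DISEASE" (assumed label set)


def mask_entities(sentence: str, mentions: list[Mention]) -> str:
    """Replace each mention with a placeholder such as @GENE$ or @DISEASE$."""
    out = sentence
    # Substitute from right to left so earlier offsets stay valid.
    for m in sorted(mentions, key=lambda m: m.start, reverse=True):
        out = out[:m.start] + f"@{m.kind}$" + out[m.end:]
    return out


def find_mention(sentence: str, text: str, kind: str) -> Mention:
    """Locate the first occurrence of `text`; a stand-in for a real NER step."""
    start = sentence.index(text)
    return Mention(start, start + len(text), kind)


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score (F1) from true-positive, false-positive, false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence, not taken from the paper's dataset.
sentence = "Mutations in BRCA1 are associated with breast cancer."
mentions = [
    find_mention(sentence, "BRCA1", "GENE"),
    find_mention(sentence, "breast cancer", "DISEASE"),
]
masked = mask_entities(sentence, mentions)
print(masked)

# Illustrative counts only; these are not the paper's results.
print(round(f_score(tp=8, fp=2, fn=2), 4))
```

In a setup like this, the masked sentence would be passed to a fine-tuned BioBERT sequence classifier, and the classifier's binary predictions would be tallied into the counts given to `f_score`.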