Extraction of gene-disease association from literature using BioBERT

The 2nd International Conference on Computing and Data Science. Pub Date: 2021-01-28. DOI: 10.1145/3448734.3450772

Chuan Deng, Jiahui Zou, Jingwen Deng, M. Bai

Citations: 4

Abstract

With the rapid growth of the biomedical literature, a large amount of biomedical text data remains to be exploited. This text contains a wealth of knowledge about gene-disease associations, which is important for studies such as drug-target discovery and can even support personalized medical treatment tailored to a patient's genomic profile. BioBERT, a BERT model pre-trained on large-scale biomedical corpora, has been shown to outperform other pre-trained language models on biomedical datasets. To make use of this large body of text, in this paper we present a practical approach that uses BioBERT to extract gene-disease associations from biomedical text; it achieved an overall F-score of 79.98%. We hope this work will inspire researchers in biomedical natural language processing and lead to applications in related fields that solve problems encountered in research.
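For readers unfamiliar with the metric, an F-score of this kind is conventionally the harmonic mean of precision and recall. The following is a minimal sketch, assuming the balanced F1 variant; the abstract does not state which variant or averaging scheme was used. The function name `f1_score` and the counts below are hypothetical and chosen only for illustration; they are not the paper's data.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    predicted = true_positives + false_positives   # associations the model reported
    actual = true_positives + false_negatives      # associations in the gold standard
    # With no correct predictions, precision and recall are zero or undefined.
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not taken from the paper).
example = f1_score(true_positives=80, false_positives=20, false_negatives=20)
print(f"F1 = {example:.2%}")
```

Because it is a harmonic mean, the score is pulled toward the lower of precision and recall, so a high value requires both to be high.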


