Extraction of gene-disease association from literature using BioBERT

The 2nd International Conference on Computing and Data Science. Pub Date: 2021-01-28. DOI: 10.1145/3448734.3450772

Chuan Deng, Jiahui Zou, Jingwen Deng, M. Bai


Citations: 4


Extraction of gene-disease association from literature using BioBERT

With the rapid growth of the biomedical literature, a large amount of bio-text data remains to be exploited. These texts contain a wealth of knowledge about gene-disease associations, which is important for studies such as drug-target discovery and could even support personalized medical treatment tailored to individual patients' genomes. BioBERT, a BERT model pre-trained on large-scale biomedical corpora, has been shown to outperform other pre-trained language models on biomedical datasets. To make use of this large body of bio-text, in this paper we present a practical approach that uses BioBERT to extract gene-disease associations from bio-text; it achieved an overall F-score of 79.98%. We hope this work will inspire researchers in biomedical natural language processing and enable applications in related fields that solve problems encountered in research.
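The abstract reports an overall F-score without defining it. The following is a minimal sketch, assuming the standard definition (the harmonic mean of precision and recall, here over predicted gene-disease pairs). The function name `f_score` and the counts are hypothetical illustrations and are not taken from the paper, which may aggregate its score differently.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw counts.

    tp: predicted associations that are correct (true positives)
    fp: predicted associations that are wrong (false positives)
    fn: true associations the model missed (false negatives)
    beta: weight of recall relative to precision (1.0 gives F1)
    """
    if tp == 0:
        # No correct predictions: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only; not results from the paper.
score = f_score(tp=80, fp=20, fn=20)
print(f"F1 = {score:.2%}")
```

Because the harmonic mean is dominated by the smaller of precision and recall, a model cannot reach a high F-score by over-predicting associations or by predicting only the easiest ones.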

