Benchmark and Best Practices for Biomedical Knowledge Graph Embeddings

David Chang, Ivana Balažević, Carl Allen, Daniel Chawla, Cynthia Brandt, Richard Andrew Taylor
Proceedings of the conference. Association for Computational Linguistics. Meeting, pp. 167-176. Published 2020-07-01.
DOI: 10.18653/v1/2020.bionlp-1.18
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7971091/pdf/nihms-1676481.pdf
Citations: 24

Abstract

Much of biomedical and healthcare data is encoded in discrete, symbolic form such as text and medical codes. There is a wealth of expert-curated biomedical domain knowledge stored in knowledge bases and ontologies, but the lack of reliable methods for learning knowledge representation has limited their usefulness in machine learning applications. While text-based representation learning has significantly improved in recent years through advances in natural language processing, attempts to learn biomedical concept embeddings so far have been lacking. A recent family of models called knowledge graph embeddings have shown promising results on general domain knowledge graphs, and we explore their capabilities in the biomedical domain. We train several state-of-the-art knowledge graph embedding models on the SNOMED-CT knowledge graph, provide a benchmark with comparison to existing methods and in-depth discussion on best practices, and make a case for the importance of leveraging the multi-relational nature of knowledge graphs for learning biomedical knowledge representation. The embeddings, code, and materials will be made available to the community.
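
To make the idea of a multi-relational knowledge graph embedding concrete, here is a minimal sketch of a translation-style scoring function. The abstract does not name the models that were trained, so the choice of TransE here is an assumption made purely for illustration. The entity names, relation names, and two-dimensional vectors are hypothetical, set by hand rather than learned, and not taken from SNOMED-CT.

```python
import math

# Hypothetical hand-set 2-d embeddings (illustration only, not learned,
# and not taken from SNOMED-CT).
ENTITY = {
    "viral_pneumonia": [1.0, 1.0],
    "pneumonia": [1.0, 2.0],
    "lung": [2.0, 1.0],
    "virus": [0.0, 0.0],
}

# One vector per relation type: this is what makes the model multi-relational.
RELATION = {
    "is_a": [0.0, 1.0],
    "finding_site": [1.0, 0.0],
}


def transe_distance(head, relation, tail):
    """TransE-style score: L2 norm of (h + r - t). Smaller means more plausible."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Rank every entity as a candidate tail, most plausible first."""
    return sorted(ENTITY, key=lambda tail: transe_distance(head, relation, tail))


if __name__ == "__main__":
    for rel in RELATION:
        print(rel, rank_tails("viral_pneumonia", rel))
```

With these toy vectors, the same head entity is mapped to a different best tail depending on the relation, which is the kind of relation-specific structure a single-relation or purely co-occurrence-based embedding would not capture. A real model would learn the vectors from the graph's triples rather than fixing them by hand.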
