Evaluating Biomedical Word Embeddings for Vocabulary Alignment at Scale in the UMLS Metathesaurus Using Siamese Networks.

Goonmeet Bajaj, Vinh Nguyen, Thilini Wijesiriwardene, Hong Yung Yip, Vishesh Javangula, Srinivasan Parthasarathy, Amit Sheth, Olivier Bodenreider
{"title":"Evaluating Biomedical Word Embeddings for Vocabulary Alignment at Scale in the UMLS Metathesaurus Using Siamese Networks.","authors":"Goonmeet Bajaj,&nbsp;Vinh Nguyen,&nbsp;Thilini Wijesiriwardene,&nbsp;Hong Yung Yip,&nbsp;Vishesh Javangula,&nbsp;Srinivasan Parthasarathy,&nbsp;Amit Sheth,&nbsp;Olivier Bodenreider","doi":"10.18653/v1/2022.insights-1.11","DOIUrl":null,"url":null,"abstract":"<p><p>Recent work uses a Siamese Network, initialized with BioWordVec embeddings (distributed word embeddings), for predicting synonymy among biomedical terms to automate a part of the UMLS (Unified Medical Language System) Metathesaurus construction process. We evaluate the use of contextualized word embeddings extracted from nine different biomedical BERT-based models for synonymy prediction in the UMLS by replacing BioWordVec embeddings with embeddings extracted from each biomedical BERT model using different feature extraction methods. Surprisingly, we find that Siamese Networks initialized with BioWordVec embeddings still outperform the Siamese Networks initialized with embedding extracted from biomedical BERT model.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":" ","pages":"82-87"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9455661/pdf/nihms-1833238.pdf","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the conference. Association for Computational Linguistics. Meeting","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.insights-1.11","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Recent work uses a Siamese Network, initialized with BioWordVec embeddings (distributed word embeddings), for predicting synonymy among biomedical terms to automate a part of the UMLS (Unified Medical Language System) Metathesaurus construction process. We evaluate the use of contextualized word embeddings extracted from nine different biomedical BERT-based models for synonymy prediction in the UMLS by replacing BioWordVec embeddings with embeddings extracted from each biomedical BERT model using different feature extraction methods. Surprisingly, we find that Siamese Networks initialized with BioWordVec embeddings still outperform the Siamese Networks initialized with embedding extracted from biomedical BERT model.

使用Siamese网络评估UMLS元词典中生物医学词嵌入的词汇对齐。
最近的工作使用了一个Siamese网络,用BioWordVec嵌入(分布式词嵌入)初始化,用于预测生物医学术语之间的同义词,以自动完成UMLS(统一医学语言系统)元同义词库构建过程的一部分。通过使用不同的特征提取方法从每个生物医学BERT模型中提取的嵌入替换BioWordVec嵌入,我们评估了从9种不同的生物医学BERT模型中提取的上下文化词嵌入在UMLS同义词预测中的使用情况。令人惊讶的是,我们发现用BioWordVec嵌入初始化的Siamese网络仍然优于用生物医学BERT模型提取的嵌入初始化的Siamese网络。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信