结合对比学习和知识图嵌入开发意大利语医学词嵌入

Denys Amore Bondarenko, Roger Ferrod, Luigi Di Caro
{"title":"结合对比学习和知识图嵌入开发意大利语医学词嵌入","authors":"Denys Amore Bondarenko, Roger Ferrod, Luigi Di Caro","doi":"10.48550/arXiv.2211.05035","DOIUrl":null,"url":null,"abstract":"Word embeddings play a significant role in today's Natural Language Processing tasks and applications. While pre-trained models may be directly employed and integrated into existing pipelines, they are often fine-tuned to better fit with specific languages or domains. In this paper, we attempt to improve available embeddings in the uncovered niche of the Italian medical domain through the combination of Contrastive Learning (CL) and Knowledge Graph Embedding (KGE). The main objective is to improve the accuracy of semantic similarity between medical terms, which is also used as an evaluation task. Since the Italian language lacks medical texts and controlled vocabularies, we have developed a specific solution by combining preexisting CL methods (multi-similarity loss, contextualization, dynamic sampling) and the integration of KGEs, creating a new variant of the loss. Although without having outperformed the state-of-the-art, represented by multilingual models, the obtained results are encouraging, providing a significant leap in performance compared to the starting model, while using a significantly lower amount of data.","PeriodicalId":293643,"journal":{"name":"International Conference of the Italian Association for Artificial Intelligence","volume":"97 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Combining Contrastive Learning and Knowledge Graph Embeddings to develop medical word embeddings for the Italian language\",\"authors\":\"Denys Amore Bondarenko, Roger Ferrod, Luigi Di Caro\",\"doi\":\"10.48550/arXiv.2211.05035\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Word embeddings play a significant role in today's Natural Language Processing tasks and applications. While pre-trained models may be directly employed and integrated into existing pipelines, they are often fine-tuned to better fit with specific languages or domains. In this paper, we attempt to improve available embeddings in the uncovered niche of the Italian medical domain through the combination of Contrastive Learning (CL) and Knowledge Graph Embedding (KGE). The main objective is to improve the accuracy of semantic similarity between medical terms, which is also used as an evaluation task. Since the Italian language lacks medical texts and controlled vocabularies, we have developed a specific solution by combining preexisting CL methods (multi-similarity loss, contextualization, dynamic sampling) and the integration of KGEs, creating a new variant of the loss. Although without having outperformed the state-of-the-art, represented by multilingual models, the obtained results are encouraging, providing a significant leap in performance compared to the starting model, while using a significantly lower amount of data.\",\"PeriodicalId\":293643,\"journal\":{\"name\":\"International Conference of the Italian Association for Artificial Intelligence\",\"volume\":\"97 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference of the Italian Association for Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2211.05035\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference of the Italian Association for Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2211.05035","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

词嵌入在当今的自然语言处理任务和应用中扮演着重要的角色。虽然可以直接使用预训练的模型并将其集成到现有的管道中,但它们通常会经过微调以更好地适应特定的语言或领域。在本文中,我们试图通过对比学习(CL)和知识图嵌入(KGE)的结合来改善意大利医学领域未覆盖的生态位的可用嵌入。主要目的是提高医学术语之间语义相似度的准确性,这也是一个评估任务。由于意大利语缺乏医学文本和受控词汇,我们通过结合已有的CL方法(多相似性丢失、上下文化、动态采样)和kge集成开发了一个特定的解决方案,创建了一个新的丢失变体。虽然没有超越多语言模型所代表的最先进技术,但获得的结果令人鼓舞,与开始模型相比,在性能上有了显着的飞跃,同时使用了显着减少的数据量。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Combining Contrastive Learning and Knowledge Graph Embeddings to develop medical word embeddings for the Italian language
Word embeddings play a significant role in today's Natural Language Processing tasks and applications. While pre-trained models may be directly employed and integrated into existing pipelines, they are often fine-tuned to better fit with specific languages or domains. In this paper, we attempt to improve available embeddings in the uncovered niche of the Italian medical domain through the combination of Contrastive Learning (CL) and Knowledge Graph Embedding (KGE). The main objective is to improve the accuracy of semantic similarity between medical terms, which is also used as an evaluation task. Since the Italian language lacks medical texts and controlled vocabularies, we have developed a specific solution by combining preexisting CL methods (multi-similarity loss, contextualization, dynamic sampling) and the integration of KGEs, creating a new variant of the loss. Although without having outperformed the state-of-the-art, represented by multilingual models, the obtained results are encouraging, providing a significant leap in performance compared to the starting model, while using a significantly lower amount of data.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信