Query Disambiguation to Enhance Biomedical Information Retrieval Based on Neural Networks

Wided Selmi, Hager Kammoun, Ikram Amous
{"title":"Query Disambiguation to Enhance Biomedical Information Retrieval Based on Neural Networks","authors":"Wided Selmi, Hager Kammoun, Ikram Amous","doi":"10.1145/3508230.3508253","DOIUrl":null,"url":null,"abstract":"Information Retrieval Systems (IRS) use a query to find the relevant documents. Often the query term can have more than one sense; this is known as the ambiguity problem. This problem is a cause of poor performance in IRS. For this purpose, Word Sense Disambiguation (WSD) specifically deals with choosing the right sense of an ambiguous term, among a set of given candidate senses, according to its context (surrounding text). Obtaining all candidate senses is therefore a challenge for WSD. Word Sense Induction (WSI) is a task that automatically induces the different senses of a target word in different contexts. In this work, we propose a biomedical query disambiguation method. In this method, WSI use K-means algorithm to cluster the different contexts of ambiguous query term (MeSH descriptor) in order to induce the different senses. The different contexts are the sentences extracted from PubMed containing the target MeSH descriptor. To represent sentences as vectors, we propose to use a contextualized embeddings model “Biobert”. Our method is derived from the intuitive idea that the correct sense in the one having the high similarity among the candidate senses of an ambiguous term with its context. The conducted experiments on OHSUMED test collection yielded significant results.","PeriodicalId":252146,"journal":{"name":"Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3508230.3508253","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Information Retrieval Systems (IRS) use a query to find the relevant documents. Often the query term can have more than one sense; this is known as the ambiguity problem. This problem is a cause of poor performance in IRS. For this purpose, Word Sense Disambiguation (WSD) specifically deals with choosing the right sense of an ambiguous term, among a set of given candidate senses, according to its context (surrounding text). Obtaining all candidate senses is therefore a challenge for WSD. Word Sense Induction (WSI) is a task that automatically induces the different senses of a target word in different contexts. In this work, we propose a biomedical query disambiguation method. In this method, WSI use K-means algorithm to cluster the different contexts of ambiguous query term (MeSH descriptor) in order to induce the different senses. The different contexts are the sentences extracted from PubMed containing the target MeSH descriptor. To represent sentences as vectors, we propose to use a contextualized embeddings model “Biobert”. Our method is derived from the intuitive idea that the correct sense in the one having the high similarity among the candidate senses of an ambiguous term with its context. The conducted experiments on OHSUMED test collection yielded significant results.
基于神经网络的查询消歧增强生物医学信息检索
信息检索系统(IRS)使用查询来查找相关文档。通常,查询词可以有不止一种含义;这就是所谓的歧义问题。这个问题是导致IRS性能不佳的一个原因。为此,词义消歧(WSD)专门处理根据上下文(周围文本)在一组给定的候选意义中选择歧义术语的正确意义。因此,获取所有候选感官对水务署来说是一项挑战。词义归纳(WSI)是一种在不同语境中自动归纳目标词的不同意义的任务。在这项工作中,我们提出了一种生物医学查询消歧方法。该方法利用K-means算法对歧义查询词的不同上下文(MeSH描述符)进行聚类,从而归纳出不同的语义。不同的上下文是从PubMed中提取的包含目标MeSH描述符的句子。为了将句子表示为向量,我们建议使用上下文化嵌入模型“Biobert”。我们的方法来源于一个直观的想法,即歧义术语的候选意义与其上下文具有高相似性的意义才是正确的意义。在OHSUMED测试集上进行的实验取得了显著的结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信