Query Disambiguation to Enhance Biomedical Information Retrieval Based on Neural Networks
Wided Selmi, Hager Kammoun, Ikram Amous
Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval, published 2021-12-17
DOI: 10.1145/3508230.3508253 (https://doi.org/10.1145/3508230.3508253)
Abstract
Information retrieval systems (IRS) use a query to find relevant documents. A query term often has more than one sense; this is known as the ambiguity problem, and it is one cause of poor IRS performance. Word Sense Disambiguation (WSD) addresses this problem by choosing the right sense of an ambiguous term from a set of given candidate senses, according to the term's context (the surrounding text). Because WSD assumes that the candidate senses are given, obtaining all of them is itself a challenge. Word Sense Induction (WSI) is the task of automatically inducing the different senses of a target word from the different contexts in which it appears. In this work, we propose a biomedical query disambiguation method. In this method, WSI uses the K-means algorithm to cluster the different contexts of an ambiguous query term (a MeSH descriptor) in order to induce its senses. The contexts are sentences extracted from PubMed that contain the target MeSH descriptor. To represent sentences as vectors, we propose to use a contextualized embedding model, BioBERT. Our method is derived from the intuitive idea that the correct sense is the one, among the candidate senses of an ambiguous term, that has the highest similarity to the term's context. Experiments conducted on the OHSUMED test collection yielded significant results.
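
The pipeline described in the abstract (embed contexts, cluster them with K-means to induce senses, then pick the sense most similar to the query context) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors are hand-made stand-ins for BioBERT sentence embeddings, the number of senses `k=2` is fixed by hand, cosine similarity is assumed as the similarity measure (the abstract does not name one), and the function names `kmeans` and `pick_sense` are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct context vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: attach each context to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return centroids, labels


def pick_sense(query_vector, centroids):
    """Index of the induced sense most similar to the query context."""
    return max(range(len(centroids)),
               key=lambda c: cosine(query_vector, centroids[c]))


# Toy context vectors (assumption: stand-ins for BioBERT embeddings of
# PubMed sentences that contain one ambiguous MeSH descriptor).
contexts = [
    [0.90, 0.10, 0.00], [0.80, 0.20, 0.10], [0.95, 0.05, 0.00],  # usage A
    [0.10, 0.90, 0.10], [0.00, 0.80, 0.20], [0.05, 0.95, 0.00],  # usage B
]

centroids, labels = kmeans(contexts, k=2)

# Hypothetical embedding of the query context, close to usage A.
query = [0.85, 0.15, 0.05]
sense = pick_sense(query, centroids)
print("cluster labels:", labels)
print("selected sense (cluster index):", sense)
```

In a real setting the toy vectors would be replaced by embeddings computed from the PubMed sentences and from the query, and the number of clusters would have to be chosen per descriptor; the abstract does not say how the authors handle either detail.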