LEVERAGING NEURAL NETWORK PHRASE EMBEDDING MODEL FOR QUERY REFORMULATION IN AD-HOC BIOMEDICAL INFORMATION RETRIEVAL

IF 1.1 4区 计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Bhopale P. Bhopale, Ashish Tiwari
{"title":"LEVERAGING NEURAL NETWORK PHRASE EMBEDDING MODEL FOR QUERY REFORMULATION IN AD-HOC BIOMEDICAL INFORMATION RETRIEVAL","authors":"Bhopale P. Bhopale, Ashish Tiwari","doi":"10.22452/mjcs.vol34no2.2","DOIUrl":null,"url":null,"abstract":"This study presents a spark enhanced neural network phrase embedding model to leverage query representation for relevant biomedical literature retrieval. Information retrieval for clinical decision support demands high precision. In recent years, word embeddings have been evolved as a solution to such requirements. It represents vocabulary words in low-dimensional vectors in the context of their similar words; however, it is inadequate to deal with semantic phrases or multi-word units. Learning vector embeddings for phrases by maintaining word meanings is a challenging task. This study proposes a scalable phrase embedding technique to embed multi-word units into vector representations using a state-of-the-art word embedding technique, keeping both word and phrase in the same vectors space. It will enhance the effectiveness and efficiency of query language models by expanding unseen query terms and phrases for the semantically associated query terms. Embedding vectors are evaluated via a query expansion technique for ad-hoc retrieval task over two benchmark corpora viz. TREC-CDS 2014 collection with 733,138 PubMed articles and OHSUMED corpus having 348,566 articles collected from a Medline database. The results show that the proposed technique has significantly outperformed other state-of-the-art retrieval techniques","PeriodicalId":49894,"journal":{"name":"Malaysian Journal of Computer Science","volume":" ","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2021-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Malaysian Journal of Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.22452/mjcs.vol34no2.2","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 4

Abstract

This study presents a spark enhanced neural network phrase embedding model to leverage query representation for relevant biomedical literature retrieval. Information retrieval for clinical decision support demands high precision. In recent years, word embeddings have been evolved as a solution to such requirements. It represents vocabulary words in low-dimensional vectors in the context of their similar words; however, it is inadequate to deal with semantic phrases or multi-word units. Learning vector embeddings for phrases by maintaining word meanings is a challenging task. This study proposes a scalable phrase embedding technique to embed multi-word units into vector representations using a state-of-the-art word embedding technique, keeping both word and phrase in the same vectors space. It will enhance the effectiveness and efficiency of query language models by expanding unseen query terms and phrases for the semantically associated query terms. Embedding vectors are evaluated via a query expansion technique for ad-hoc retrieval task over two benchmark corpora viz. TREC-CDS 2014 collection with 733,138 PubMed articles and OHSUMED corpus having 348,566 articles collected from a Medline database. The results show that the proposed technique has significantly outperformed other state-of-the-art retrieval techniques
利用神经网络短语嵌入模型进行自组织生物医学信息检索中的查询重构
本研究提出了一种火花增强的神经网络短语嵌入模型,以利用查询表示进行相关生物医学文献检索。用于临床决策支持的信息检索要求高精度。近年来,单词嵌入已经发展成为此类需求的解决方案。它在相似词的上下文中以低维向量表示词汇;然而,它不足以处理语义短语或多词单元。通过维护单词含义来学习短语的向量嵌入是一项具有挑战性的任务。本研究提出了一种可扩展的短语嵌入技术,使用最先进的单词嵌入技术将多个单词单元嵌入到向量表示中,使单词和短语保持在相同的向量空间中。它将通过为语义关联的查询术语扩展看不见的查询术语和短语来提高查询语言模型的有效性和效率。嵌入向量通过查询扩展技术在两个基准语料库上进行评估,即TREC-CDS 2014收集的733138篇PubMed文章和OHSUMED语料库从Medline数据库收集的348566篇文章。结果表明,所提出的技术显著优于其他最先进的检索技术
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Malaysian Journal of Computer Science
Malaysian Journal of Computer Science COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, THEORY & METHODS
CiteScore
2.20
自引率
33.30%
发文量
35
审稿时长
7.5 months
期刊介绍: The Malaysian Journal of Computer Science (ISSN 0127-9084) is published four times a year in January, April, July and October by the Faculty of Computer Science and Information Technology, University of Malaya, since 1985. Over the years, the journal has gained popularity and the number of paper submissions has increased steadily. The rigorous reviews from the referees have helped in ensuring that the high standard of the journal is maintained. The objectives are to promote exchange of information and knowledge in research work, new inventions/developments of Computer Science and on the use of Information Technology towards the structuring of an information-rich society and to assist the academic staff from local and foreign universities, business and industrial sectors, government departments and academic institutions on publishing research results and studies in Computer Science and Information Technology through a scholarly publication.  The journal is being indexed and abstracted by Clarivate Analytics'' Web of Science and Elsevier''s Scopus
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信