{"title":"LEVERAGING NEURAL NETWORK PHRASE EMBEDDING MODEL FOR QUERY REFORMULATION IN AD-HOC BIOMEDICAL INFORMATION RETRIEVAL","authors":"Bhopale P. Bhopale, Ashish Tiwari","doi":"10.22452/mjcs.vol34no2.2","DOIUrl":null,"url":null,"abstract":"This study presents a spark enhanced neural network phrase embedding model to leverage query representation for relevant biomedical literature retrieval. Information retrieval for clinical decision support demands high precision. In recent years, word embeddings have been evolved as a solution to such requirements. It represents vocabulary words in low-dimensional vectors in the context of their similar words; however, it is inadequate to deal with semantic phrases or multi-word units. Learning vector embeddings for phrases by maintaining word meanings is a challenging task. This study proposes a scalable phrase embedding technique to embed multi-word units into vector representations using a state-of-the-art word embedding technique, keeping both word and phrase in the same vectors space. It will enhance the effectiveness and efficiency of query language models by expanding unseen query terms and phrases for the semantically associated query terms. Embedding vectors are evaluated via a query expansion technique for ad-hoc retrieval task over two benchmark corpora viz. TREC-CDS 2014 collection with 733,138 PubMed articles and OHSUMED corpus having 348,566 articles collected from a Medline database. The results show that the proposed technique has significantly outperformed other state-of-the-art retrieval techniques","PeriodicalId":49894,"journal":{"name":"Malaysian Journal of Computer Science","volume":" ","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2021-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Malaysian Journal of Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.22452/mjcs.vol34no2.2","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 4
Abstract
This study presents a spark enhanced neural network phrase embedding model to leverage query representation for relevant biomedical literature retrieval. Information retrieval for clinical decision support demands high precision. In recent years, word embeddings have been evolved as a solution to such requirements. It represents vocabulary words in low-dimensional vectors in the context of their similar words; however, it is inadequate to deal with semantic phrases or multi-word units. Learning vector embeddings for phrases by maintaining word meanings is a challenging task. This study proposes a scalable phrase embedding technique to embed multi-word units into vector representations using a state-of-the-art word embedding technique, keeping both word and phrase in the same vectors space. It will enhance the effectiveness and efficiency of query language models by expanding unseen query terms and phrases for the semantically associated query terms. Embedding vectors are evaluated via a query expansion technique for ad-hoc retrieval task over two benchmark corpora viz. TREC-CDS 2014 collection with 733,138 PubMed articles and OHSUMED corpus having 348,566 articles collected from a Medline database. The results show that the proposed technique has significantly outperformed other state-of-the-art retrieval techniques
期刊介绍:
The Malaysian Journal of Computer Science (ISSN 0127-9084) is published four times a year in January, April, July and October by the Faculty of Computer Science and Information Technology, University of Malaya, since 1985. Over the years, the journal has gained popularity and the number of paper submissions has increased steadily. The rigorous reviews from the referees have helped in ensuring that the high standard of the journal is maintained. The objectives are to promote exchange of information and knowledge in research work, new inventions/developments of Computer Science and on the use of Information Technology towards the structuring of an information-rich society and to assist the academic staff from local and foreign universities, business and industrial sectors, government departments and academic institutions on publishing research results and studies in Computer Science and Information Technology through a scholarly publication. The journal is being indexed and abstracted by Clarivate Analytics'' Web of Science and Elsevier''s Scopus