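Automatic query expansion for enhancing document retrieval system in healthcare application using GAN based embedding and hyper-tuned DAEBERT algorithm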

IF 2.7; Zone 3, Computer Science; JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering; Pub date: 2025-06-01; DOI: 10.1016/j.datak.2025.102468

Deepak Vishwakarma, Suresh Kumar

Volume 160, Article 102468; full text: https://www.sciencedirect.com/science/article/pii/S0169023X25000631

Citations: 0

Abstract

Query expansion is a useful technique for improving the reliability and performance of document retrieval systems. Search engines frequently use query expansion to improve Information Retrieval (IR) performance and to clarify users' information needs. Although several methods exist for expanding queries automatically, the returned document list can sometimes be long and contain much irrelevant information, particularly in Web search. As medical document collections grow, automatic query expansion may also struggle with efficiency and real-time use. This work therefore builds a ranking-based automatic query expansion system around a Hyper-Tuned Dual Attention Enhanced Bidirectional Encoder Representations from Transformers (HT-DAEBERT) model to improve medical document retrieval. First, user queries are collected from the medical document corpus and augmented with a Generative Adversarial Network (GAN). The augmented text is then pre-processed to improve its quality through tokenization, acronym expansion, stemming, stop-word removal, hyperlink removal, and spelling correction. Next, keywords are extracted from the pre-processed text with the Proximity-based Keyword Extraction (PKE) technique, and the HT-DAEBERT model converts them into vectors. Key DAEBERT hyperparameters, such as the dropout rate and weight decay, are selected with the Election Optimization Algorithm (EOA). Finally, a ranking-based query expansion approach is applied to enhance document retrieval. The proposed method achieves an accuracy of 97.60%, a hit rate of 98.30%, a positive predictive value (PPV) of 93.40%, an F1-score of 95.79%, and a negative predictive value (NPV) of 97.50%. The approach improves the accuracy and relevance of healthcare document retrieval, potentially leading to better patient care and clinical outcomes.
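The pre-processing stage listed in the abstract (tokenization, acronym expansion, stemming, stop-word removal, hyperlink removal, spelling correction) can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' code: the acronym table, stop-word list, and spelling vocabulary are hypothetical toy resources, `crude_stem` is a simplified suffix stripper rather than a real stemmer such as Porter's, and the order of the steps is an assumption (hyperlinks are removed first, and spelling is corrected before stemming so that corrected words can still match the vocabulary).

```python
import re
from difflib import get_close_matches

# Hypothetical resources: the paper does not publish its acronym list,
# stop-word list, or spelling vocabulary, so these are toy stand-ins.
ACRONYMS = {"bp": "blood pressure", "mi": "myocardial infarction"}
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to", "with"}
VOCAB = ["blood", "diabetes", "high", "infarction", "myocardial",
         "patient", "patients", "pressure", "symptoms", "treatment"]
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def crude_stem(token):
    """Strip one common suffix; a toy stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    text = URL_RE.sub(" ", text.lower())                 # hyperlink removal
    tokens = re.findall(r"[a-z]+", text)                 # tokenization
    expanded = []
    for tok in tokens:                                   # acronym expansion
        expanded.extend(ACRONYMS.get(tok, tok).split())
    kept = [t for t in expanded if t not in STOP_WORDS]  # stop-word removal
    corrected = []
    for tok in kept:                                     # spelling correction
        match = get_close_matches(tok, VOCAB, n=1, cutoff=0.8)
        corrected.append(match[0] if match else tok)
    return [crude_stem(t) for t in corrected]            # stemming


print(preprocess("Treatmnet of high BP in patients https://example.org/bp"))
# → ['treatment', 'high', 'blood', 'pressure', 'patient']
```

In practice a medical system would likely swap the toy tables for curated clinical resources; the point here is only how the listed steps chain together.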
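The abstract does not define how Proximity-based Keyword Extraction (PKE) scores words, so the following is only a plausible sketch under an assumed scoring rule: every token that occurs within `window` positions of a query term gains 1/distance for that occurrence, and the highest-scoring tokens become candidate keywords. The function name `proximity_keywords` and its parameters are hypothetical.

```python
from collections import defaultdict


def proximity_keywords(tokens, query_terms, window=3, top_k=3):
    """Score non-query tokens by how close they occur to query terms.

    Each co-occurrence within `window` positions adds 1 / distance, so a
    word right next to a query term counts more than a distant one.
    """
    query_terms = set(query_terms)
    anchors = [i for i, t in enumerate(tokens) if t in query_terms]
    scores = defaultdict(float)
    for i, tok in enumerate(tokens):
        if tok in query_terms:
            continue  # query terms themselves are not candidates
        for a in anchors:
            d = abs(i - a)  # never 0, because tok is not a query term
            if d <= window:
                scores[tok] += 1.0 / d
    # Highest score first; ties broken alphabetically for a deterministic result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


tokens = ["chest", "pain", "radiating", "arm", "pain", "relief", "aspirin"]
print(proximity_keywords(tokens, ["pain"]))
# → [('arm', 1.5), ('radiating', 1.5), ('chest', 1.0)]
```

The 1/distance weighting is one simple choice; a real PKE variant might also factor in term frequency or corpus statistics.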
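The abstract states that EOA selects DAEBERT's dropout rate and weight decay but gives no update rules, so the code below does not reproduce EOA. It is a generic population-based search, substituted here for illustration, in which the best candidate of each round acts as an "elected" leader that the other candidates move toward. `toy_val_loss` is a hypothetical stand-in for the expensive step of fine-tuning the model and measuring validation loss; the search ranges are assumptions, and weight decay is searched on a log10 scale because plausible values span several orders of magnitude.

```python
import random

# Assumed search space: dropout rate and log10(weight decay).
BOUNDS = {"dropout": (0.0, 0.5), "log10_wd": (-5.0, -1.0)}


def toy_val_loss(dropout, log10_wd):
    """Hypothetical stand-in for 'fine-tune the model and return validation loss'."""
    return (dropout - 0.1) ** 2 + 0.05 * (log10_wd + 2.0) ** 2


def leader_search(objective, pop_size=12, rounds=40, seed=0):
    rng = random.Random(seed)
    pop = [{k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}
           for _ in range(pop_size)]
    history = []  # best loss in the population after each round
    for _ in range(rounds):
        # The best-scoring candidate is "elected" leader for this round.
        leader = min(pop, key=lambda c: objective(**c))
        next_pop = []
        for cand in pop:
            moved = {}
            for k, (lo, hi) in BOUNDS.items():
                pull = rng.random() * (leader[k] - cand[k])  # move toward the leader
                noise = rng.gauss(0.0, 0.05 * (hi - lo))     # keep exploring
                moved[k] = min(hi, max(lo, cand[k] + pull + noise))
            # Greedy acceptance: a candidate is never replaced by a worse one.
            better = objective(**moved) <= objective(**cand)
            next_pop.append(moved if better else cand)
        pop = next_pop
        history.append(min(objective(**c) for c in pop))
    best = min(pop, key=lambda c: objective(**c))
    return best, objective(**best), history


best, loss, history = leader_search(toy_val_loss)
print(f"dropout={best['dropout']:.3f}, weight_decay={10 ** best['log10_wd']:.2e}, loss={loss:.5f}")
```

Because each candidate only accepts non-worsening moves, the best loss in `history` can never increase from one round to the next; with a real model, each objective call would be a full training run, which is why population sizes are usually kept small.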
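The final ranking-based expansion step can be illustrated with embedding similarity: candidate terms are ranked by cosine similarity to the centroid of the query-term vectors, and the top `k` terms above a threshold are appended to the query. This is a common generic technique, not necessarily the authors' ranking scheme; the three-dimensional vectors in `EMB` are made-up toy values standing in for HT-DAEBERT embeddings, and `expand_query`, `k`, and `min_sim` are hypothetical names.

```python
import math

# Toy 3-d vectors standing in for contextual embeddings (made-up values).
EMB = {
    "heart": [0.9, 0.1, 0.0],
    "attack": [0.8, 0.2, 0.1],
    "myocardial": [0.85, 0.15, 0.05],
    "infarction": [0.8, 0.1, 0.1],
    "fracture": [0.0, 0.9, 0.3],
    "insulin": [0.1, 0.0, 0.95],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_query(query_terms, embeddings, k=2, min_sim=0.5):
    vecs = [embeddings[t] for t in query_terms if t in embeddings]
    if not vecs:
        return list(query_terms)  # nothing to expand from
    dim = len(vecs[0])
    centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    # Rank every non-query term by similarity to the query centroid.
    candidates = [(cosine(centroid, vec), term)
                  for term, vec in embeddings.items() if term not in query_terms]
    candidates.sort(key=lambda sv: (-sv[0], sv[1]))
    extra = [term for sim, term in candidates[:k] if sim >= min_sim]
    return list(query_terms) + extra


print(expand_query(["heart", "attack"], EMB))
# → ['heart', 'attack', 'myocardial', 'infarction']
```

The `min_sim` threshold keeps a large `k` from dragging in unrelated terms: with `k=3`, "fracture" is the third candidate but its similarity is far below 0.5, so the expanded query is unchanged.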
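The reported figures can be cross-checked with the standard confusion-matrix definitions. Assuming that "Hit Rate" means recall (sensitivity), the harmonic mean of the reported PPV (93.40%) and hit rate (98.30%) rounds to 95.79%, matching the reported F1-score, which supports that reading. The function names below are illustrative.

```python
def retrieval_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; 'hit_rate' is taken to be recall."""
    ppv = tp / (tp + fp)        # precision
    hit_rate = tp / (tp + fn)   # recall / sensitivity (assumed meaning)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": ppv,
        "hit_rate": hit_rate,
        "npv": tn / (tn + fn),
        "f1": 2 * ppv * hit_rate / (ppv + hit_rate),
    }


def f1_from(ppv, hit_rate):
    """Harmonic mean of precision (PPV) and recall (assumed to be the hit rate)."""
    return 2 * ppv * hit_rate / (ppv + hit_rate)


# Consistency check against the reported PPV (93.40%) and hit rate (98.30%).
print(round(100 * f1_from(0.934, 0.983), 2))  # → 95.79
```

Accuracy and NPV cannot be checked the same way, because they also depend on true negatives, which the abstract does not report.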


Source journal: Data & Knowledge Engineering (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 5.00
Self-citation rate: 0.00%
Review time: 6 months

Journal description: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between two related fields of interest, database systems and knowledge-base systems. DKE reaches a worldwide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.