{"title":"基于GAN嵌入和超调DAEBERT算法的医疗保健文档检索系统自动查询扩展","authors":"Deepak Vishwakarma , Suresh Kumar","doi":"10.1016/j.datak.2025.102468","DOIUrl":null,"url":null,"abstract":"<div><div>Query expansion is a useful technique for improving document retrieval systems' dependability and performance. Search engines frequently employ query expansion strategies to improve Information Retrieval (IR) performance and elucidate users' information requirements. Although there are several methods for automatically expanding queries, the list of documents that are returned can occasionally be lengthy and contain a lot of useless information, particularly when searching the Web. As the size of medical document grows, Automatic Query Expansion might struggle with efficiency and real-time application. Thus, Hyper-Tuned Dual Attention Enhanced Bi-directional Encoder Representation from Transformers (HT-DAEBERT) with automatic ranking based query expansion system is created for enhancing medical document retrieval system. Initially, the user's query from the medical corpus document was collected, and it was augmented using the Generative Adversarial Network (GAN) approach. Then augmented text is pre-processed to improve the original text's quality through tokenization, acronym expansion, stemming, stop word removal, hyperlink removal, and spell correction. After that, Keywords are extracted using the Proximity-based Keyword Extraction (PKE) technique from the pre-processed text. Afterwards, the words are converted into vector form by utilizing the Hyper-Tuned Dual Attention Enhanced Bi-directional Encoder Representation from Transformers (HT-DAEBERT) model. In DAEBERT, key parameters such as dropout rate and weight decay were optimally selected by using the Election Optimization Algorithm (EOA). After that, a ranking-based query expansion approach was employed to enhance the document retrieval system. The proposed method achieves an accuracy of 97.60 %, a Hit Rate of 98.30 %, a PPV of 93.40 %, an F1-Score of 95.79 %, and an NPV of 97.50 %. This approach improves the accuracy and relevance of document retrieval in healthcare, potentially leading to better patient care and enhanced clinical outcomes.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102468"},"PeriodicalIF":2.7000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Automatic query expansion for enhancing document retrieval system in healthcare application using GAN based embedding and hyper-tuned DAEBERT algorithm\",\"authors\":\"Deepak Vishwakarma , Suresh Kumar\",\"doi\":\"10.1016/j.datak.2025.102468\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Query expansion is a useful technique for improving document retrieval systems' dependability and performance. Search engines frequently employ query expansion strategies to improve Information Retrieval (IR) performance and elucidate users' information requirements. Although there are several methods for automatically expanding queries, the list of documents that are returned can occasionally be lengthy and contain a lot of useless information, particularly when searching the Web. As the size of medical document grows, Automatic Query Expansion might struggle with efficiency and real-time application. Thus, Hyper-Tuned Dual Attention Enhanced Bi-directional Encoder Representation from Transformers (HT-DAEBERT) with automatic ranking based query expansion system is created for enhancing medical document retrieval system. Initially, the user's query from the medical corpus document was collected, and it was augmented using the Generative Adversarial Network (GAN) approach. Then augmented text is pre-processed to improve the original text's quality through tokenization, acronym expansion, stemming, stop word removal, hyperlink removal, and spell correction. After that, Keywords are extracted using the Proximity-based Keyword Extraction (PKE) technique from the pre-processed text. Afterwards, the words are converted into vector form by utilizing the Hyper-Tuned Dual Attention Enhanced Bi-directional Encoder Representation from Transformers (HT-DAEBERT) model. In DAEBERT, key parameters such as dropout rate and weight decay were optimally selected by using the Election Optimization Algorithm (EOA). After that, a ranking-based query expansion approach was employed to enhance the document retrieval system. The proposed method achieves an accuracy of 97.60 %, a Hit Rate of 98.30 %, a PPV of 93.40 %, an F1-Score of 95.79 %, and an NPV of 97.50 %. This approach improves the accuracy and relevance of document retrieval in healthcare, potentially leading to better patient care and enhanced clinical outcomes.</div></div>\",\"PeriodicalId\":55184,\"journal\":{\"name\":\"Data & Knowledge Engineering\",\"volume\":\"160 \",\"pages\":\"Article 102468\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2025-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data & Knowledge Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0169023X25000631\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data & Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169023X25000631","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Automatic query expansion for enhancing document retrieval system in healthcare application using GAN based embedding and hyper-tuned DAEBERT algorithm
Query expansion is a useful technique for improving document retrieval systems' dependability and performance. Search engines frequently employ query expansion strategies to improve Information Retrieval (IR) performance and elucidate users' information requirements. Although there are several methods for automatically expanding queries, the list of documents that are returned can occasionally be lengthy and contain a lot of useless information, particularly when searching the Web. As the size of medical document grows, Automatic Query Expansion might struggle with efficiency and real-time application. Thus, Hyper-Tuned Dual Attention Enhanced Bi-directional Encoder Representation from Transformers (HT-DAEBERT) with automatic ranking based query expansion system is created for enhancing medical document retrieval system. Initially, the user's query from the medical corpus document was collected, and it was augmented using the Generative Adversarial Network (GAN) approach. Then augmented text is pre-processed to improve the original text's quality through tokenization, acronym expansion, stemming, stop word removal, hyperlink removal, and spell correction. After that, Keywords are extracted using the Proximity-based Keyword Extraction (PKE) technique from the pre-processed text. Afterwards, the words are converted into vector form by utilizing the Hyper-Tuned Dual Attention Enhanced Bi-directional Encoder Representation from Transformers (HT-DAEBERT) model. In DAEBERT, key parameters such as dropout rate and weight decay were optimally selected by using the Election Optimization Algorithm (EOA). After that, a ranking-based query expansion approach was employed to enhance the document retrieval system. The proposed method achieves an accuracy of 97.60 %, a Hit Rate of 98.30 %, a PPV of 93.40 %, an F1-Score of 95.79 %, and an NPV of 97.50 %. This approach improves the accuracy and relevance of document retrieval in healthcare, potentially leading to better patient care and enhanced clinical outcomes.
期刊介绍:
Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.