Title: Multi-head divide-and-conquer residual-attention mechanism with pointer network for multimodal question summarization in healthcare
Authors: S. Priskilla Manonmani, S. Malathi
Journal: Information Processing & Management, Volume 63, Issue 1, Article 104348
Published: 2025-08-16
DOI: 10.1016/j.ipm.2025.104348
URL: https://www.sciencedirect.com/science/article/pii/S0306457325002894
Impact factor: 6.9; JCR: Q1 (Computer Science, Information Systems)
Citation count: 0
Abstract
In contemporary medicine, concise summaries of medical questions are vital for effective and precise patient care. Current techniques summarize text only and do not exploit visual information. To address this gap, this research presents a multimodal summarization system that combines textual queries with medical images to support the extraction of meaningful details. The proposed system has three phases. First, a gradual fusion decoder that pairs Bidirectional Encoder Representations from Transformers (BERT) with vision transformers produces fine-grained feature maps and diagnoses diseases. Second, the Multi-Agent Contextualized Diffusion Model (MACDM) contextualizes knowledge using cross-modal information. Third, a Multi-head Divide-and-Conquer Residual-Attention mechanism with Pointer Network (MDCRAPN) generates brief and relevant summaries. In addition, the hermit crab shell exchange algorithm is integrated to optimize hyperparameters for improved performance. Experimental results indicate that the proposed approach outperforms existing approaches, achieving a Recall-Oriented Understudy for Gisting Evaluation (ROUGE-1) score of 48.11 on the Multimodal Medical Question Summarization (MMQS) dataset. The approach improves the identification and summarization of medical disorders, demonstrating its potential to enhance healthcare communication and decision-making.
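For readers unfamiliar with the reported metric, the following is a minimal sketch of how a ROUGE-1 score can be computed from unigram overlap between a generated summary and a reference summary. It is not the authors' evaluation code: the function name `rouge_1`, the lowercase whitespace tokenization, and the example sentences are assumptions made for illustration. The abstract does not state whether 48.11 is a recall, precision, or F1 value, nor which tokenizer or stemming was used, so all three variants are returned here on a 0 to 1 scale.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 (illustrative; naive lowercase whitespace tokens)."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()

    # Clipped overlap: a unigram is counted at most as many times as it
    # occurs in both the candidate and the reference.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())

    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical question summaries, not taken from the MMQS dataset.
generated = "what causes a red itchy rash on the arm"
reference = "what is the cause of an itchy red rash on my arm"
scores = rouge_1(generated, reference)
print(scores)
```

Published ROUGE numbers are usually reported multiplied by 100 and computed with a standard toolkit that applies its own tokenization and optional stemming, so values from this sketch are not directly comparable to the figure in the abstract.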
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.