ExVQA: a novel stacked attention networks with extended long short-term memory model for visual question answering
Authors: Bui Thanh Hung, Ho Vo Hoang Duy
Journal: Computers & Electrical Engineering, volume 126, Article 110439
Publication date: 2025-06-13
DOI: 10.1016/j.compeleceng.2025.110439
URL: https://www.sciencedirect.com/science/article/pii/S0045790625003829
Impact factor: 4.0; JCR: Q1 (Computer Science, Hardware & Architecture)
Citations: 0
Abstract
Visual Question Answering (VQA) has garnered significant attention in recent years due to its potential for broad applications across fields such as medicine, education, and entertainment. However, existing VQA methods still face several limitations, including difficulty handling abstract and complex questions, poor generalization, lack of explainability, and susceptibility to noise and bias. In this study, we propose ExVQA, a novel model that combines Stacked Attention Networks (SANs) and Extended Long Short-Term Memory (xLSTM) for Visual Question Answering. Image features are extracted using Sigmoid loss for Language-Image Pre-training (SigLIP), while question features are represented using an autoregressive Transformer decoder model (GPT-Neo) and xLSTM networks to facilitate the answer generation process. By combining the strengths of SANs and xLSTM, our approach aims to overcome the limitations of previous models and enhance the performance and reliability of VQA systems. Evaluation results on three datasets (PathVQA, VQA-Med 2019, and GQA) show that the proposed ExVQA model outperforms existing methods, demonstrating strong application potential in medicine, education, and entertainment.
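The abstract does not spell out how the stacked attention step works, so the following is only a minimal sketch of the general stacked attention idea: a question vector repeatedly attends over image region features, and each hop adds the attended image summary back into the query. It is not the ExVQA implementation. The dimensions, the random weights, and the helper names (`attention_hop`, `stacked_attention`, `random_layer`) are assumptions made for illustration; in the paper the region features would come from SigLIP and the question vector from GPT-Neo and xLSTM, which are replaced here by random placeholders.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_hop(regions, query, W_i, W_q, b_a, w_p, b_p):
    """One attention layer: score every image region against the query."""
    q_proj = [a + b for a, b in zip(matvec(W_q, query), b_a)]
    scores = []
    for v in regions:
        # Hidden state combines the projected region and the projected query.
        h = [math.tanh(a + b) for a, b in zip(matvec(W_i, v), q_proj)]
        scores.append(sum(w * hi for w, hi in zip(w_p, h)) + b_p)
    weights = softmax(scores)
    d = len(query)
    # Attention-weighted sum of the region vectors.
    attended = [sum(p * v[j] for p, v in zip(weights, regions)) for j in range(d)]
    # Refined query = attended image summary + previous query.
    refined = [a + q for a, q in zip(attended, query)]
    return refined, weights


def stacked_attention(regions, question, layers):
    """Apply several attention hops in sequence, refining the query each time."""
    query = question
    all_weights = []
    for params in layers:
        query, weights = attention_hop(regions, query, *params)
        all_weights.append(weights)
    return query, all_weights


def random_layer(d, k, rng):
    """Hypothetical untrained parameters for one hop (illustration only)."""
    mat = lambda rows, cols: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                              for _ in range(rows)]
    return (mat(k, d), mat(k, d), [0.0] * k,
            [rng.uniform(-0.5, 0.5) for _ in range(k)], 0.0)


rng = random.Random(0)
d, k, num_regions = 4, 3, 5  # toy sizes, not the paper's settings
regions = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(num_regions)]
question = [rng.uniform(-1, 1) for _ in range(d)]
layers = [random_layer(d, k, rng) for _ in range(2)]  # two stacked hops

final_query, attn = stacked_attention(regions, question, layers)
```

In a full model, `final_query` would be passed to an answer head, and each entry of `attn` could be inspected to see which image regions a given hop focused on.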
About the journal:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.