{"title":"Optimizing medical visual question answering: Evaluating the impact of enhanced images, augmented training data, and model selection","authors":"Ali Jaber Almalki","doi":"10.1002/itl2.588","DOIUrl":null,"url":null,"abstract":"<p>Visual question answering (VQA) has an interesting application in clinical decision support and enables clinicians to extract information from medical images through natural language queries. However, the limited nature of the datasets makes it particularly difficult to develop effective VQA models for the medical profession. The aim of this study was to overcome these obstacles by formally testing methods for data enhancement and model optimization. Specifically, we merged two medical VQA datasets, applied image preprocessing techniques, examined several state-of-the-art model architectures, and extensively trained the best-performing model on the enhanced data. The results showed that training the VGG16-LSTM model on sharper images than the merged dataset resulted in a significant performance improvement due to extending the training time to 200, with F1 scores of the training set 0.9674.</p>","PeriodicalId":100725,"journal":{"name":"Internet Technology Letters","volume":"8 2","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Internet Technology Letters","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/itl2.588","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"TELECOMMUNICATIONS","Score":null,"Total":0}
Abstract
Visual question answering (VQA) has promising applications in clinical decision support, enabling clinicians to extract information from medical images through natural language queries. However, the limited size of available datasets makes it particularly difficult to develop effective VQA models for the medical domain. The aim of this study was to overcome these obstacles by systematically evaluating methods for data enhancement and model optimization. Specifically, we merged two medical VQA datasets, applied image preprocessing techniques, compared several state-of-the-art model architectures, and trained the best-performing model extensively on the enhanced data. The results showed that training the VGG16-LSTM model on sharpened images from the merged dataset, combined with extending training to 200 epochs, yielded a significant performance improvement, with a training-set F1 score of 0.9674.
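The abstract names a VGG16-LSTM architecture but gives no implementation details. The following is a minimal sketch of one common way to combine a VGG16 image encoder with an LSTM question encoder for closed-set VQA, not the authors' exact model; the vocabulary size, question length, answer-set size, and fusion layers are illustrative assumptions.

```python
# Minimal VGG16-LSTM VQA sketch (assumed architecture, not the paper's implementation).
# A frozen VGG16 backbone encodes the medical image, an LSTM encodes the tokenized
# question, and the fused features are classified over a fixed answer vocabulary.
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, Model

VOCAB_SIZE = 10000   # assumed question vocabulary size
MAX_Q_LEN = 20       # assumed maximum question length in tokens
NUM_ANSWERS = 500    # assumed size of the closed answer set

# Image branch: VGG16 features pooled to a single 512-d vector.
image_in = layers.Input(shape=(224, 224, 3), name="image")
backbone = VGG16(include_top=False, weights="imagenet", pooling="avg")
backbone.trainable = False
img_feat = layers.Dense(512, activation="relu")(backbone(image_in))

# Question branch: word embedding followed by an LSTM encoder.
question_in = layers.Input(shape=(MAX_Q_LEN,), dtype="int32", name="question")
q_emb = layers.Embedding(VOCAB_SIZE, 300, mask_zero=True)(question_in)
q_feat = layers.LSTM(512)(q_emb)

# Fuse both modalities and predict an answer class.
fused = layers.Concatenate()([img_feat, q_feat])
fused = layers.Dense(1024, activation="relu")(fused)
answer_out = layers.Dense(NUM_ANSWERS, activation="softmax", name="answer")(fused)

model = Model(inputs=[image_in, question_in], outputs=answer_out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

In this kind of setup, the image-sharpening step described in the abstract would be applied to the images before they are fed into the VGG16 branch, and training would run for the reported 200 epochs on the merged dataset.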