Visual Question Answering Using Deep Learning

Pallavi, Sonali, Tanuritha, Vidhya, Prof. Anjini
DOI: 10.1109/i-PACT52855.2021.9696665
Journal: 2021 Innovations in Power and Advanced Computing Technologies (i-PACT)
Published: 2021-11-27
Citations: 0

Abstract

Visual Question Answering (VQA) has recently attracted considerable interest from both the Natural Language Processing and Computer Vision communities. VQA aims to build an intelligent system that predicts answers to natural language questions about an image. Questions about abstract or real-world images are posed to the VQA system; the system interprets both the image and the question using Computer Vision and Natural Language Processing (NLP) and predicts an answer in natural language. The main issue affecting the performance of VQA systems is the difficulty of handling open-ended questions from users. The proposed system provides a Graphical User Interface (GUI); it extracts image features with a pretrained VGG16 network, and question features with GloVe embeddings followed by a Long Short-Term Memory (LSTM) network. The image and question features are fused by pointwise multiplication, and the fused representation is passed through a softmax layer to produce the top 5 answer predictions for the image-question pair. The proposed system has been evaluated on a variety of open-ended questions to demonstrate its robustness. VQA finds application in real-world scenarios such as self-driving cars and assistance for visually impaired people. Visual questions target different parts of an image, including underlying context and background details.
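The fusion and prediction steps described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes the VGG16 image features and the GloVe+LSTM question features have already been extracted and projected to a common dimension (the dimension and answer-vocabulary size below are hypothetical, and random vectors stand in for the real extractor outputs).

```python
import numpy as np

# Assumed dimensions (illustrative, not taken from the paper).
FEATURE_DIM = 1024   # common size both modalities are projected to
NUM_ANSWERS = 1000   # size of the candidate-answer vocabulary

rng = np.random.default_rng(0)

# Stand-ins for the real extractors: VGG16 image features and
# GloVe+LSTM question features, each of length FEATURE_DIM.
image_features = rng.standard_normal(FEATURE_DIM)
question_features = rng.standard_normal(FEATURE_DIM)

# Fusion by pointwise (element-wise) multiplication.
fused = image_features * question_features

# A dense layer over candidate answers, followed by softmax.
W = rng.standard_normal((NUM_ANSWERS, FEATURE_DIM)) * 0.01
logits = W @ fused
probs = np.exp(logits - logits.max())   # subtract max for stability
probs /= probs.sum()

# Top-5 predictions: indices of the five most probable answers.
top5 = np.argsort(probs)[::-1][:5]
```

In a full system, `top5` would be mapped back through the answer vocabulary to display the five candidate answers in the GUI.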