Affective Visual Question Answering Network
Nelson Ruwa, Qi-rong Mao, Liangjun Wang, Ming Dong
2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), April 2018
DOI: 10.1109/MIPR.2018.00038
Citations: 11
Abstract
Visual Question Answering (VQA) has recently attracted considerable attention from researchers in the rapidly growing field of deep learning. The need to improve VQA models by focusing on local regions of images has driven the development of various attention models. This paper proposes the Affective Visual Question Answering Network (AVQAN), an attention model that combines local image features, the question, and the mood detected in the specific image region to produce an affective answer, using a preprocessed image dataset. Experimental results show that AVQAN enriches the analysis and understanding of images by adding affective information to the answer, while maintaining accuracy within the range of recent ordinary VQA baseline models. The proposed model contributes to the development of emotion-aware machines, which are becoming increasingly important in everyday life.
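The abstract describes a fusion of three signals: question-guided attention over image regions, a mood detected from the attended region, and the question itself. The paper does not specify the exact architecture here, so the following is only a minimal NumPy sketch of that general idea; all dimensions, weight matrices (`W_att`, `W_mood`, `W_ans`), and the function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def avqan_step(region_feats, question_vec, W_att, W_mood, W_ans):
    """Hypothetical forward pass illustrating the AVQAN idea:
    attend over image regions conditioned on the question, detect a
    mood from the attended region, then fuse attended features,
    question, and mood to score candidate answers."""
    # Question-guided attention: one relevance score per image region
    scores = region_feats @ W_att @ question_vec        # shape (R,)
    alpha = softmax(scores)                             # attention weights over regions
    attended = alpha @ region_feats                     # weighted region feature, shape (D,)
    # Mood distribution detected from the attended region
    mood = softmax(attended @ W_mood)                   # shape (M,)
    # Fuse attended features, question, and mood to score answers
    fused = np.concatenate([attended, question_vec, mood])
    answer_probs = softmax(fused @ W_ans)               # shape (A,)
    return answer_probs, mood, alpha
```

The returned mood distribution is what would let the model attach affective information to the answer, alongside the ordinary answer probabilities.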