Weakly Supervised Visual Question Answer Generation

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) Pub Date : 2023-06-01 DOI:10.1109/CVPRW59228.2023.00591

Charani Alampalle, Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty



Abstract

Growing interest in conversational agents has made two-way human-computer communication that involves asking and answering visual questions an active area of AI research. Generating visual question-answer pairs is therefore an important and challenging task. To address it, we propose a weakly supervised visual question answer generation method that generates relevant question-answer pairs for a given input image and its associated caption. Most prior work is supervised and depends on annotated question-answer datasets. In contrast, our method procedurally generates synthetic question-answer pairs from visual information and captions. The proposed method first extracts a list of answer words, then performs nearest question generation, which uses the caption and an answer word to generate a synthetic question. Next, the relevant question generator converts the nearest question into a relevant language question through dependency parsing and in-order tree traversal. Finally, a ViLBERT model is fine-tuned with the generated question-answer pairs. We perform an exhaustive experimental analysis on the VQA dataset and find that our model significantly outperforms state-of-the-art (SOTA) methods on BLEU scores. We also report results with respect to baseline models and an ablation study.
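The abstract names in-order traversal of a dependency tree as the step that turns the nearest question into a readable one, but gives no implementation details. The sketch below is only an illustration of that general idea, not the authors' code: the `Node` class, the `in_order` helper, the hand-built parse, and the example question are all hypothetical. It shows that visiting a head's left dependents, then the head, then its right dependents yields the words in sentence order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One token in a toy dependency tree (hypothetical structure)."""
    word: str
    index: int  # position of the token in the sentence
    children: List["Node"] = field(default_factory=list)


def in_order(node: Node) -> List[str]:
    """Visit left dependents, then the head, then right dependents."""
    left = sorted((c for c in node.children if c.index < node.index),
                  key=lambda c: c.index)
    right = sorted((c for c in node.children if c.index > node.index),
                   key=lambda c: c.index)
    words: List[str] = []
    for child in left:
        words.extend(in_order(child))
    words.append(node.word)
    for child in right:
        words.extend(in_order(child))
    return words


# Hand-built parse of a made-up question (assumption, not from the paper):
# "riding" is the root; "what", "is", and "man" depend on it; "the" depends on "man".
the = Node("the", 2)
man = Node("man", 3, [the])
root = Node("riding", 4, [Node("what", 0), Node("is", 1), man])

question = " ".join(in_order(root)) + "?"
print(question)  # what is the man riding?
```

In a real pipeline the tree would come from a dependency parser rather than being built by hand; which parser the authors used, and how they reorder or rewrite nodes before traversal, is not stated in the abstract.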


