iCap: Interactive Image Captioning with Predictive Text

Proceedings of the 2020 International Conference on Multimedia Retrieval. Pub Date: 2020-01-31. DOI: 10.1145/3372278.3390697

Zhengxiong Jia, Xirong Li

Citations: 8

Abstract

In this paper we study a new topic: interactive image captioning with a human in the loop. Unlike automated image captioning, where a given test image is the sole input at inference time, the interactive scenario gives us access to both the test image and a sequence of (incomplete) user-input sentences. We formulate the problem as Visually Conditioned Sentence Completion (VCSC). For VCSC, we propose ABD-Cap, asynchronous bidirectional decoding for image caption completion. With ABD-Cap as the core module, we build iCap, a web-based interactive image captioning system capable of predicting new text in response to live input from a user. Experiments covering both automated evaluations and real user studies show the viability of our proposals.
