“Caption” as a Coherence Relation: Evidence and Implications

Proceedings of the Second Workshop on Shortcomings in Vision and Language. Pub Date: 2019. DOI: 10.18653/v1/W19-1806

Malihe Alikhani, Matthew Stone

Citations: 23

Abstract

We study verbs in image–text corpora, contrasting caption corpora, where texts are explicitly written to characterize image content, with depiction corpora, where texts and images may stand in more general relations. Captions show a distinctively limited distribution of verbs, with strong preferences for specific tense, aspect, lexical aspect, and semantic field. These limitations, which appear in data elicited by a range of methods, restrict the utility of caption corpora to inform image retrieval, multimodal document generation, and perceptually-grounded semantic models. We suggest that these limitations reflect the discourse constraints in play when subjects write texts to accompany imagery, so we argue that future development of image–text corpora should work to increase the diversity of event descriptions, while looking explicitly at the different ways text and imagery can be coherently related.

