Comparing human and automated approaches to visual storytelling

Sabine Braun, K. Starr, Jorma T. Laaksonen
{"title":"比较人类和自动化的视觉叙事方法","authors":"Sabine Braun, K. Starr, Jorma T. Laaksonen","doi":"10.4324/9781003052968-9","DOIUrl":null,"url":null,"abstract":"This chapter focuses on the recent surge of interest in automating methods for describing audiovisual content whether for image search and retrieval, visual storytelling or in response to the rising demand for audio description following changes to regulatory frameworks. While computer vision communities have intensified research into the automatic generation of video descriptions (Bernardi et al. , 2016), the automation of still image captioning remains a challenge in terms of accuracy (Husain & Bober, 2016). Moving images pose additional challenges linked to temporality, including co-referencing (Rohrbach et al. , 2017) and other features of narrative continuity (Huang et al. , 2016). Machine-generated descriptions are currently less sophisticated than their human equivalents, and frequently incoherent or incorrect. By contrast, human descriptions are more elaborate and reliable but are expensive to produce. Nevertheless, they offer information about visual and auditory elements in audiovisual content that can be exploited for research into machine training. Based on our research conducted in the EU-funded MeMAD project, this chapter outlines a methodological approach for a systematic comparison of human and machine-generated video descriptions, drawing on corpus-based and discourse-based approaches, with a view to identifying key characteristics and patterns in both types of description, and exploiting human knowledge about video description for machine training.","PeriodicalId":263682,"journal":{"name":"Innovation in Audio Description Research","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Comparing human and automated approaches to visual storytelling\",\"authors\":\"Sabine Braun, K. Starr, Jorma T. Laaksonen\",\"doi\":\"10.4324/9781003052968-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This chapter focuses on the recent surge of interest in automating methods for describing audiovisual content whether for image search and retrieval, visual storytelling or in response to the rising demand for audio description following changes to regulatory frameworks. While computer vision communities have intensified research into the automatic generation of video descriptions (Bernardi et al. , 2016), the automation of still image captioning remains a challenge in terms of accuracy (Husain & Bober, 2016). Moving images pose additional challenges linked to temporality, including co-referencing (Rohrbach et al. , 2017) and other features of narrative continuity (Huang et al. , 2016). Machine-generated descriptions are currently less sophisticated than their human equivalents, and frequently incoherent or incorrect. By contrast, human descriptions are more elaborate and reliable but are expensive to produce. Nevertheless, they offer information about visual and auditory elements in audiovisual content that can be exploited for research into machine training. 
Based on our research conducted in the EU-funded MeMAD project, this chapter outlines a methodological approach for a systematic comparison of human and machine-generated video descriptions, drawing on corpus-based and discourse-based approaches, with a view to identifying key characteristics and patterns in both types of description, and exploiting human knowledge about video description for machine training.\",\"PeriodicalId\":263682,\"journal\":{\"name\":\"Innovation in Audio Description Research\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-07-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Innovation in Audio Description Research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4324/9781003052968-9\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Innovation in Audio Description Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4324/9781003052968-9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

This chapter focuses on the recent surge of interest in automating methods for describing audiovisual content, whether for image search and retrieval, visual storytelling, or in response to the rising demand for audio description following changes to regulatory frameworks. While computer vision communities have intensified research into the automatic generation of video descriptions (Bernardi et al., 2016), the automation of still image captioning remains a challenge in terms of accuracy (Husain & Bober, 2016). Moving images pose additional challenges linked to temporality, including co-referencing (Rohrbach et al., 2017) and other features of narrative continuity (Huang et al., 2016). Machine-generated descriptions are currently less sophisticated than their human equivalents, and frequently incoherent or incorrect. By contrast, human descriptions are more elaborate and reliable but are expensive to produce. Nevertheless, they offer information about visual and auditory elements in audiovisual content that can be exploited for research into machine training. Based on our research conducted in the EU-funded MeMAD project, this chapter outlines a methodological approach for a systematic comparison of human and machine-generated video descriptions, drawing on corpus-based and discourse-based approaches, with a view to identifying key characteristics and patterns in both types of description, and exploiting human knowledge about video description for machine training.
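The chapter itself publishes no code, but a minimal sketch can make the corpus-based side of such a comparison concrete. The Python snippet below computes simple lexical statistics (mean description length, type-token ratio, vocabulary overlap) for two small sets of descriptions. The function names, tokenizer, and sample sentences are hypothetical illustrations, not the MeMAD methodology; a real study would use a proper NLP pipeline and substantially larger corpora.

```python
# Illustrative sketch only: compares two description corpora with basic
# lexical measures. All names and sample data here are hypothetical.
import re

def tokenize(text: str) -> list[str]:
    """Naive lowercase word tokenizer (a real study would use an NLP toolkit)."""
    return re.findall(r"[a-z']+", text.lower())

def corpus_stats(descriptions: list[str]) -> dict:
    """Corpus size, mean description length, and type-token ratio
    (lexical diversity, typically higher in human descriptions)."""
    tokens = [tok for d in descriptions for tok in tokenize(d)]
    types = set(tokens)
    return {
        "descriptions": len(descriptions),
        "tokens": len(tokens),
        "mean_length": len(tokens) / max(len(descriptions), 1),
        "type_token_ratio": len(types) / max(len(tokens), 1),
        "vocabulary": types,
    }

# Hypothetical examples of the two description types discussed above.
human = ["A woman hurries across a rain-soaked street, clutching her coat."]
machine = ["a person is walking in the street"]

h, m = corpus_stats(human), corpus_stats(machine)
# Jaccard overlap between the two vocabularies.
overlap = len(h["vocabulary"] & m["vocabulary"]) / len(h["vocabulary"] | m["vocabulary"])
print(f"human TTR={h['type_token_ratio']:.2f}, "
      f"machine TTR={m['type_token_ratio']:.2f}, "
      f"vocab overlap={overlap:.2f}")
```

Even on this toy pair, the human description scores higher on lexical diversity and length, which is consistent with the abstract's observation that human descriptions are more elaborate than machine-generated ones.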