Taking a Cue From the Human

K. Starr, Sabine Braun, Jaleh Delfani
{"title":"Taking a Cue From the Human","authors":"K. Starr, Sabine Braun, Jaleh Delfani","doi":"10.47476/jat.v3i2.2020.138","DOIUrl":null,"url":null,"abstract":"Human beings find the process of narrative sequencing in written texts and moving imagery a relatively simple task. Key to the success of this activity is establishing coherence by using critical cues to identify key characters, objects, actions and locations as they contribute to plot development. \nIn the drive to make audiovisual media more widely accessible (through audio description), and media archives more searchable (through content description), computer vision experts strive to automate video captioning in order to supplement human description activities. Existing models for automating video descriptions employ deep convolutional neural networks for encoding visual material and feature extraction (Krizhevsky, Sutskever, & Hinton, 2012; Szegedy et al., 2015; He, Zhang, Ren, & Sun, 2016). Recurrent neural networks decode the visual encodings and supply a sentence that describes the moving images in a manner mimicking human performance. However, these descriptions are currently “blind” to narrative coherence. \nOur study examines the human approach to narrative sequencing and coherence creation using the MeMAD [Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy] film corpus involving five-hundred extracts chosen as stand-alone narrative arcs. We examine character recognition, object detection and temporal continuity as indicators of coherence, using linguistic analysis and qualitative assessments to inform the development of more narratively sophisticated computer models in the future.","PeriodicalId":203332,"journal":{"name":"Journal of Audiovisual Translation","volume":"122 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Audiovisual Translation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.47476/jat.v3i2.2020.138","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Human beings find the process of narrative sequencing in written texts and moving imagery a relatively simple task. Key to the success of this activity is establishing coherence by using critical cues to identify key characters, objects, actions and locations as they contribute to plot development.

In the drive to make audiovisual media more widely accessible (through audio description), and media archives more searchable (through content description), computer vision experts strive to automate video captioning in order to supplement human description activities. Existing models for automating video descriptions employ deep convolutional neural networks for encoding visual material and feature extraction (Krizhevsky, Sutskever, & Hinton, 2012; Szegedy et al., 2015; He, Zhang, Ren, & Sun, 2016). Recurrent neural networks decode the visual encodings and supply a sentence that describes the moving images in a manner mimicking human performance. However, these descriptions are currently “blind” to narrative coherence.

Our study examines the human approach to narrative sequencing and coherence creation using the MeMAD (Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy) film corpus, comprising 500 extracts chosen as stand-alone narrative arcs. We examine character recognition, object detection and temporal continuity as indicators of coherence, using linguistic analysis and qualitative assessments to inform the development of more narratively sophisticated computer models in the future.
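The abstract does not spell out the encoder-decoder pipeline it references, so the following is a minimal sketch of the generic architecture described: per-frame features from a pretrained CNN (e.g. a ResNet, as in He et al., 2016) are pooled into a single clip encoding, which an RNN then decodes into a caption. All layer sizes, names, and the mean-pooling step are illustrative assumptions, not details of any model cited in the paper.

```python
import torch
import torch.nn as nn

class VideoCaptioner(nn.Module):
    """Minimal CNN-encoder / RNN-decoder captioner of the kind the
    abstract describes. Dimensions and names are illustrative only."""

    def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=10000):
        super().__init__()
        # In practice the encoder is a pretrained CNN applied per frame;
        # here we assume frame features are precomputed and just project them.
        self.encode = nn.Linear(feat_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.vocab = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (batch, n_frames, feat_dim) CNN frame features.
        # Mean-pooling over frames discards temporal order entirely,
        # one concrete way such models end up "blind" to narrative.
        clip = self.encode(frame_feats).mean(dim=1)           # (batch, hidden)
        h0 = clip.unsqueeze(0)                                # initial hidden state
        c0 = torch.zeros_like(h0)
        out, _ = self.decoder(self.embed(captions), (h0, c0)) # teacher forcing
        return self.vocab(out)                                # per-step vocab logits
```

A forward pass on dummy tensors, e.g. `VideoCaptioner()(torch.randn(2, 16, 2048), torch.randint(0, 10000, (2, 12)))`, returns per-step vocabulary logits of shape (2, 12, 10000). The pooling step makes the paper's criticism concrete: frame order never reaches the decoder, which is one reason captions from this family of models can fail to track narrative coherence.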