Towards a computational model of Acoustic Packaging

Lars Schillingmann, B. Wrede, K. Rohlfing
{"title":"Towards a computational model of Acoustic Packaging","authors":"Lars Schillingmann, B. Wrede, K. Rohlfing","doi":"10.1109/DEVLRN.2009.5175523","DOIUrl":null,"url":null,"abstract":"In order to learn and interact with humans, robots need understand actions and make use of language in social interactions. The use of language for the learning of actions has been emphasized by Hirsh-Pasek & Golinkoff introducing the idea of Acoustic Packaging [1]. Accordingly, it has been suggested that acoustic information, typically in the form of narration, overlaps with action sequences and provides infants with a bottom-up guide to attend to relevant events and to find structure within them. Following the promising results achieved by Brand & Tapscott for infants who packaged sequences together when acoustic narration was provided, in this paper, we make the first approach towards a computational model of the multimodal interplay of action and language in tutoring situations. For our purpose, we understand events as temporal intervals, which have to be segmented in both the visual and the acoustic signal in order to perform Acoustic Packaging. For the visual modality, we looked at the amount of motion over time via a motion history image based approach. The visual signal is segmented by detecting local minima in the amount of motion. For the acoustic modality, we used a phoneme recognizer, which currently segments the acoustic signal into speech and non-speech intervals. Our Acoustic Packaging algorithm merges the segments from both modalities based on temporal synchrony. First evaluation results show that Acoustic Packaging can provide a meaningful segmentation of tutoring behavior.","PeriodicalId":192225,"journal":{"name":"2009 IEEE 8th International Conference on Development and Learning","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE 8th International Conference on Development and Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DEVLRN.2009.5175523","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

In order to learn and interact with humans, robots need to understand actions and make use of language in social interactions. The role of language in the learning of actions has been emphasized by Hirsh-Pasek & Golinkoff, who introduced the idea of Acoustic Packaging [1]. Accordingly, it has been suggested that acoustic information, typically in the form of narration, overlaps with action sequences and provides infants with a bottom-up guide to attend to relevant events and to find structure within them. Following the promising results of Brand & Tapscott, who observed that infants packaged action sequences together when acoustic narration was provided, in this paper we take a first step towards a computational model of the multimodal interplay of action and language in tutoring situations. For our purposes, we understand events as temporal intervals, which have to be segmented in both the visual and the acoustic signal in order to perform Acoustic Packaging. For the visual modality, we track the amount of motion over time via a motion history image based approach; the visual signal is segmented by detecting local minima in the amount of motion. For the acoustic modality, we use a phoneme recognizer, which currently segments the acoustic signal into speech and non-speech intervals. Our Acoustic Packaging algorithm merges the segments from both modalities based on temporal synchrony. First evaluation results show that Acoustic Packaging can provide a meaningful segmentation of tutoring behavior.
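
The abstract only outlines the pipeline, so the following Python sketch illustrates one possible reading of it, assuming OpenCV and NumPy. The function names (motion_amount_per_frame, segment_at_minima, acoustic_packages), the motion history duration, the frame-difference threshold, the smoothing window, and the overlap-based merge rule are illustrative assumptions rather than the authors' implementation; the speech/non-speech intervals are taken as given here, whereas the paper obtains them from a phoneme recognizer.

# A minimal sketch of an acoustic-packaging-style segmentation pipeline.
# Parameters and the merge rule are assumed for illustration only.
import cv2
import numpy as np

MHI_DURATION = 0.5   # seconds a moving pixel stays "recent" in the motion history image (assumed)
DIFF_THRESHOLD = 32  # frame-difference threshold for the motion mask (assumed)

def motion_amount_per_frame(video_path, fps=25.0):
    """Compute the per-frame amount of motion from a hand-rolled motion history image."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return []
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mhi = np.zeros(prev.shape, np.float32)
    amounts, t = [], 0.0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t += 1.0 / fps
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        moving = cv2.absdiff(gray, prev) > DIFF_THRESHOLD
        # Motion history image: moving pixels get the current timestamp,
        # pixels older than MHI_DURATION are cleared.
        mhi = np.where(moving, t, np.where(mhi < t - MHI_DURATION, 0.0, mhi))
        # Amount of motion: fraction of pixels with recent motion.
        amounts.append(float(np.count_nonzero(mhi)) / mhi.size)
        prev = gray
    cap.release()
    return amounts

def segment_at_minima(amounts, fps=25.0, smooth=5):
    """Split the motion curve into action intervals (in seconds) at local minima."""
    e = np.convolve(amounts, np.ones(smooth) / smooth, mode="same")
    minima = [i for i in range(1, len(e) - 1) if e[i - 1] >= e[i] < e[i + 1]]
    bounds = [0] + minima + [len(e) - 1]
    return [(bounds[i] / fps, bounds[i + 1] / fps) for i in range(len(bounds) - 1)]

def acoustic_packages(action_intervals, speech_intervals):
    """Group each speech interval with the action segments it temporally overlaps."""
    packages = []
    for s_start, s_end in speech_intervals:
        members = [(a, b) for (a, b) in action_intervals if a < s_end and b > s_start]
        if members:
            packages.append({"speech": (s_start, s_end), "actions": members})
    return packages

In use, the visual intervals from segment_at_minima and the speech intervals from a speech/non-speech detector would be passed to acoustic_packages, each resulting package corresponding to one narrated action unit.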