Towards a computational model of Acoustic Packaging

Lars Schillingmann, B. Wrede, K. Rohlfing
{"title":"Towards a computational model of Acoustic Packaging","authors":"Lars Schillingmann, B. Wrede, K. Rohlfing","doi":"10.1109/DEVLRN.2009.5175523","DOIUrl":null,"url":null,"abstract":"In order to learn and interact with humans, robots need understand actions and make use of language in social interactions. The use of language for the learning of actions has been emphasized by Hirsh-Pasek & Golinkoff introducing the idea of Acoustic Packaging [1]. Accordingly, it has been suggested that acoustic information, typically in the form of narration, overlaps with action sequences and provides infants with a bottom-up guide to attend to relevant events and to find structure within them. Following the promising results achieved by Brand & Tapscott for infants who packaged sequences together when acoustic narration was provided, in this paper, we make the first approach towards a computational model of the multimodal interplay of action and language in tutoring situations. For our purpose, we understand events as temporal intervals, which have to be segmented in both the visual and the acoustic signal in order to perform Acoustic Packaging. For the visual modality, we looked at the amount of motion over time via a motion history image based approach. The visual signal is segmented by detecting local minima in the amount of motion. For the acoustic modality, we used a phoneme recognizer, which currently segments the acoustic signal into speech and non-speech intervals. Our Acoustic Packaging algorithm merges the segments from both modalities based on temporal synchrony. First evaluation results show that Acoustic Packaging can provide a meaningful segmentation of tutoring behavior.","PeriodicalId":192225,"journal":{"name":"2009 IEEE 8th International Conference on Development and Learning","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE 8th International Conference on Development and Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DEVLRN.2009.5175523","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

In order to learn and interact with humans, robots need to understand actions and make use of language in social interactions. The role of language in the learning of actions has been emphasized by Hirsh-Pasek & Golinkoff, who introduced the idea of Acoustic Packaging [1]. Accordingly, it has been suggested that acoustic information, typically in the form of narration, overlaps with action sequences and provides infants with a bottom-up guide to attend to relevant events and to find structure within them. Following the promising results of Brand & Tapscott, who observed that infants packaged action sequences together when acoustic narration was provided, in this paper we take a first step towards a computational model of the multimodal interplay of action and language in tutoring situations. For our purposes, we understand events as temporal intervals, which have to be segmented in both the visual and the acoustic signal in order to perform Acoustic Packaging. For the visual modality, we track the amount of motion over time via a motion history image based approach; the visual signal is segmented by detecting local minima in the amount of motion. For the acoustic modality, we use a phoneme recognizer, which currently segments the acoustic signal into speech and non-speech intervals. Our Acoustic Packaging algorithm merges the segments from both modalities based on temporal synchrony. First evaluation results show that Acoustic Packaging can provide a meaningful segmentation of tutoring behavior.
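
The abstract only outlines the pipeline, so the following Python sketch illustrates one possible reading of it, assuming OpenCV and NumPy. The function names (motion_amount_per_frame, segment_at_minima, acoustic_packages), the motion history duration, the frame-difference threshold, the smoothing window, and the overlap-based merge rule are illustrative assumptions rather than the authors' implementation; the speech/non-speech intervals are taken as given here, whereas the paper obtains them from a phoneme recognizer.

# A minimal sketch of an acoustic-packaging-style segmentation pipeline.
# Parameters and the merge rule are assumed for illustration only.
import cv2
import numpy as np

MHI_DURATION = 0.5   # seconds a moving pixel stays "recent" in the motion history image (assumed)
DIFF_THRESHOLD = 32  # frame-difference threshold for the motion mask (assumed)

def motion_amount_per_frame(video_path, fps=25.0):
    """Compute the per-frame amount of motion from a hand-rolled motion history image."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return []
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mhi = np.zeros(prev.shape, np.float32)
    amounts, t = [], 0.0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t += 1.0 / fps
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        moving = cv2.absdiff(gray, prev) > DIFF_THRESHOLD
        # Motion history image: moving pixels get the current timestamp,
        # pixels older than MHI_DURATION are cleared.
        mhi = np.where(moving, t, np.where(mhi < t - MHI_DURATION, 0.0, mhi))
        # Amount of motion: fraction of pixels with recent motion.
        amounts.append(float(np.count_nonzero(mhi)) / mhi.size)
        prev = gray
    cap.release()
    return amounts

def segment_at_minima(amounts, fps=25.0, smooth=5):
    """Split the motion curve into action intervals (in seconds) at local minima."""
    e = np.convolve(amounts, np.ones(smooth) / smooth, mode="same")
    minima = [i for i in range(1, len(e) - 1) if e[i - 1] >= e[i] < e[i + 1]]
    bounds = [0] + minima + [len(e) - 1]
    return [(bounds[i] / fps, bounds[i + 1] / fps) for i in range(len(bounds) - 1)]

def acoustic_packages(action_intervals, speech_intervals):
    """Group each speech interval with the action segments it temporally overlaps."""
    packages = []
    for s_start, s_end in speech_intervals:
        members = [(a, b) for (a, b) in action_intervals if a < s_end and b > s_start]
        if members:
            packages.append({"speech": (s_start, s_end), "actions": members})
    return packages

In use, the visual intervals from segment_at_minima and the speech intervals from a speech/non-speech detector would be passed to acoustic_packages, each resulting package corresponding to one narrated action unit.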