Title: Reaching beneath the tip of the iceberg: A guide to the Freiburg Multimodal Interaction Corpus
Authors: Christoph Rühlemann, Alexander Ptak
DOI: 10.1515/opli-2022-0245
Published: 2023-11-17 (Journal Article)
Abstract
Most corpora tacitly subscribe to a speech-only view, filtering out anything that is not a ‘word’ and transcribing the spoken language merely orthographically, despite the fact that the “speech-only view on language is fundamentally incomplete” (Kok 2017, 2) due to the deep intertwining of the verbal, vocal, and kinesic modalities (Levinson and Holler 2014). This article introduces the Freiburg Multimodal Interaction Corpus (FreMIC), a multimodal and interactional corpus of unscripted conversation in English currently under construction. At the time of writing, FreMIC comprises (i) c. 29 h of video-recordings transcribed and annotated in detail and (ii) automatically (and manually) generated multimodal data. All conversations are transcribed in ELAN both orthographically and using Jeffersonian conventions to render verbal content and interactionally relevant details of sequencing (e.g. overlap, latching), temporal aspects (pauses, acceleration/deceleration), phonological aspects (e.g. intensity, pitch, stretching, truncation, voice quality), and laughter. Moreover, the orthographic transcriptions are exhaustively PoS-tagged using the CLAWS web tagger (Garside and Smith 1997). ELAN-based transcriptions also provide exhaustive annotations of re-enactments (also referred to as (free) direct speech, constructed dialogue, etc.) as well as silent gestures (meaningful gestures that occur without accompanying speech). The multimodal data are derived from psychophysiological measurements and eye tracking. The psychophysiological measurements include, inter alia, electrodermal activity (galvanic skin response, GSR), which is indicative of emotional arousal (e.g. Peräkylä et al. 2015). Eye tracking produces data of two kinds: gaze direction and pupil size. In FreMIC, gazes are automatically recorded using area-of-interest technology. Gaze direction is interactionally key, for example, in turn-taking (e.g. Auer 2021) and re-enactments (e.g. Pfeiffer and Weiss 2022), while changes in pupil size provide a window onto cognitive intensity (e.g. Barthel and Sauppe 2019). To demonstrate what opportunities FreMIC’s (combination of) transcriptions, annotations, and multimodal data open up for research in Interactional (Corpus) Linguistics, this article reports on interim results derived from work-in-progress.
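The area-of-interest technique mentioned above can be sketched in a few lines: each gaze sample, given as screen coordinates, is assigned to whichever predefined rectangular AOI contains it. The AOI names, coordinates, and sample data below are invented for illustration and do not reflect FreMIC's actual recording pipeline.

```python
# Hypothetical sketch of area-of-interest (AOI) gaze classification.
# All names and coordinates are illustrative, not FreMIC's actual setup.

from dataclasses import dataclass


@dataclass(frozen=True)
class AOI:
    """A rectangular area of interest on the recording frame."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def classify_gaze(samples, aois):
    """Map each (x, y) gaze sample to the name of the first AOI
    containing it, or 'off-target' if no AOI matches."""
    return [
        next((a.name for a in aois if a.contains(x, y)), "off-target")
        for x, y in samples
    ]


# Illustrative AOIs: a co-participant's face and hands.
aois = [
    AOI("speaker_face", 100, 50, 300, 250),
    AOI("speaker_hands", 80, 300, 340, 480),
]
samples = [(150, 120), (200, 400), (900, 900)]
print(classify_gaze(samples, aois))
# ['speaker_face', 'speaker_hands', 'off-target']
```

In practice an eye tracker would emit such samples at a fixed rate, and runs of identical labels could then be collapsed into gaze episodes aligned with the ELAN transcription tiers.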