EPIC-SOUNDS: A Large-Scale Dataset of Actions That Sound.

Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, Andrew Zisserman
{"title":"EPIC-SOUNDS: A Large-Scale Dataset of Actions That Sound.","authors":"Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, Andrew Zisserman","doi":"10.1109/TPAMI.2025.3590390","DOIUrl":null,"url":null,"abstract":"<p><p>We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos. We propose an annotation pipeline where annotators temporally label distinguishable audio segments and describe the action that could have caused this sound. We identify actions that can be discriminated purely from audio, through grouping these free-form descriptions of audio into classes. For actions that involve objects colliding, we collect human annotations of the materials of these objects (e.g. a glass object being placed on a wooden surface), which we verify from video, discarding ambiguities. Overall, EPIC-SOUNDS includes 78.4 k categorised segments of audible events and actions, distributed across 44 classes as well as 39.2 k non-categorised segments. We train and evaluate state-of-the-art audio recognition and detection models on our dataset, for both audio-only and audio-visual methods. We also conduct analysis on: the temporal overlap between audio events, the temporal and label correlations between audio and visual modalities, the ambiguities in annotating materials from audio-only input, the importance of audio-only labels and the limitations of current models to understand actions that sound.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TPAMI.2025.3590390","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos. We propose an annotation pipeline where annotators temporally label distinguishable audio segments and describe the action that could have caused this sound. We identify actions that can be discriminated purely from audio, through grouping these free-form descriptions of audio into classes. For actions that involve objects colliding, we collect human annotations of the materials of these objects (e.g. a glass object being placed on a wooden surface), which we verify from video, discarding ambiguities. Overall, EPIC-SOUNDS includes 78.4 k categorised segments of audible events and actions, distributed across 44 classes as well as 39.2 k non-categorised segments. We train and evaluate state-of-the-art audio recognition and detection models on our dataset, for both audio-only and audio-visual methods. We also conduct analysis on: the temporal overlap between audio events, the temporal and label correlations between audio and visual modalities, the ambiguities in annotating materials from audio-only input, the importance of audio-only labels and the limitations of current models to understand actions that sound.

EPIC-SOUNDS:声音动作的大规模数据集。
我们引入EPIC-SOUNDS,这是一个大规模的音频注释数据集,用于捕获以自我为中心的视频音频流中的时间范围和类标签。我们提出了一个注释管道,其中注释者暂时标记可区分的音频片段并描述可能导致该声音的动作。我们通过将这些自由形式的音频描述分组到类中,识别出可以完全与音频区分的动作。对于涉及物体碰撞的动作,我们收集这些物体的材料的人类注释(例如,一个玻璃物体被放置在木制表面上),我们从视频中验证,丢弃歧义。总体而言,EPIC-SOUNDS包括78.4 k可听事件和动作的分类片段,分布在44个类别以及39.2 k非分类片段。我们在我们的数据集上训练和评估最先进的音频识别和检测模型,包括纯音频和视听方法。我们还对以下方面进行了分析:音频事件之间的时间重叠、音频和视觉模式之间的时间和标签相关性、从纯音频输入中注释材料的模糊性、纯音频标签的重要性以及当前模型在理解声音行为方面的局限性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信