Towards robust features for classifying audio in the CueVideo system

S. Srinivasan, D. Petkovic, D. Ponceleón
MULTIMEDIA '99. Published 1999-10-30. DOI: 10.1145/319463.319658 (https://doi.org/10.1145/319463.319658)
Citations: 75

Abstract

Audio plays an increasingly important role in multimedia applications that involve video. Many efforts in this area focus either on audio with some built-in semantic structure, such as broadcast news, or on classifying audio that contains a single type of sound, such as clear speech only or clear music only. In the CueVideo system, we detect and classify mixed audio, i.e. combinations of speech and music together with other types of background sounds. Segmentation of mixed audio has applications in detecting story boundaries in video, in spoken document retrieval systems, and in audio retrieval systems. We modify and combine audio features known to be effective in distinguishing speech from music, and examine their behavior on mixed audio. Our preliminary experimental results show that we can achieve a classification accuracy of over 80% for such mixed audio. Our study also provides several helpful insights into analyzing mixed audio in the context of real applications.
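The abstract does not name the features that were modified and combined. As an illustration only, the sketch below computes two frame-level features that are widely used for speech/music discrimination in general: zero-crossing rate and short-time energy. The choice of features, the function names, and the frame and hop sizes are all assumptions made for this example and should not be read as the CueVideo method.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a sequence of samples into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def frame_features(samples, frame_len=400, hop=200):
    """Return one (zero_crossing_rate, short_time_energy) pair per frame.

    frame_len and hop are hypothetical defaults chosen for this sketch.
    """
    return [(zero_crossing_rate(f), short_time_energy(f))
            for f in frame_signal(samples, frame_len, hop)]

def tone(freq_hz, n_samples, sample_rate=8000):
    """Synthetic sine tone, used here only as stand-in test input."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

if __name__ == "__main__":
    low = frame_features(tone(100, 800))
    high = frame_features(tone(1000, 800))
    print("100 Hz tone, first frame (zcr, energy):", low[0])
    print("1000 Hz tone, first frame (zcr, energy):", high[0])
```

A classifier for mixed audio would typically look at how such features vary over many frames rather than at single-frame values, and would need real labeled audio to be evaluated. The synthetic tones above only show that the functions run and that the zero-crossing rate rises with frequency.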