高效的基于谱图的二值图像特征音频拷贝检测

Chahid Ouali, P. Dumouchel, Vishwa Gupta
{"title":"高效的基于谱图的二值图像特征音频拷贝检测","authors":"Chahid Ouali, P. Dumouchel, Vishwa Gupta","doi":"10.1109/ICASSP.2015.7178279","DOIUrl":null,"url":null,"abstract":"This paper presents the latest improvements on our Spectro system that detects transformed duplicate audio content. We propose a new binary image feature derived from a spectrogram matrix by using a threshold based on the average of the spectral values. We quantize this binary image by applying a tile of fixed size and computing the sum of each small square in the tile. Fingerprints of each binary image encode the positions of the selected tiles. Evaluation on TRECVID 2010 CBCD data shows that this new feature improves significantly the Spectro system for transformations that add irrelevant speech to the audio. Compared to a state-of-the-art audio fingerprinting system, the proposed method reduces the minimal Normalized Detection Cost Rate (min NDCR) by 33%, improves localization accuracy by 28% and results in 40% fewer missed queries.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Efficient spectrogram-based binary image feature for audio copy detection\",\"authors\":\"Chahid Ouali, P. Dumouchel, Vishwa Gupta\",\"doi\":\"10.1109/ICASSP.2015.7178279\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents the latest improvements on our Spectro system that detects transformed duplicate audio content. We propose a new binary image feature derived from a spectrogram matrix by using a threshold based on the average of the spectral values. We quantize this binary image by applying a tile of fixed size and computing the sum of each small square in the tile. Fingerprints of each binary image encode the positions of the selected tiles. Evaluation on TRECVID 2010 CBCD data shows that this new feature improves significantly the Spectro system for transformations that add irrelevant speech to the audio. Compared to a state-of-the-art audio fingerprinting system, the proposed method reduces the minimal Normalized Detection Cost Rate (min NDCR) by 33%, improves localization accuracy by 28% and results in 40% fewer missed queries.\",\"PeriodicalId\":117666,\"journal\":{\"name\":\"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP.2015.7178279\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2015.7178279","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

摘要

本文介绍了我们的Spectro系统的最新改进,用于检测转换后的重复音频内容。我们提出了一种新的二值图像特征,该特征来源于光谱图矩阵,采用基于光谱值平均值的阈值。我们通过应用固定大小的贴图并计算贴图中每个小正方形的和来量化这个二值图像。每个二值图像的指纹编码所选贴图的位置。对TRECVID 2010 CBCD数据的评估表明,这一新功能显著改善了Spectro系统在将无关语音添加到音频中的转换。与目前最先进的音频指纹识别系统相比,该方法将最小归一化检测成本率(min NDCR)降低了33%,将定位精度提高了28%,将漏查率降低了40%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Efficient spectrogram-based binary image feature for audio copy detection
This paper presents the latest improvements on our Spectro system that detects transformed duplicate audio content. We propose a new binary image feature derived from a spectrogram matrix by using a threshold based on the average of the spectral values. We quantize this binary image by applying a tile of fixed size and computing the sum of each small square in the tile. Fingerprints of each binary image encode the positions of the selected tiles. Evaluation on TRECVID 2010 CBCD data shows that this new feature improves significantly the Spectro system for transformations that add irrelevant speech to the audio. Compared to a state-of-the-art audio fingerprinting system, the proposed method reduces the minimal Normalized Detection Cost Rate (min NDCR) by 33%, improves localization accuracy by 28% and results in 40% fewer missed queries.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信