Efficient spectrogram-based binary image feature for audio copy detection

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Pub Date: 2015-04-19. DOI: 10.1109/ICASSP.2015.7178279

Chahid Ouali, P. Dumouchel, Vishwa Gupta


Citations: 13

Abstract




This paper presents the latest improvements to our Spectro system, which detects transformed duplicate audio content. We propose a new binary image feature derived from a spectrogram matrix by using a threshold based on the average of the spectral values. We quantize this binary image by applying a tile of fixed size and computing the sum of each small square in the tile. The fingerprint of each binary image encodes the positions of the selected tiles. Evaluation on TRECVID 2010 CBCD data shows that this new feature significantly improves the Spectro system for transformations that add irrelevant speech to the audio. Compared to a state-of-the-art audio fingerprinting system, the proposed method reduces the minimal Normalized Detection Cost Rate (min NDCR) by 33%, improves localization accuracy by 28%, and results in 40% fewer missed queries.
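The feature pipeline described in the abstract (thresholding at the average, summing small squares, encoding positions) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the details, so the following are assumptions made for illustration only: the threshold is the global mean of the whole spectrogram, the squares do not overlap, and the "selected" squares are the k squares with the largest sums. The function names binarize, square_sums, and select_positions are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def binarize(spec: Matrix) -> List[List[int]]:
    """Set each cell to 1 if it exceeds the mean of all cells, else 0.

    Assumption: a single global mean is used as the threshold.
    """
    total = sum(sum(row) for row in spec)
    count = sum(len(row) for row in spec)
    mean = total / count
    return [[1 if value > mean else 0 for value in row] for row in spec]


def square_sums(binary: List[List[int]], size: int) -> List[List[int]]:
    """Sum the bits inside each non-overlapping size x size square.

    Assumption: squares do not overlap; leftover rows/columns are dropped.
    """
    n_rows = len(binary) // size
    n_cols = len(binary[0]) // size
    return [
        [
            sum(
                binary[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


def select_positions(sums: List[List[int]], k: int) -> List[Tuple[int, int]]:
    """Return the (row, col) positions of the k squares with the largest sums.

    Assumption: top-k selection; ties are broken by position.
    """
    ranked = sorted(
        ((value, r, c) for r, row in enumerate(sums) for c, value in enumerate(row)),
        key=lambda item: (-item[0], item[1], item[2]),
    )
    return sorted((r, c) for _, r, c in ranked[:k])


# Toy 4x4 "spectrogram" (rows = frequency bins, columns = time frames).
spec = [
    [9.0, 9.0, 0.0, 0.0],
    [9.0, 9.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 9.0],
    [0.0, 0.0, 0.0, 0.0],
]
fingerprint = select_positions(square_sums(binarize(spec), size=2), k=2)
print(fingerprint)
```

In this toy example, the fingerprint is simply the list of selected square positions. A real system would also need to compute the spectrogram from audio and match fingerprints between query and reference, neither of which is covered here.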

