Chahid Ouali, P. Dumouchel, Vishwa Gupta. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), published 2015-04-19. DOI: 10.1109/ICASSP.2015.7178279
Efficient spectrogram-based binary image feature for audio copy detection
This paper presents the latest improvements to our Spectro system, which detects duplicated audio content that has undergone transformations. We propose a new binary image feature derived from a spectrogram matrix using a threshold based on the average of the spectral values. We then quantize this binary image by applying a tile of fixed size and computing the sum of each small square in the tile. The fingerprint of each binary image encodes the positions of the selected tiles. Evaluation on the TRECVID 2010 CBCD (content-based copy detection) data shows that the new feature significantly improves the Spectro system for transformations that add irrelevant speech to the audio. Compared with a state-of-the-art audio fingerprinting system, the proposed method reduces the minimal Normalized Detection Cost Rate (min NDCR) by 33%, improves localization accuracy by 28%, and misses 40% fewer queries.
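The feature pipeline summarized above (threshold, tile, sum, encode positions) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Spectro implementation: the spectrogram is taken as a precomputed matrix, the threshold is assumed to be the global mean of all spectral values, the tile is assumed to split the image into non-overlapping `size` × `size` squares (rows or columns that do not fill a whole square are dropped), and because the abstract does not say how tiles are selected, the hypothetical `fingerprint` function simply keeps the positions of the `k` squares with the largest sums. All function names are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def binarize(spec: Matrix) -> List[List[int]]:
    """Assumption: threshold every cell at the global mean of the spectrogram."""
    values = [v for row in spec for v in row]
    mean = sum(values) / len(values)
    return [[1 if v > mean else 0 for v in row] for row in spec]


def tile_sums(binary: List[List[int]], size: int) -> List[List[int]]:
    """Sum the binary values inside each non-overlapping size x size square.

    Trailing rows/columns that do not fill a complete square are ignored
    (an assumption of this sketch).
    """
    rows, cols = len(binary), len(binary[0])
    sums = []
    for r in range(0, rows - size + 1, size):
        sums.append([
            sum(binary[i][j] for i in range(r, r + size) for j in range(c, c + size))
            for c in range(0, cols - size + 1, size)
        ])
    return sums


def fingerprint(sums: List[List[int]], k: int) -> List[Tuple[int, int]]:
    """Hypothetical selection rule: keep the (row, col) positions of the k largest sums."""
    cells = [(s, r, c) for r, row in enumerate(sums) for c, s in enumerate(row)]
    # Largest sum first; ties broken by position so the result is deterministic.
    cells.sort(key=lambda t: (-t[0], t[1], t[2]))
    return sorted((r, c) for _, r, c in cells[:k])


# Toy 4 x 4 "spectrogram" (values are made up for illustration only).
example = [
    [0.1, 0.9, 0.8, 0.2],
    [0.2, 0.7, 0.9, 0.1],
    [0.6, 0.1, 0.2, 0.3],
    [0.9, 0.8, 0.1, 0.2],
]

if __name__ == "__main__":
    bits = binarize(example)
    sums = tile_sums(bits, 2)
    print(bits)
    print(sums)
    print(fingerprint(sums, 2))
```

On the toy matrix the mean is about 0.444, the 2 × 2 square sums are `[[2, 2], [3, 0]]`, and the two selected positions are `(0, 0)` and `(1, 0)`. One property of the mean-threshold assumption used here is that multiplying the whole spectrogram by a positive constant scales the mean by the same factor, so the binary image, and therefore the encoded positions, do not change.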