Robust audio identification for MP3 popular music

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. Published: 2010-07-19. DOI: 10.1145/1835449.1835554

Wei Li, Yaduo Liu, X. Xue

Citations: 18

Abstract

Audio identification via fingerprinting has been an active research field with wide applications for years. Many technical papers have been published, and commercial software systems have been deployed. However, most previously reported methods operate on raw (uncompressed) audio, even though compressed audio, especially MP3 music, has become the dominant format for storage on personal computers and transmission over the Internet. It would be desirable to recognize a compressed, unknown audio fragment directly against the database, without the cumbersome and time-consuming decompression-identification-recompression procedure. So far, very few music information retrieval algorithms operate directly in the compressed domain, and most of those rely on MDCT coefficients or energy-type features derived from them. As a first attempt, we propose in this paper using compressed-domain spectral entropy as the audio feature for a novel audio fingerprinting algorithm. The compressed songs stored in the music database and the possibly distorted compressed query excerpts are first partially decompressed to obtain the MDCT coefficients as an intermediate result. We then group granules into longer blocks; remap the MDCT coefficients onto 192 new frequency lines to unify the frequency distributions of long and short windows; and define 9 new subbands, aligned with the scale-factor bands of short windows, that cover the main frequency bandwidth of popular songs. Finally, we calculate the spectral entropy of all consecutive blocks and derive the fingerprint sequence by modeling magnitude relationships. Experiments show that the fingerprints are highly robust against various audio signal distortions, including recompression, noise interference, echo addition, equalization, band-pass filtering, pitch shifting, and slight time-scale modification. For 5 s query excerpts, which may be severely degraded, an average top-five retrieval precision of more than 90% is obtained on our test data set of 1822 popular songs.
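The abstract describes the feature pipeline only in prose. The Python sketch below shows one way the steps could fit together; it is not the authors' implementation, and several details are assumptions not given in the abstract: the 576-to-192 (long window) and 3 x 192-to-192 (short windows) remappings sum energies, the block size of 4 granules and the `SUBBAND_EDGES` values are hypothetical, and the fingerprint bits use a Haitsma-Kalker-style sign-of-difference rule as a stand-in for the paper's "magnitude relationship modeling". The demo feeds synthetic random coefficients rather than partially decoded MP3 data, because extracting real MDCT coefficients requires a decoder that exposes them.

```python
import math
import random
from typing import List, Optional, Sequence

# HYPOTHETICAL subband edges on the remapped 192-line axis. The paper defines
# 9 subbands aligned with MP3 short-window scale-factor bands, but the abstract
# does not list them; these 10 edges are borrowed from the 44.1 kHz short-block
# scale-factor band table only to make the sketch concrete.
SUBBAND_EDGES = [4, 8, 12, 16, 22, 30, 40, 52, 66, 84]


def remap_long(coeffs: Sequence[float]) -> List[float]:
    """Assumed mapping: sum the energy of each run of 3 adjacent lines of a
    576-line long-window granule, giving 192 lines."""
    if len(coeffs) != 576:
        raise ValueError("long-window granule must have 576 MDCT lines")
    return [sum(c * c for c in coeffs[3 * k:3 * k + 3]) for k in range(192)]


def remap_short(windows: Sequence[Sequence[float]]) -> List[float]:
    """Assumed mapping: sum the energy of the 3 short windows (192 lines each,
    already de-interleaved per window) at each line index."""
    if len(windows) != 3 or any(len(w) != 192 for w in windows):
        raise ValueError("short-window granule must be 3 x 192 MDCT lines")
    return [sum(w[k] * w[k] for w in windows) for k in range(192)]


def group_blocks(spectra: Sequence[Sequence[float]], size: int,
                 hop: Optional[int] = None) -> List[List[float]]:
    """Sum `size` consecutive 192-line granule spectra into one block.
    `size` and `hop` are hypothetical parameters (hop defaults to size)."""
    hop = hop or size
    return [[sum(g[k] for g in spectra[s:s + size]) for k in range(192)]
            for s in range(0, len(spectra) - size + 1, hop)]


def subband_entropies(block: Sequence[float],
                      edges: Sequence[int] = SUBBAND_EDGES) -> List[float]:
    """Shannon entropy (bits) of the normalized energy inside each subband."""
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = block[lo:hi]
        total = sum(band)
        h = 0.0
        if total > 0.0:
            for e in band:
                if e > 0.0:
                    p = e / total
                    h -= p * math.log2(p)
        result.append(h)
    return result


def fingerprint(entropies: Sequence[Sequence[float]]) -> List[int]:
    """Stand-in for 'magnitude relationship modeling': bit m is 1 when the
    entropy difference between subbands m and m+1 grows from the previous
    block to the current one. 9 subbands give an 8-bit word per block."""
    words = []
    for n in range(1, len(entropies)):
        cur, prev = entropies[n], entropies[n - 1]
        word = 0
        for m in range(len(cur) - 1):
            delta = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            word = (word << 1) | (1 if delta > 0 else 0)
        words.append(word)
    return words


def bit_error_rate(a: Sequence[int], b: Sequence[int], bits: int = 8) -> float:
    """Fraction of differing bits between two equal-length fingerprints."""
    if len(a) != len(b) or not a:
        raise ValueError("fingerprints must be non-empty and equal length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b)) / (len(a) * bits)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic long-window MDCT granules plus a noisy copy (not real MP3 data).
    clean = [[rng.gauss(0.0, 1.0) for _ in range(576)] for _ in range(40)]
    noisy = [[c + rng.gauss(0.0, 0.1) for c in g] for g in clean]

    def pipeline(granules):
        blocks = group_blocks([remap_long(g) for g in granules], size=4)
        return fingerprint([subband_entropies(b) for b in blocks])

    fp_clean, fp_noisy = pipeline(clean), pipeline(noisy)
    print(len(fp_clean), "sub-fingerprints, BER =",
          bit_error_rate(fp_clean, fp_noisy))
```

One plausible reason to prefer entropy over raw energy, consistent with the robustness claims above, is that spectral entropy depends only on how energy is distributed within a subband, so a uniform gain applied to the whole subband leaves it unchanged. Grouping granules into blocks also keeps the fingerprint compact: a 5 s excerpt at 44.1 kHz spans roughly 383 granules of 576 samples.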
