通用音频相似性度量的无监督锚点空间生成

2008 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2008-05-12 DOI:10.1109/ICASSP.2008.4517544

Lie Lu, A. Hanjalic

{"title":"通用音频相似性度量的无监督锚点空间生成","authors":"Lie Lu, A. Hanjalic","doi":"10.1109/ICASSP.2008.4517544","DOIUrl":null,"url":null,"abstract":"Reliably measuring similarity between audio clips is critical to many applications. As opposed to the conventional way of measuring audio similarity using low-level features directly, in this paper we consider the similarity computation using an anchor space. Each dimension of such a space corresponds to a semantic category (anchor). Mapping an audio clip onto this space results in a vector, which indicates the membership probability of this audio clip with respect to each semantic category. The more similar the mappings of two audio clips, the more similar they are. While an anchor space is typically generated in a supervised fashion, supervised approach is infeasible in many realistic scenarios where audio content semantics is too diverse or simply unknown a priori. We therefore propose an unsupervised approach to anchor space generation. There, spectral clustering is employed to cluster the audio clips with similar low-level features and then the obtained clusters are adopted as semantic categories. Using this semantic space for audio similarity computation shows a considerable accuracy improvement (7% on mAP) in an audio retrieval system, compared with the conventional low-level feature based approach.","PeriodicalId":333742,"journal":{"name":"2008 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2008-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Unsupervised anchor space generation for similarity measurement of general audio\",\"authors\":\"Lie Lu, A. Hanjalic\",\"doi\":\"10.1109/ICASSP.2008.4517544\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reliably measuring similarity between audio clips is critical to many applications. As opposed to the conventional way of measuring audio similarity using low-level features directly, in this paper we consider the similarity computation using an anchor space. Each dimension of such a space corresponds to a semantic category (anchor). Mapping an audio clip onto this space results in a vector, which indicates the membership probability of this audio clip with respect to each semantic category. The more similar the mappings of two audio clips, the more similar they are. While an anchor space is typically generated in a supervised fashion, supervised approach is infeasible in many realistic scenarios where audio content semantics is too diverse or simply unknown a priori. We therefore propose an unsupervised approach to anchor space generation. There, spectral clustering is employed to cluster the audio clips with similar low-level features and then the obtained clusters are adopted as semantic categories. Using this semantic space for audio similarity computation shows a considerable accuracy improvement (7% on mAP) in an audio retrieval system, compared with the conventional low-level feature based approach.\",\"PeriodicalId\":333742,\"journal\":{\"name\":\"2008 IEEE International Conference on Acoustics, Speech and Signal Processing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-05-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 IEEE International Conference on Acoustics, Speech and Signal Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP.2008.4517544\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Conference on Acoustics, Speech and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2008.4517544","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

可靠地测量音频片段之间的相似性对许多应用程序至关重要。与传统的直接使用低级特征测量音频相似度的方法不同，本文考虑使用锚点空间进行相似度计算。这样一个空间的每一个维度对应于一个语义范畴(锚)。将音频剪辑映射到这个空间会得到一个向量，它表示该音频剪辑相对于每个语义类别的隶属性概率。两个音频片段的映射越相似，它们就越相似。虽然锚点空间通常以监督方式生成，但在音频内容语义过于多样化或先验未知的许多现实场景中，监督方法是不可行的。因此，我们提出一种无监督的方法来生成锚点空间。其中，利用谱聚类对具有相似底层特征的音频片段进行聚类，得到的聚类作为语义类别。与传统的基于低级特征的方法相比，在音频检索系统中使用该语义空间进行音频相似度计算显示出相当大的准确性提高(在mAP上为7%)。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Unsupervised anchor space generation for similarity measurement of general audio

Reliably measuring similarity between audio clips is critical to many applications. As opposed to the conventional way of measuring audio similarity using low-level features directly, in this paper we consider the similarity computation using an anchor space. Each dimension of such a space corresponds to a semantic category (anchor). Mapping an audio clip onto this space results in a vector, which indicates the membership probability of this audio clip with respect to each semantic category. The more similar the mappings of two audio clips, the more similar they are. While an anchor space is typically generated in a supervised fashion, supervised approach is infeasible in many realistic scenarios where audio content semantics is too diverse or simply unknown a priori. We therefore propose an unsupervised approach to anchor space generation. There, spectral clustering is employed to cluster the audio clips with similar low-level features and then the obtained clusters are adopted as semantic categories. Using this semantic space for audio similarity computation shows a considerable accuracy improvement (7% on mAP) in an audio retrieval system, compared with the conventional low-level feature based approach.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2008 IEEE International Conference on Acoustics, Speech and Signal Processing

自引率

0.00%

发文量