FlowHash: Accelerating Audio Search With Balanced Hashing via Normalizing Flow

Impact factor 4.1; Zone 2, Computer Science; JCR Q1, Acoustics
Anup Singh, Kris Demuynck, Vipul Arora
DOI: 10.1109/TASLP.2024.3486227
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 4961-4970
Published: 2024-11-04 (journal article)
URL: https://ieeexplore.ieee.org/document/10741572/
Citations: 0

Abstract

Nearest neighbor search on context representation vectors is a formidable task because of high dimensionality, scalability demands, and potential noise in the query vectors. Our novel approach leverages normalizing flows within a self-supervised learning framework to tackle these challenges, specifically for audio fingerprinting. Audio fingerprinting systems comprise two key components: audio encoding and indexing. Existing systems treat these components independently, which leads to suboptimal performance. Our approach optimizes the interplay between them, allowing the vectors to adapt to the indexing structure. Additionally, we use a normalizing flow to distribute vectors in the latent space $\mathbb{R}^{K}$, yielding balanced $K$-bit hash codes. This allows the vectors to be indexed with a balanced hash table, in which they are uniformly distributed across all $2^{K}$ possible hash buckets. This significantly accelerates retrieval, achieving speedups of up to 2× and 1.4× compared with Locality-Sensitive Hashing (LSH) and Product Quantization (PQ), respectively. We empirically demonstrate that our system is scalable, highly effective, and efficient at identifying short audio queries ($\leq$ 2 s), particularly under high noise and reverberation levels.
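Why balanced buckets speed up retrieval can be illustrated with a small sketch. The abstract does not state how the $K$-bit codes are formed, so the code below assumes (hypothetically) that each code is the sign pattern of a $K$-dimensional latent vector, and it uses i.i.d. standard Gaussian samples as a stand-in for the output of a trained normalizing flow. If a query drawn from the data distribution lands in bucket $b$ with probability $n_b/N$, the expected number of candidates to re-rank is $\sum_b n_b^2 / N$, which by Cauchy-Schwarz is at least $N/2^K$, with equality only when all $2^K$ buckets hold the same number of vectors. The helpers `sign_code`, `build_table`, `expected_candidates`, and `lookup` are illustrative names, not FlowHash's implementation.

```python
import random
from collections import defaultdict


def sign_code(z):
    """Map a latent vector z in R^K to an integer K-bit code (assumed scheme: bit k = 1 if z[k] > 0)."""
    code = 0
    for k, value in enumerate(z):
        if value > 0:
            code |= 1 << k
    return code


def build_table(vectors):
    """Hash table from K-bit code to the list of vector ids stored in that bucket."""
    table = defaultdict(list)
    for idx, z in enumerate(vectors):
        table[sign_code(z)].append(idx)
    return table


def expected_candidates(table, n):
    """Expected bucket size seen by a query drawn from the data: sum_b n_b^2 / N.

    Equals N / 2^K when all 2^K buckets are equally full and grows as buckets become skewed.
    """
    return sum(len(ids) ** 2 for ids in table.values()) / n


def lookup(table, query):
    """Candidate ids from exact-bucket retrieval (no multi-probe); a real system would re-rank these."""
    return table.get(sign_code(query), [])


if __name__ == "__main__":
    rng = random.Random(0)
    K, N = 8, 20000
    # Stand-in for flow output (assumption): i.i.d. standard Gaussian latents.
    gaussian = [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(N)]
    # Stand-in for unnormalized embeddings: shifted, strongly correlated coordinates.
    skewed = []
    for _ in range(N):
        shared = rng.gauss(0.0, 1.0)
        skewed.append([0.8 * shared + 0.3 * rng.gauss(0.0, 1.0) + 0.5 for _ in range(K)])
    for name, data in [("gaussian", gaussian), ("skewed", skewed)]:
        table = build_table(data)
        print(name, len(table), "of", 2 ** K, "buckets used,",
              round(expected_candidates(table, N), 1), "expected candidates per query")
```

With Gaussian latents the expected candidate count stays close to the ideal $N/2^K$, while the skewed, correlated latents pile most vectors into a few buckets, so each lookup must scan far more candidates. This is the intuition behind mapping embeddings to a well-spread latent distribution before hashing.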
Source journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Categories: ACOUSTICS; ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore: 11.30
Self-citation rate: 11.10%
Articles published: 217
Journal description: The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering, document indexing and retrieval, as well as general language modeling.