Audio Decoding by Inverse Problem Solving

Pedro J. Villasana T., Lars Villemoes, Janusz Klejsa, Per Hedelin
arXiv - EE - Audio and Speech Processing, published 2024-09-12. Identifier: arxiv-2409.07858 (https://doi.org/arxiv-2409.07858)
Citations: 0

Abstract
We consider audio decoding as an inverse problem and solve it through diffusion posterior sampling. Explicit conditioning functions are developed for input signal measurements provided by an example of a transform domain perceptual audio codec. Viability is demonstrated by evaluating arbitrary pairings of a set of bitrates and task-agnostic prior models. For instance, we observe significant improvements on piano while maintaining speech performance when a speech model is replaced by a joint model trained on both speech and piano. With a more general music model, improved decoding compared to legacy methods is obtained for a broad range of content types and bitrates. The noisy mean model, underlying the proposed derivation of conditioning, enables a significant reduction of gradient evaluations for diffusion posterior sampling, compared to methods based on Tweedie's mean. Combining Tweedie's mean with our conditioning functions improves the objective performance. An audio demo is available at https://dpscodec-demo.github.io/.
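The abstract refers to Tweedie's mean without defining it. As a minimal illustration, and not the authors' codec or conditioning method, the sketch below checks Tweedie's formula, E[x_0 | x_t] = x_t + noise_var * score(x_t), in a scalar toy case where everything is known in closed form. The Gaussian prior, the parameter values, and all function names are assumptions chosen for this example; the paper's noisy mean model is not described in the abstract and is not shown here.

```python
# Toy setting (assumed for illustration): x_0 ~ N(mu, prior_var) and
# x_t = x_0 + noise, with noise ~ N(0, noise_var) independent of x_0.

def gaussian_score(x_t, mu, prior_var, noise_var):
    """Score of the noisy marginal, d/dx log p(x_t).

    Under the toy setting, x_t ~ N(mu, prior_var + noise_var).
    """
    return -(x_t - mu) / (prior_var + noise_var)


def tweedie_mean(x_t, score, noise_var):
    """Tweedie's formula for the denoised mean E[x_0 | x_t]."""
    return x_t + noise_var * score


def closed_form_posterior_mean(x_t, mu, prior_var, noise_var):
    """Posterior mean of x_0 given x_t for the Gaussian toy setting."""
    return (prior_var * x_t + noise_var * mu) / (prior_var + noise_var)


# Hypothetical parameter values, chosen only for the demonstration.
mu, prior_var, noise_var = 0.5, 2.0, 0.25
noisy_samples = [-1.0, 0.0, 3.0]

denoised = [
    tweedie_mean(x, gaussian_score(x, mu, prior_var, noise_var), noise_var)
    for x in noisy_samples
]
reference = [
    closed_form_posterior_mean(x, mu, prior_var, noise_var)
    for x in noisy_samples
]
```

In a diffusion model the score is not available in closed form and is approximated by a trained network, so each evaluation of Tweedie's mean costs a network pass, and differentiating through it costs a gradient evaluation. That cost is the context for the abstract's remark about reducing gradient evaluations.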