Design of Medium to Low Bitrate Neural Audio Codec
Samarpreet Singh, Saurabh Singh Raghuvanshi, Vinal Patel
2023 IEEE 8th International Conference for Convergence in Technology (I2CT), 2023-04-07
DOI: 10.1109/I2CT57861.2023.10126323 (https://doi.org/10.1109/I2CT57861.2023.10126323)
Abstract
Neural audio codecs are among the most recent developments in audio compression. Traditional audio codecs rely on fixed signal processing pipelines and require domain-specific expertise to produce high-quality audio across low to high bit rates, and their performance usually degrades at low bit rates. Neural audio codecs perform enhancement and compression with no added latency. This paper further improves the quality of neural audio codecs by integrating a psychoacoustic model into the existing structure, which consists of a convolutional encoder, a decoder, and a residual vector quantizer. The model is trained with a combination of reconstruction and adversarial losses to generate high-quality audio content. Audio quality evaluations using PEAQ and MUSHRA indicate that the proposed model performs better than the existing neural audio codec.
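
The abstract names a residual vector quantizer but gives no parameters, so the following is only a minimal sketch of the general technique, not the authors' implementation. The codebooks are random and untrained, and the dimensions, stage count, codebook size, and all function names (`nearest`, `rvq_encode`, `rvq_decode`, `bitrate_bps`) are assumptions chosen for illustration. Each stage quantizes the residual left by the previous stage, and the bitrate follows from the usual accounting of frames per second times stages times bits per index.

```python
import math
import random


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(entry, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def rvq_encode(codebooks, vec):
    """Quantize vec stage by stage; each stage codes the previous residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen entry so the next stage sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(codebooks, indices):
    """Reconstruct by summing the selected entry from each stage."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def bitrate_bps(frame_rate_hz, num_stages, codebook_size):
    """Bits per second = frames/s * stages * bits per index."""
    return frame_rate_hz * num_stages * math.log2(codebook_size)


# Hypothetical toy setup (not from the paper): 4 stages, 16 entries, 8-dim latents.
rng = random.Random(0)
DIM, STAGES, SIZE = 8, 4, 16
codebooks = [
    [[rng.gauss(0.0, 1.0 / (2 ** s)) for _ in range(DIM)] for _ in range(SIZE)]
    for s in range(STAGES)
]
latent = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
codes = rvq_encode(codebooks, latent)   # one index per stage
recon = rvq_decode(codebooks, codes)    # approximate reconstruction of latent
```

Under these assumed settings, using fewer stages at decode time lowers the bitrate at the cost of a coarser reconstruction, which is one common way such quantizers are used to trade quality against bitrate.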