Bao Thang Ta;Nhat Minh Le;Huynh Thi Thanh Binh;Van Hai Do
Exploring Non-Matching Multiple References for Speech Quality Assessment
IEEE Signal Processing Letters, vol. 32, pp. 1610-1614, published 2025-03-26
DOI: 10.1109/LSP.2025.3555190
https://ieeexplore.ieee.org/document/10938914/
Abstract
Non-Matching Reference-based Speech Quality Assessment models typically require numerous references during inference to ensure stable and accurate predictions. However, this dependency introduces significant computational overhead, limiting their suitability for real-time applications. In this paper, we propose a novel training paradigm that directly addresses prediction instability at its source by integrating multiple references during training rather than during inference, as in existing approaches. This method allows the model to capture the inherent variability of reference signals, thereby enhancing prediction reliability. Additionally, we introduce an auxiliary variance loss function to minimize inconsistencies across predictions, ensuring stable assessments regardless of the number of references used. Experiments on the NISQA datasets demonstrate that, with the same training time, our method achieves consistent predictions with a single reference during inference, resulting in a 100-fold reduction in computational time while maintaining high accuracy.
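The abstract does not give the exact form of the auxiliary variance loss, so the following is only a minimal sketch under stated assumptions: the model is assumed to produce one quality score per (test utterance, reference) pair, the accuracy term is assumed to be a mean squared error against the MOS label, and the variance term is assumed to be the population variance of the scores across references. The function name `multi_reference_loss` and the parameter `variance_weight` are hypothetical and do not come from the paper.

```python
from statistics import fmean, pvariance


def multi_reference_loss(predictions, target_mos, variance_weight=1.0):
    """Illustrative training loss for ONE test utterance.

    predictions     -- quality scores for the same utterance, each obtained
                       with a different non-matching reference (assumed setup)
    target_mos      -- ground-truth mean opinion score for the utterance
    variance_weight -- hypothetical weight on the consistency term
    """
    if not predictions:
        raise ValueError("at least one prediction is required")

    # Accuracy term: mean squared error of every per-reference prediction.
    accuracy_term = fmean([(p - target_mos) ** 2 for p in predictions])

    # Consistency term: spread of the predictions across references.
    # It is zero when every reference yields the same score.
    variance_term = pvariance(predictions)

    return accuracy_term + variance_weight * variance_term


if __name__ == "__main__":
    # Three references that agree with each other and with the label.
    print(multi_reference_loss([3.0, 3.0, 3.0], target_mos=3.0))
    # Two references that disagree around the label.
    print(multi_reference_loss([2.0, 4.0], target_mos=3.0, variance_weight=0.5))
```

If the variance term is driven toward zero during training, a single reference at inference should yield roughly the same score as an average over many references, which is the mechanism the abstract credits for the reduced inference cost.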
About the journal:
IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented, within one year of their appearance, at signal processing conferences such as ICASSP, GlobalSIP and ICIP, and at several workshops organized by the Signal Processing Society.