Towards robust heart failure detection in digital telephony environments by utilizing transformer-based codec inversion
Saska Tirronen, Farhad Javanmardi, Hilla Pohjalainen, Sudarsana Reddy Kadiri, Kiran Reddy Mittapalle, Pyry Helkkula, Kasimir Kaitue, Mikko Minkkinen, Heli Tolppanen, Tuomo Nieminen, Paavo Alku
Speech Communication, vol. 173, Article 103279 (published 2025-07-15)
DOI: 10.1016/j.specom.2025.103279
URL: https://www.sciencedirect.com/science/article/pii/S0167639325000949
Journal impact factor: 2.4 (JCR Q2, Acoustics)
Citations: 0
Abstract
This study introduces the Codec Transformer Network (CTN), which aims to make automatic heart failure (HF) detection from coded telephone speech more reliable by addressing codec-related challenges in digital telephony. Specifically, it targets the codec mismatch between training and inference in HF detection. CTN is designed to map the mel-spectrogram representations of encoded speech signals back to their original, non-encoded forms, thereby recovering HF-related discriminative information. The effectiveness of CTN is demonstrated in conjunction with three HF detectors based on Support Vector Machine, Random Forest, and K-Nearest Neighbors classifiers. The results show that CTN effectively recovers the information that discriminates patients from controls, and that it performs comparably to or better than a baseline approach based on multi-condition training.
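The codec-inversion idea in the abstract (learn a mapping from features of coded speech back to features of uncoded speech, using paired examples) can be illustrated with a deliberately small toy. This is not the authors' CTN: a per-bin linear least-squares map stands in for the transformer, random vectors stand in for mel-spectrogram frames, and `simulate_codec` is a hypothetical affine distortion rather than a real telephony codec. All function names are assumptions made for illustration.

```python
import random


def simulate_codec(frame, gains, offsets, rng, noise=0.05):
    """Hypothetical stand-in for a codec: per-bin affine distortion plus noise."""
    return [g * x + o + rng.gauss(0.0, noise)
            for x, g, o in zip(frame, gains, offsets)]


def fit_inverse(coded, clean):
    """For each bin, fit clean ~ a * coded + b by ordinary least squares."""
    params = []
    for k in range(len(coded[0])):
        xs = [f[k] for f in coded]
        ys = [f[k] for f in clean]
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = cov / var
        params.append((a, my - a * mx))
    return params


def apply_inverse(frame, params):
    """Map one coded frame back towards its uncoded form."""
    return [a * x + b for x, (a, b) in zip(frame, params)]


def mse(frames_a, frames_b):
    """Mean squared error over all frames and bins."""
    total, count = 0.0, 0
    for fa, fb in zip(frames_a, frames_b):
        for a, b in zip(fa, fb):
            total += (a - b) ** 2
            count += 1
    return total / count


def run_demo(seed=0, n_bins=8, n_train=200, n_test=100):
    """Return (error of coded frames, error after inversion) on held-out frames."""
    rng = random.Random(seed)
    gains = [rng.uniform(0.5, 0.9) for _ in range(n_bins)]
    offsets = [rng.uniform(-1.0, 1.0) for _ in range(n_bins)]

    def make(n):
        clean = [[rng.gauss(0.0, 1.0) for _ in range(n_bins)] for _ in range(n)]
        coded = [simulate_codec(f, gains, offsets, rng) for f in clean]
        return clean, coded

    train_clean, train_coded = make(n_train)
    test_clean, test_coded = make(n_test)
    params = fit_inverse(train_coded, train_clean)
    restored = [apply_inverse(f, params) for f in test_coded]
    return mse(test_coded, test_clean), mse(restored, test_clean)


if __name__ == "__main__":
    before, after = run_demo()
    print(f"MSE coded vs. clean:    {before:.4f}")
    print(f"MSE restored vs. clean: {after:.4f}")
```

In this toy setting the restored frames lie closer to the uncoded frames than the coded ones do, which is the property a downstream detector trained on uncoded speech would rely on. A real codec's distortion is nonlinear and time-dependent, which is presumably why the paper uses a transformer rather than a fixed linear map.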
About the journal:
Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results.
The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.