Measuring Respiration Rate from Speech
Sidharth Abrol, Biswajit Das, Srikanth Nallanthighal, Okke Ouweltjes, Ulf Grossekathofer, Aki Härmä
Digital Biomarkers, vol. 9, no. 1, pp. 67-74 (published 2025-02-28, eCollection 2025)
DOI: 10.1159/000544913
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11999658/pdf/
Citations: 0
Abstract
The physical basis of speech production in humans requires the coordination of multiple anatomical systems, with the inhalation and exhalation of air through the lungs at the core of the phenomenon. Vocalization happens during exhalation, while inhalation typically happens between speech pauses. We use deep learning models to predict respiratory signals during speech-breathing, from which the respiration rate is estimated. Bilingual data from a large clinical study (N = 1,005) are used to develop and evaluate a multivariate time series transformer model with speech encoder embeddings as input. The best model predicts respiration rate from speech to within ±3 BPM of the reference for 82% of test subjects. A noise-aware algorithm was also tested in a simulated hospital environment with varying noise levels to evaluate the impact on performance. This work proposes and validates speech as a virtual sensor for respiration rate, which can be an efficient and cost-effective enabler for remote patient monitoring and telehealth solutions.
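The abstract describes a two-stage pipeline: a model predicts a respiratory waveform from speech, and the respiration rate is then estimated from that waveform. The paper does not detail the second stage here; the sketch below shows one common way such a rate could be derived from a predicted respiratory signal, via its dominant spectral frequency. The function name, band limits, and sampling rate are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def respiration_rate_bpm(resp_signal, fs, lo=0.1, hi=0.8):
    """Estimate respiration rate (breaths per minute) from a respiratory
    waveform by locating its dominant frequency within a plausible
    breathing band (here 0.1-0.8 Hz, i.e. 6-48 BPM; an assumed range)."""
    x = np.asarray(resp_signal, dtype=float)
    x = x - x.mean()                            # remove DC offset
    spectrum = np.abs(np.fft.rfft(x))           # magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)        # restrict to breathing band
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_freq                     # Hz -> breaths per minute
```

For example, a 60-second synthetic waveform oscillating at 0.25 Hz should yield an estimate near 15 BPM, which is how the ±3 BPM accuracy criterion in the abstract would be checked against a reference sensor.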