{"title":"Pain Level Classification from Speech Using GRU-Mixer Architecture with Log-Mel Spectrogram Features.","authors":"Adi Alhudhaif","doi":"10.3390/diagnostics15182362","DOIUrl":null,"url":null,"abstract":"<p><p><b>Background/Objectives</b>: Automatic pain detection from speech signals holds strong promise for non-invasive and real-time assessment in clinical and caregiving settings, particularly for populations with limited capacity for self-report. <b>Methods:</b> In this study, we introduce a lightweight recurrent deep learning approach, namely the Gated Recurrent Unit (GRU)-Mixer model for pain level classification based on speech signals. The proposed model maps raw audio inputs into Log-Mel spectrogram features, which are passed through a stacked bidirectional GRU for modeling the spectral and temporal dynamics of vocal expressions. To extract compact utterance-level embeddings, an adaptive average pooling-based temporal mixing mechanism is applied over the GRU outputs, followed by a fully connected classification head alongside dropout regularization. This architecture is used for several supervised classification tasks, including binary (pain/non-pain), graded intensity (mild, moderate, severe), and thermal-state (cold/warm) classification. End-to-end training is done using speaker-independent splits and class-balanced loss to promote generalization and discourage bias. The provided audio inputs are normalized to a consistent 3-s window and resampled at 8 kHz for consistency and computational efficiency. <b>Results:</b> Experiments on the TAME Pain dataset showcase strong classification performance, achieving 83.86% accuracy for binary pain detection and as high as 75.36% for multiclass pain intensity classification. 
<b>Conclusions:</b> As the first deep learning based classification work on the TAME Pain dataset, this work introduces the GRU-Mixer as an effective benchmark architecture for future studies on speech-based pain recognition and affective computing.</p>","PeriodicalId":11225,"journal":{"name":"Diagnostics","volume":"15 18","pages":""},"PeriodicalIF":3.3000,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468818/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Diagnostics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3390/diagnostics15182362","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
Citations: 0
Abstract
Background/Objectives: Automatic pain detection from speech signals holds strong promise for non-invasive, real-time assessment in clinical and caregiving settings, particularly for populations with limited capacity for self-report. Methods: In this study, we introduce a lightweight recurrent deep learning approach, the Gated Recurrent Unit (GRU)-Mixer model, for pain level classification from speech signals. The proposed model converts raw audio inputs into Log-Mel spectrogram features, which are passed through a stacked bidirectional GRU to model the spectral and temporal dynamics of vocal expressions. To extract compact utterance-level embeddings, an adaptive average pooling-based temporal mixing mechanism is applied over the GRU outputs, followed by a fully connected classification head with dropout regularization. This architecture is applied to several supervised classification tasks, including binary (pain/non-pain), graded intensity (mild, moderate, severe), and thermal-state (cold/warm) classification. End-to-end training is performed with speaker-independent splits and a class-balanced loss to promote generalization and discourage bias. Audio inputs are normalized to a fixed 3-s window and resampled to 8 kHz for consistency and computational efficiency. Results: Experiments on the TAME Pain dataset demonstrate strong classification performance, achieving 83.86% accuracy for binary pain detection and up to 75.36% for multiclass pain intensity classification. Conclusions: As the first deep-learning-based classification work on the TAME Pain dataset, this work introduces the GRU-Mixer as an effective benchmark architecture for future studies on speech-based pain recognition and affective computing.
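The pipeline described above (Log-Mel features, stacked bidirectional GRU, adaptive-average-pooling temporal mixing, dropout-regularized linear head, class-balanced loss) can be sketched in PyTorch. The layer sizes and hyperparameters below are illustrative assumptions, not the paper's reported configuration, and the model expects precomputed Log-Mel features rather than raw waveforms:

```python
import torch
import torch.nn as nn


class GRUMixer(nn.Module):
    """Sketch of the GRU-Mixer classifier described in the abstract.

    Expects precomputed Log-Mel spectrograms of shape (batch, time, n_mels),
    e.g. extracted from 3-s clips resampled to 8 kHz. All hidden sizes and
    the dropout rate are assumptions, not values from the paper.
    """

    def __init__(self, n_mels=64, hidden=128, num_layers=2,
                 num_classes=2, dropout=0.3):
        super().__init__()
        # Stacked bidirectional GRU models the spectral/temporal dynamics.
        self.gru = nn.GRU(n_mels, hidden, num_layers=num_layers,
                          batch_first=True, bidirectional=True,
                          dropout=dropout if num_layers > 1 else 0.0)
        # Adaptive average pooling over the time axis mixes frame-level GRU
        # outputs into a compact, fixed-size utterance-level embedding.
        self.pool = nn.AdaptiveAvgPool1d(1)
        # Fully connected classification head with dropout regularization.
        self.head = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(2 * hidden, num_classes),  # 2x for bidirectional
        )

    def forward(self, logmel):                 # (B, T, n_mels)
        seq, _ = self.gru(logmel)              # (B, T, 2*hidden)
        emb = self.pool(seq.transpose(1, 2))   # (B, 2*hidden, 1)
        return self.head(emb.squeeze(-1))      # (B, num_classes)


def balanced_ce(class_counts):
    """Class-balanced cross-entropy via inverse-frequency class weights
    (one common scheme; the paper does not specify its exact weighting)."""
    w = 1.0 / torch.tensor(class_counts, dtype=torch.float32)
    return nn.CrossEntropyLoss(weight=w / w.sum())
```

For scale: a 3-s clip at 8 kHz is 24,000 samples; with a typical 25 ms analysis window and 10 ms hop, that yields roughly 300 Log-Mel frames, so a batch input would have shape `(batch, ~300, 64)` under these assumed settings.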
Diagnostics: Biochemistry, Genetics and Molecular Biology (Clinical Biochemistry)
CiteScore
4.70
Self-citation rate
8.30%
Articles published
2699
Average review time
19.64 days
About the journal:
Diagnostics (ISSN 2075-4418) is an international scholarly open access journal on medical diagnostics. It publishes original research articles, reviews, communications and short notes on the research and development of medical diagnostics. There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental and/or methodological details must be provided for research articles.