Pain Level Classification from Speech Using GRU-Mixer Architecture with Log-Mel Spectrogram Features.

IF 3.3 · CAS Tier 3 (Medicine) · JCR Q1 (MEDICINE, GENERAL & INTERNAL)
Adi Alhudhaif
{"title":"基于Log-Mel谱图特征的GRU-Mixer架构的语音疼痛等级分类。","authors":"Adi Alhudhaif","doi":"10.3390/diagnostics15182362","DOIUrl":null,"url":null,"abstract":"<p><p><b>Background/Objectives</b>: Automatic pain detection from speech signals holds strong promise for non-invasive and real-time assessment in clinical and caregiving settings, particularly for populations with limited capacity for self-report. <b>Methods:</b> In this study, we introduce a lightweight recurrent deep learning approach, namely the Gated Recurrent Unit (GRU)-Mixer model for pain level classification based on speech signals. The proposed model maps raw audio inputs into Log-Mel spectrogram features, which are passed through a stacked bidirectional GRU for modeling the spectral and temporal dynamics of vocal expressions. To extract compact utterance-level embeddings, an adaptive average pooling-based temporal mixing mechanism is applied over the GRU outputs, followed by a fully connected classification head alongside dropout regularization. This architecture is used for several supervised classification tasks, including binary (pain/non-pain), graded intensity (mild, moderate, severe), and thermal-state (cold/warm) classification. End-to-end training is done using speaker-independent splits and class-balanced loss to promote generalization and discourage bias. The provided audio inputs are normalized to a consistent 3-s window and resampled at 8 kHz for consistency and computational efficiency. <b>Results:</b> Experiments on the TAME Pain dataset showcase strong classification performance, achieving 83.86% accuracy for binary pain detection and as high as 75.36% for multiclass pain intensity classification. <b>Conclusions:</b> As the first deep learning based classification work on the TAME Pain dataset, this work introduces the GRU-Mixer as an effective benchmark architecture for future studies on speech-based pain recognition and affective computing.</p>","PeriodicalId":11225,"journal":{"name":"Diagnostics","volume":"15 18","pages":""},"PeriodicalIF":3.3000,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468818/pdf/","citationCount":"0","resultStr":"{\"title\":\"Pain Level Classification from Speech Using GRU-Mixer Architecture with Log-Mel Spectrogram Features.\",\"authors\":\"Adi Alhudhaif\",\"doi\":\"10.3390/diagnostics15182362\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p><b>Background/Objectives</b>: Automatic pain detection from speech signals holds strong promise for non-invasive and real-time assessment in clinical and caregiving settings, particularly for populations with limited capacity for self-report. <b>Methods:</b> In this study, we introduce a lightweight recurrent deep learning approach, namely the Gated Recurrent Unit (GRU)-Mixer model for pain level classification based on speech signals. The proposed model maps raw audio inputs into Log-Mel spectrogram features, which are passed through a stacked bidirectional GRU for modeling the spectral and temporal dynamics of vocal expressions. To extract compact utterance-level embeddings, an adaptive average pooling-based temporal mixing mechanism is applied over the GRU outputs, followed by a fully connected classification head alongside dropout regularization. 
This architecture is used for several supervised classification tasks, including binary (pain/non-pain), graded intensity (mild, moderate, severe), and thermal-state (cold/warm) classification. End-to-end training is done using speaker-independent splits and class-balanced loss to promote generalization and discourage bias. The provided audio inputs are normalized to a consistent 3-s window and resampled at 8 kHz for consistency and computational efficiency. <b>Results:</b> Experiments on the TAME Pain dataset showcase strong classification performance, achieving 83.86% accuracy for binary pain detection and as high as 75.36% for multiclass pain intensity classification. <b>Conclusions:</b> As the first deep learning based classification work on the TAME Pain dataset, this work introduces the GRU-Mixer as an effective benchmark architecture for future studies on speech-based pain recognition and affective computing.</p>\",\"PeriodicalId\":11225,\"journal\":{\"name\":\"Diagnostics\",\"volume\":\"15 18\",\"pages\":\"\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2025-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468818/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Diagnostics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.3390/diagnostics15182362\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MEDICINE, GENERAL & INTERNAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Diagnostics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3390/diagnostics15182362","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
Citations: 0

Abstract


Background/Objectives: Automatic pain detection from speech signals holds strong promise for non-invasive, real-time assessment in clinical and caregiving settings, particularly for populations with limited capacity for self-report. Methods: In this study, we introduce a lightweight recurrent deep learning approach, the Gated Recurrent Unit (GRU)-Mixer model, for pain level classification from speech signals. The proposed model maps raw audio inputs into Log-Mel spectrogram features, which are passed through a stacked bidirectional GRU to model the spectral and temporal dynamics of vocal expressions. To extract compact utterance-level embeddings, an adaptive average pooling-based temporal mixing mechanism is applied over the GRU outputs, followed by a fully connected classification head with dropout regularization. The architecture is applied to several supervised classification tasks, including binary (pain/non-pain), graded intensity (mild, moderate, severe), and thermal-state (cold/warm) classification. End-to-end training uses speaker-independent splits and a class-balanced loss to promote generalization and discourage bias. Audio inputs are normalized to a fixed 3-s window and resampled to 8 kHz for consistency and computational efficiency. Results: Experiments on the TAME Pain dataset show strong classification performance, achieving 83.86% accuracy for binary pain detection and up to 75.36% for multiclass pain intensity classification. Conclusions: As the first deep-learning-based classification work on the TAME Pain dataset, this study establishes the GRU-Mixer as an effective benchmark architecture for future research on speech-based pain recognition and affective computing.
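
The abstract describes the full pipeline in prose; the following minimal sketch shows how such a pipeline might be wired together in PyTorch/torchaudio (the framework is an assumption, as the abstract does not name one): resampling to 8 kHz, a fixed 3-s window, Log-Mel features, a stacked bidirectional GRU, adaptive average pooling over time ("mixing"), and a dropout-regularized linear head trained with a class-weighted cross-entropy loss. All layer sizes, mel parameters, and class counts below are illustrative placeholders, not the authors' reported configuration.

```python
# Minimal sketch of the described pipeline (PyTorch/torchaudio assumed).
# Mel settings, hidden sizes, dropout rate, and class counts are guesses,
# not the paper's exact hyperparameters.
import torch
import torch.nn as nn
import torchaudio

SAMPLE_RATE = 8_000   # abstract: resampled to 8 kHz
CLIP_SECONDS = 3      # abstract: fixed 3-s window
N_MELS = 64           # assumption

def preprocess(waveform: torch.Tensor, orig_sr: int) -> torch.Tensor:
    """Resample, pad/crop to 3 s, and convert to a Log-Mel spectrogram."""
    wav = torchaudio.functional.resample(waveform, orig_sr, SAMPLE_RATE)
    target_len = SAMPLE_RATE * CLIP_SECONDS
    if wav.shape[-1] < target_len:                      # pad short clips
        wav = torch.nn.functional.pad(wav, (0, target_len - wav.shape[-1]))
    else:                                               # crop long clips
        wav = wav[..., :target_len]
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=SAMPLE_RATE, n_fft=400, hop_length=160, n_mels=N_MELS
    )(wav)
    return torch.log(mel + 1e-6)                        # (channels, n_mels, frames)

class GRUMixer(nn.Module):
    """Stacked bidirectional GRU over Log-Mel frames, temporal average
    pooling ("mixing"), dropout, and a linear classification head."""
    def __init__(self, n_mels: int = N_MELS, hidden: int = 128,
                 num_layers: int = 2, num_classes: int = 2, p_drop: float = 0.3):
        super().__init__()
        self.gru = nn.GRU(input_size=n_mels, hidden_size=hidden,
                          num_layers=num_layers, batch_first=True,
                          bidirectional=True)
        self.pool = nn.AdaptiveAvgPool1d(1)             # temporal mixing
        self.head = nn.Sequential(nn.Dropout(p_drop),
                                  nn.Linear(2 * hidden, num_classes))

    def forward(self, log_mel: torch.Tensor) -> torch.Tensor:
        # log_mel: (batch, 1, n_mels, frames) -> (batch, frames, n_mels)
        x = log_mel.squeeze(1).transpose(1, 2)
        out, _ = self.gru(x)                            # (batch, frames, 2*hidden)
        pooled = self.pool(out.transpose(1, 2)).squeeze(-1)
        return self.head(pooled)                        # class logits

# Class-balanced loss as hinted in the abstract: weight each class
# inversely to its frequency (the counts here are placeholders).
class_counts = torch.tensor([1200.0, 800.0])
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)

if __name__ == "__main__":
    dummy = torch.randn(1, SAMPLE_RATE * CLIP_SECONDS)  # 3 s of synthetic audio
    feats = preprocess(dummy, orig_sr=SAMPLE_RATE).unsqueeze(0)
    model = GRUMixer(num_classes=2)
    print(model(feats).shape)  # torch.Size([1, 2])
```

For the graded-intensity or thermal-state tasks mentioned in the abstract, the same skeleton would be reused with num_classes set to 3 (mild/moderate/severe) or 2 (cold/warm) and per-task class counts in the loss weighting.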

Source journal
Diagnostics
Subject area: Biochemistry, Genetics and Molecular Biology - Clinical Biochemistry
CiteScore: 4.70
Self-citation rate: 8.30%
Articles published: 2699
Review time: 19.64 days
Journal introduction: Diagnostics (ISSN 2075-4418) is an international scholarly open access journal on medical diagnostics. It publishes original research articles, reviews, communications and short notes on the research and development of medical diagnostics. There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental and/or methodological details must be provided for research articles.