Significance of Filterbank Structure for Capturing Dysarthric Information through Cepstral Coefficients

Laxmi Priya Sahu, G. Pradhan
2022 IEEE International Conference on Signal Processing and Communications (SPCOM). Published 2022-07-11.
DOI: 10.1109/SPCOM55316.2022.9840837 (https://doi.org/10.1109/SPCOM55316.2022.9840837)

Abstract

The short-term Fourier transform magnitude spectra (STFT-MS) computed from dysarthric speech deviate nonlinearly from those of normal speech in different frequency bands, depending on the underlying sound units. This discriminating information can be captured by segmenting the STFT-MS into frequency bands that follow the power spectra of broad categories of sound units. Motivated by this observation, in this study we compute cepstral coefficients by analyzing the STFT-MS separately in the 0–500 Hz, 500–2000 Hz, 2000–4000 Hz, and 4000–8000 Hz bands for speech data sampled at 16 kHz. Each of the selected frequency bands is analyzed using a 30-channel Mel filterbank. The log filterbank energies computed for each sub-band are then pooled together, and the discrete cosine transform (DCT) is applied to compute the cepstral coefficients, here termed sub-band enhanced Mel frequency cepstral coefficients (SE-MFCC). The i-vector based dysarthric intelligibility assessment system reported in this study shows that SE-MFCC outperforms the conventional Mel frequency cepstral coefficients (MFCC), as well as the cepstral coefficients computed using an inverse-Mel filterbank (IMFCC) and a linear filterbank (LFCC). The score-level combination of SE-MFCC with MFCC further improves the overall performance.
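
The feature pipeline described in the abstract (four sub-bands, a 30-channel Mel filterbank per band, pooled log energies, then a DCT) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract gives only the band edges, the 30 channels per band, and the 16 kHz sampling rate; everything else here is an assumption: the 25 ms Hamming window with a 10 ms hop, the 2048-point FFT (chosen so that the narrow filters in the 0–500 Hz band each cover at least one FFT bin), applying the filters to the magnitude rather than the power spectrum, reading "pooled together" as concatenation of the four 30-dimensional log-energy vectors, the unnormalized DCT-II, and keeping 13 coefficients. The function names are hypothetical.

```python
import numpy as np

# Sub-band edges in Hz, as listed in the abstract (16 kHz sampled speech).
BANDS = [(0.0, 500.0), (500.0, 2000.0), (2000.0, 4000.0), (4000.0, 8000.0)]

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(f_lo, f_hi, n_filters, n_fft, sr):
    """Triangular filters, Mel-spaced between f_lo and f_hi.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    edges = mel_to_hz(np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    lo, ctr, hi = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs - lo) / (ctr - lo)      # slope up to the center frequency
    falling = (hi - freqs) / (hi - ctr)     # slope down from the center frequency
    return np.clip(np.minimum(rising, falling), 0.0, None)

def dct2(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return x @ basis.T

def se_mfcc(signal, sr=16000, n_fft=2048, frame_len=400, hop=160,
            n_filters=30, n_ceps=13, bands=BANDS):
    """Sketch of sub-band enhanced MFCCs; returns shape (n_frames, n_ceps)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))   # STFT magnitude spectra
    # One 30-channel Mel filterbank per sub-band, stacked into 120 filters (assumed pooling).
    fb = np.vstack([mel_filterbank(lo, hi, n_filters, n_fft, sr) for lo, hi in bands])
    log_energy = np.log(mag @ fb.T + 1e-10)               # small floor avoids log(0)
    return dct2(log_energy, n_ceps)
```

Calling `se_mfcc(x)` on a one-dimensional array `x` of 16 kHz samples yields one row of cepstral coefficients per frame. Because each band gets its own 30 filters, the 0–500 Hz region is sampled far more densely than in a single Mel filterbank spanning 0–8000 Hz, which is the structural difference from conventional MFCC that the abstract emphasizes.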