Phonetic information in the vowel spectrum: the meaning of mel-Frequency Cepstral Coefficients

Impact factor 1.9 | Zone 1 (Literature) | LANGUAGE & LINGUISTICS
Khalil Iskarous, Alessandro Vietti
Journal of Phonetics, vol. 112, Article 101434. Published 2025-07-17. DOI: 10.1016/j.wocn.2025.101434
https://www.sciencedirect.com/science/article/pii/S0095447025000452
Citations: 0

Abstract

There is still disagreement in the acoustic phonetics literature about how phonetic information is encoded in the vowel spectrum. The “formant hypothesis” holds that formant frequency locations are the primary encoding of phonetic information. However, perceptual experiments have shown that listeners can identify vowels, to a certain extent, even when formant peaks are suppressed. This finding gave rise to the “whole-spectrum” hypothesis, which represents each vowel segment by a high-dimensional description of its entire spectrum. While the “whole-spectrum” hypothesis better predicts the perception of suppressed-formant vowels, one advantage of the “formant hypothesis” is that it parameterizes a language’s vowel inventory in terms of featural classes indexed by a few formant-frequency values. These frequency scales serve to describe a language’s phonological organization and sound change. In this paper, we show that mel-frequency cepstral coefficients (MFCCs), whole-spectrum parameterizations used in speech technology from the 1970s to the present, also have a phonetic interpretation that yields the same featural classes as traditional descriptions. This holds even though MFCCs have for decades been considered uninterpretable. Our arguments are based on information-theoretic, effect-size, and Fourier analyses of all vowel data in the TIMIT database, which contains substantial speaker, contextual, prosodic, and dialectal variability. Our goal is to show that MFCCs can be useful for further developments in acoustic phonetics: while they extract phonetically distinctive information from the entire spectrum, they can also deepen our understanding of the linguistic structure of vowel spaces.
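To make the "whole-spectrum parameterization" concrete, here is a minimal sketch of the conventional MFCC pipeline applied to a single frame's power spectrum: triangular filters spaced evenly on the mel scale, the log of each band's energy, then a DCT-II across bands. The filter count (26), number of coefficients (13), the 2595·log10(1 + f/700) mel formula, the unnormalized DCT, and the 16 kHz sampling rate (TIMIT's rate) are assumptions chosen for illustration. This is not the authors' analysis code, and the abstract does not state which MFCC settings they used.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mels (assumed formula: 2595 * log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters spaced evenly on the mel scale over 0..Nyquist.

    Returns one list of n_bins weights per filter; bin k sits at
    k * Nyquist / (n_bins - 1) Hz.
    """
    nyquist = sample_rate / 2.0
    top_mel = hz_to_mel(nyquist)
    # n_filters + 2 equally spaced mel points give every filter a left edge,
    # a peak, and a right edge.
    edges_hz = [mel_to_hz(i * top_mel / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(n_filters):
        left, peak, right = edges_hz[j], edges_hz[j + 1], edges_hz[j + 2]
        weights = []
        for f in bin_hz:
            if left <= f <= peak:
                weights.append((f - left) / (peak - left))
            elif peak < f <= right:
                weights.append((right - f) / (right - peak))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mfcc_from_power_spectrum(power, sample_rate=16000, n_filters=26, n_coeffs=13):
    """MFCCs of one frame's power spectrum: mel filterbank -> log -> DCT-II."""
    bank = mel_filterbank(n_filters, len(power), sample_rate)
    # Log energy per mel band; the floor avoids log(0) for empty bands.
    log_e = [math.log(max(sum(w * p for w, p in zip(filt, power)), 1e-12))
             for filt in bank]
    # Unnormalized DCT-II across bands:
    # c_n = sum_j log_e[j] * cos(pi * n * (j + 0.5) / n_filters)
    return [sum(e * math.cos(math.pi * n * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for n in range(n_coeffs)]


if __name__ == "__main__":
    n_bins = 257  # e.g. a 512-point FFT at 16 kHz
    falling = [math.exp(-k / 20.0) for k in range(n_bins)]              # energy drops with frequency
    rising = [math.exp((k - n_bins + 1) / 20.0) for k in range(n_bins)]  # energy grows with frequency
    print("c1, falling spectrum:", mfcc_from_power_spectrum(falling)[1])
    print("c1, rising spectrum: ", mfcc_from_power_spectrum(rising)[1])
```

Because each coefficient c_n is a cosine-weighted sum over the mel-scaled log spectrum, c_1 contrasts energy in the lower half of the mel axis with the upper half: the falling spectrum gives c_1 > 0 and the rising one gives c_1 < 0, while a uniform gain change shifts only c_0. This is one way a Fourier reading can attach spectral-shape meaning to individual coefficients; how the paper maps specific coefficients onto featural classes is not stated in the abstract.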
Source journal: Journal of Phonetics
CiteScore: 3.50
Self-citation rate: 26.30%
Articles published: 49
Journal description: The Journal of Phonetics publishes papers of an experimental or theoretical nature that deal with phonetic aspects of language and linguistic communication processes. Papers dealing with technological and/or pathological topics, or papers of an interdisciplinary nature, are also suitable, provided that linguistic-phonetic principles underlie the work reported. Regular articles, review articles, and letters to the editor are published. Themed issues devoted entirely to a specific subject of interest within the field of phonetics are also published.