Acoustic estimation of voice roughness.

Attention, Perception, & Psychophysics, pp. 1771-1787
Published: 2025-07-01 (Epub: 2025-04-28) · DOI: 10.3758/s13414-025-03060-3
Impact factor: 1.7 · JCR Q3 (Psychology) · CAS Zone 4 (Psychology) · Citations: 0
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204943/pdf/

Andrey Anikin

Abstract

Roughness is a perceptual characteristic of sound that was first applied to musical consonance and dissonance, but it is increasingly recognized as a central aspect of voice quality in human and animal communication. It may be particularly important for asserting social dominance or attracting attention in urgent signals such as screams. To ensure that the results of roughness research are valid and consistent across studies, we need a standard methodology for measuring it. I review the literature on roughness estimation, from classic psychoacoustics to more recent approaches, and present two collections of 602 human vocal samples whose roughness was rated by 162 listeners in perceptual experiments. Two algorithms for estimating roughness acoustically from modulation spectra are then presented and optimized to match the human ratings. One uses a bank of gammatone or Butterworth filters to obtain an auditory spectrogram, and a faster algorithm begins with a conventional spectrogram obtained with the short-time Fourier transform; both explain ~50% of the variance in average human ratings per stimulus. The range of modulation frequencies most relevant to roughness perception is [50, 200] Hz; this range can be selected with simple cutoff points or with a lognormal weighting function. Modulation and roughness spectrograms are proposed as visual aids for studying the dynamics of roughness in longer recordings. The described algorithms are implemented in the function modulationSpectrum() from the open-source R library soundgen. The audio recordings and their ratings are freely available from https://osf.io/gvcpx/ and can be used for benchmarking other algorithms.
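
As a quick illustration of how the published tooling can be used, the sketch below synthesizes a strongly amplitude-modulated (and therefore presumably rough-sounding) vocalization with soundgen and passes it to modulationSpectrum(). This is not code from the paper: the soundgen() synthesis settings are arbitrary, and the argument names and returned components (in particular roughRange) should be checked against ?modulationSpectrum for the installed version of the package.

# A minimal sketch, assuming soundgen is installed (install.packages("soundgen"));
# settings are illustrative, not those used in the paper.
library(soundgen)

# Synthesize a harsh, scream-like sound: fast, deep amplitude modulation
# (amFreq within the 50-200 Hz band) is what should drive perceived roughness.
s <- soundgen(sylLen = 800,
              pitch = c(900, 1200),     # high, rising f0
              amFreq = 80, amDep = 70,  # 80-Hz amplitude modulation, 70% depth
              samplingRate = 16000, plot = FALSE)

# Modulation spectrum of the sound; roughRange (assumed here to be the argument
# restricting the roughness band) is set to the [50, 200] Hz range that the
# paper identifies as most relevant to roughness perception.
ms <- modulationSpectrum(s, samplingRate = 16000,
                         roughRange = c(50, 200), plot = TRUE)

# Inspect the returned components; a per-sound roughness summary is among them.
str(ms, max.level = 1)

Estimates produced this way can be benchmarked against the rated recordings in the OSF corpus linked above (https://osf.io/gvcpx/).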

Journal
CiteScore: 3.60 · Self-citation rate: 17.60% · Articles published: 197 · Review time: 4-8 weeks
About the journal: The journal Attention, Perception, & Psychophysics is an official journal of the Psychonomic Society. It spans all areas of research in sensory processes, perception, attention, and psychophysics. Most articles published are reports of experimental work; the journal also presents theoretical, integrative, and evaluative reviews. Commentary on issues of importance to researchers appears in a special section of the journal. Founded in 1966 as Perception & Psychophysics, the journal assumed its present name in 2009.