{"title":"Two-stage spectral space and the perceptual properties of sound textures.","authors":"Hironori Maruyama, Isamu Motoyoshi","doi":"10.1121/10.0036219","DOIUrl":"https://doi.org/10.1121/10.0036219","url":null,"abstract":"<p><p>Textural sounds can be perceived in the natural environment such as wind, waterflows, and footsteps. Recent studies have shown that the perception of auditory textures can be described and synthesized by the multiple classes of time-averaged statistics or the linear spectra and energy spectra of input sounds. The findings lead to a possibility that the explicit perceptual property of a textural sound, such as heaviness and complexity, could be predictable from the two-stage spectra. In the present study, numerous rating data were collected for 17 different perceptual properties with 325 real-world sounds, and the relationship between the rating and the two-stage spectral characteristics was investigated. The analysis showed that the ratings for each property were strongly and systematically correlated with specific frequency bands in the two-stage spectral space. The subsequent experiment demonstrated further that manipulation of power at critical frequency bands significantly alters the perceived property of natural sounds in the predicted direction. 
The results suggest that the perceptual impression of sound texture is strongly dependent on the power distribution of first- and second-order acoustic filters in the early auditory system.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"157 3","pages":"2067-2076"},"PeriodicalIF":2.1,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143700727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
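The two-stage analysis summarized in the abstract above can be sketched in a few lines: a first-order (linear) spectrum of the waveform, and a second-order (modulation) spectrum taken from the envelope of a band-passed version of the signal. This is a generic illustration of the idea, not the authors' pipeline; the band edges, modulation rate, and test signal are made-up values.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
# Amplitude-modulated noise: a crude stand-in for a textural sound,
# with an imposed 4 Hz modulation.
x = (1.0 + 0.8 * np.sin(2 * np.pi * 4.0 * t)) * rng.standard_normal(fs)

# First-order (linear) spectrum: magnitude spectrum of the waveform itself.
linear_spectrum = np.abs(np.fft.rfft(x))

# Second-order (energy/modulation) spectrum: band-pass, take the Hilbert
# envelope, then the spectrum of that envelope.
sos = butter(4, [500.0, 4000.0], btype="bandpass", fs=fs, output="sos")
envelope = np.abs(hilbert(sosfiltfilt(sos, x)))
mod_spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))

# The modulation spectrum should peak near the imposed 4 Hz rate.
mod_freqs = np.fft.rfftfreq(fs, 1.0 / fs)
peak_hz = float(mod_freqs[np.argmax(mod_spectrum)])
```

With a 1 s signal the modulation spectrum has 1 Hz resolution, so the peak lands at or very near the 4 Hz modulation imposed on the carrier.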
{"title":"Classification of Bryde's whale individuals using high-resolution time-frequency transforms and support vector machines.","authors":"Jean Baptiste Tary, Christine Peirce, Richard W Hobbs","doi":"10.1121/10.0036223","DOIUrl":"https://doi.org/10.1121/10.0036223","url":null,"abstract":"<p><p>Whales generate vocalizations which may, deliberately or not, encode caller identity cues. In this study, we analyze calls produced by Bryde's whales and recorded by ocean-bottom arrays of hydrophones deployed close to the Costa Rica Rift in the Panama Basin. These repetitive calls, consisting of two main frequency components at ∼20 and ∼36 Hz, have been shown to follow five coherent spatiotemporal tracks. Here, we use a high-resolution time-frequency transform, the fourth-order Fourier synchrosqueezing transform, to extract time-frequency characteristics (ridges) from each call to appraise their suitability for identifying individuals from each other. Focusing on high-quality calls recorded less than 5 km from their source, we then cluster these ridges using a support vector machine model resulting in an average cross-validation error of ∼11% and balanced accuracy of ∼86 ± 5%. Comparing these results with those obtained using the standard short-time Fourier transform, k-means clustering, and lower-quality signals, the Fourier synchrosqueezing transform approach, coupled with support vector machines, substantially improves classification. 
Consequently, the Bryde's whale calls potentially contain individual-specific information, suggesting that individuals can be studied using ocean-bottom data.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"157 3","pages":"2091-2101"},"PeriodicalIF":2.1,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143700739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
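The classification step described above (per-call feature vectors, a support vector machine, cross-validated balanced accuracy) can be illustrated with synthetic data. The "ridge" features below are random placeholders rather than synchrosqueezing-transform outputs, and the class counts and noise levels are assumptions for the sketch.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_whales, n_calls, n_feat = 5, 40, 16        # five tracked individuals, as above
X_parts, y = [], []
for whale in range(n_whales):
    centre = rng.normal(0.0, 1.0, n_feat)    # whale-specific "ridge" signature
    X_parts.append(centre + rng.normal(0.0, 0.5, (n_calls, n_feat)))
    y += [whale] * n_calls
X, y = np.vstack(X_parts), np.array(y)

# Standardize features, then fit an RBF-kernel SVM; score with the same
# balanced-accuracy metric reported in the abstract.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
mean_bal_acc = float(scores.mean())
```

Because the synthetic classes are well separated, the cross-validated balanced accuracy comes out high; with real call ridges the separation, and hence the score, depends on how individual-specific the vocalizations actually are.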
{"title":"A distributed adaptive wave field synthesis system.","authors":"Tianyou Li, Sipei Zhao, Yang Huang, Jing Lu, Ian S Burnett","doi":"10.1121/10.0036240","DOIUrl":"https://doi.org/10.1121/10.0036240","url":null,"abstract":"<p><p>The conventional wave field synthesis (WFS) theory is based on the free field assumption and the performance of systems based on it deteriorates significantly in reverberant environments. By introducing an error microphone array to monitor reproduction errors, the adaptive WFS (AWFS) system adjusts the loudspeaker signals to correct the sound field in reverberant environments. The AWFS system utilizes a centralized control strategy with a single processor, which imposes a high computational burden on the processor due to global error estimation, limiting the application scale. To address this issue, this paper proposes a distributed AWFS (DAWFS) system for an acoustic sensor and actuator network using a distributed signal processing strategy. Simulation results in a rectangular room demonstrate that the proposed DAWFS system can achieve comparable sound reproduction performance to the conventional AWFS system, both at the near-field error microphone array and in the target listening area. A global computational complexity analysis shows that the proposed DAWFS system exhibits significantly lower computational complexity than existing AWFS systems in various application scenarios, especially for massive channel systems. 
The results further demonstrate the potential applicability of the proposed DAWFS system in realistic reverberant environments.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"157 3","pages":"2221-2235"},"PeriodicalIF":2.1,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143719922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
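The core distributed-processing idea above, local adaptation at each node plus averaging with neighbours instead of one central processor seeing every error signal, can be sketched with generic diffusion LMS. This is not the paper's DAWFS algorithm; the ring topology, step size, and signals are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_nodes, n_taps, n_steps = 4, 8, 4000
w_true = rng.standard_normal(n_taps)         # common unknown response to identify

# Ring topology: each node averages its weights with its two neighbours.
A = np.zeros((n_nodes, n_nodes))
for k in range(n_nodes):
    A[k, [k, (k - 1) % n_nodes, (k + 1) % n_nodes]] = 1.0 / 3.0

w = np.zeros((n_nodes, n_taps))
mu = 0.01                                    # LMS step size, assumed
for _ in range(n_steps):
    psi = np.empty_like(w)
    for k in range(n_nodes):
        x = rng.standard_normal(n_taps)      # node-local input regressor
        d = w_true @ x + 0.01 * rng.standard_normal()  # local "error mic" signal
        e = d - w[k] @ x
        psi[k] = w[k] + mu * e * x           # adapt: local LMS update
    w = A @ psi                              # combine: neighbour averaging

err = float(np.linalg.norm(w - w_true, axis=1).max())
```

Each node only ever touches its own data and its neighbours' weight estimates, which is what keeps the per-processor computational load flat as the channel count grows.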
{"title":"Context effects on lexical tone categorization in quiet and noisy conditions by young, middle-aged, and older individuals.","authors":"Fei Chen, Chen Kuang, Liping Wang, Xiaoxiang Chen","doi":"10.1121/10.0036146","DOIUrl":"10.1121/10.0036146","url":null,"abstract":"<p><p>Previous studies focused on how contexts affect the recognition of lexical tones, primarily among healthy young adults in a quiet environment. However, little is known about how senescence and cognitive decline influence lexical tone normalization in adverse listening conditions. This study aims to explore how F0 shifts of the preceding context affect lexical tone identification across different age groups in quiet and noisy conditions. Twenty-two Mandarin-speaking young adults, 22 middle-aged adults, and 21 older adults with mild cognitive impairment (MCI) participated in tone identification tasks with and without speech contexts. The identification tasks with contexts were conducted in quiet and babble noise with signal-to-noise ratios (SNRs) set at 5 and 0 dB. Results showed that contextual F0 cues exerted an equal impact on lexical tone normalization across all three age groups in the quiet environment. Nevertheless, under SNRs of 5 and 0 dB, noise nullified such an effect. Moreover, working memory was negatively correlated with the size of lexical tone normalization in the older group. 
These findings suggest that context effects on Mandarin tone normalization tend to be resistant to senescence and MCI but susceptible to babble noise, offering further insights into the cognitive processing mechanisms underlying speech normalization.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"157 3","pages":"1795-1806"},"PeriodicalIF":2.1,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143625149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acoustic radiation torque of a viscoelastic sphere in a zero-order Mathieu beam.","authors":"Junxin Li, Xiaofeng Zhang, Guangbin Zhang","doi":"10.1121/10.0036125","DOIUrl":"https://doi.org/10.1121/10.0036125","url":null,"abstract":"<p><p>The exact expressions of the three-dimension acoustic radiation torque (ART) of a viscoelastic sphere arbitrarily positioned in a zero-order Mathieu beam (zMB) are derived in this paper. The effects of the ellipticity parameters, half-cone angles, dimensionless frequency, and particle position on the acoustic radiation torques of the spherical particle are studied. Simulation results show the axial ART is zero for an arbitrarily positioned viscoelastic PE sphere in a zMB, while for the x or y axis ART, it varies significantly with the particle position and beam parameters. For certain combinations of beam offset and parameters, axial and transverse torques alternate between positive and negative values as the half-cone angle varies. When ka is away from the resonance frequency, the value of the torque is approximately 0.001, which means the torque is small and the particle can be rotated in a uniform angular acceleration. Moreover, ART shows symmetrical about beam center when the offset is less than one wavelength. A finite element model was established to verify the theory and the comparative results agreed with each other except for the values of ART at the first resonant frequency, which is related to the absorption of the particles. 
The study helps to better understand the potential mechanism of the particle rotation manipulation in a zMB.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"157 3","pages":"1703-1713"},"PeriodicalIF":2.1,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143615785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maximum a posteriori underwater acoustic source localization based on time differences of arrival accounting for refraction.","authors":"Wuyi Yang, Tao Zhang","doi":"10.1121/10.0036138","DOIUrl":"https://doi.org/10.1121/10.0036138","url":null,"abstract":"<p><p>Traditional underwater acoustic source localization methods based on time differences of arrival (TDOA) in the presence of refraction first estimate the source depth and range to each hydrophone and then estimate the horizontal location of the source. The accuracy of these methods is compromised by errors in range estimation. To address this, we propose a three-dimensional source localization method that utilizes TDOA measurements between direct and surface-reflected arrivals at N(N ≥3) hydrophones, taking into account refraction effects. By utilizing multipath signals reflected off the sea surface, the method considers hydrophone position errors, TDOA measurement inaccuracies, and sound-speed variations to perform a Bayesian maximum a posteriori estimation of source localization. Compared with the traditional two-step source localization methods, the proposed method directly estimates the source depth and horizontal location jointly, eliminating the need to estimate ranges between the source and hydrophones. Simulation studies analyzing and comparing the localization performance of the proposed method with that of a two-step source localization method demonstrate the effectiveness of the proposed method. 
This could lead to more reliable localization of underwater sources, crucial for various applications, such as marine research, underwater navigation, and environmental monitoring.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"157 3","pages":"1784-1794"},"PeriodicalIF":2.1,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143625160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
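The MAP estimation step described above can be illustrated with a brute-force grid search that weighs a Gaussian likelihood over TDOA measurements against a Gaussian prior on the source position. Constant sound speed (straight rays) and a 2-D geometry are assumed in this sketch; the paper's method additionally models refraction and the surface-reflected paths, and all numbers below are invented.

```python
import numpy as np

c = 1500.0                                   # sound speed (m/s), assumed constant
hydros = np.array([[0.0, 0.0], [800.0, 0.0], [400.0, 600.0], [0.0, 600.0]])
src_true = np.array([300.0, 250.0])

def tdoas(pts):
    """TDOAs of each candidate point relative to hydrophone 0."""
    r = np.linalg.norm(pts[:, None, :] - hydros[None, :, :], axis=2) / c
    return r[:, 1:] - r[:, :1]

rng = np.random.default_rng(3)
sigma_t = 1e-4                               # TDOA noise std (s), assumed
meas = tdoas(src_true[None, :])[0] + rng.normal(0.0, sigma_t, 3)

prior_mean, prior_std = np.array([350.0, 300.0]), 500.0   # loose position prior

# Evaluate the negative log-posterior (up to constants) on a 2 m grid.
xs, zs = np.arange(0.0, 800.0, 2.0), np.arange(0.0, 600.0, 2.0)
X, Z = np.meshgrid(xs, zs, indexing="ij")
pts = np.stack([X.ravel(), Z.ravel()], axis=1)

resid = tdoas(pts) - meas
cost = (resid ** 2).sum(axis=1) / sigma_t ** 2 \
     + ((pts - prior_mean) ** 2).sum(axis=1) / prior_std ** 2
map_est = pts[np.argmin(cost)]
err_m = float(np.linalg.norm(map_est - src_true))
```

Because the position is estimated in one shot from all TDOAs jointly, no intermediate source-to-hydrophone ranges are ever computed, which is the point of the direct formulation over the two-step methods.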
{"title":"Extended high-frequency hearing and suprathreshold neural synchrony in the auditory brainstem.","authors":"Jithin Raj Balan, Srikanta K Mishra, Hansapani Rodrigo","doi":"10.1121/10.0036054","DOIUrl":"https://doi.org/10.1121/10.0036054","url":null,"abstract":"<p><p>Elevated hearing thresholds in the extended high frequencies (EHFs) (>8 kHz) are often associated with poorer speech-in-noise recognition despite a clinically normal audiogram. However, whether EHF hearing loss is associated with disruptions in neural processing within the auditory brainstem remains uncertain. The objective of the present study was to investigate whether elevated EHF thresholds influence neural processing at lower frequencies in individuals with normal audiograms. Auditory brainstem responses (ABRs) were recorded at a suprathreshold level (80 dB normal hearing level) from 45 participants with clinically normal hearing. The recording protocol was optimized to obtain robust wave I of the ABR. Results revealed no significant relationship between the pure tone average for EHFs and any ABR metrics at either rate, while adjusting for the effects of age, sex, and hearing thresholds at standard frequencies (0.25-8 kHz). Rate-dependent significant sex effects for wave I and V amplitude, I-V amplitude ratio, and III and V latency were observed. 
These results suggest that elevated EHF hearing thresholds do not significantly affect brainstem processing at lower frequencies (<8 kHz).</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"157 3","pages":"1577-1586"},"PeriodicalIF":2.1,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143542426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A measuring instrument for the perceptual dimensions of road traffic noisea).","authors":"Astrid Oehme, Paul Schweidler, Moritz Schuck, André Fiebig, Steffen Lepa, Stefan Weinzierl","doi":"10.1121/10.0035940","DOIUrl":"https://doi.org/10.1121/10.0035940","url":null,"abstract":"<p><p>Road traffic, especially in urban environments, is one of the major sources of our everyday soundscape and its impact on human well-being and health is well documented. While most studies have used perceived annoyance as an indicator of perceptual impact, little is known about the dimensions that define the perceptual space evoked by the different characteristics of road traffic noise and the acoustic correlates of these dimensions. Therefore, the present study developed a psychological instrument to measure the qualities of road traffic noise. Based on a sample of contrast pairs created from third-order Ambisonics recordings of various traffic scenes in and around Berlin and reproduced in the laboratory, attributes were elicited using a standardized but open-ended procedure. Subsequently, 45 of the recorded traffic scenes were rated by 115 participants using a redundancy-adjusted set of attributes. Exploratory and confirmatory factor analyses of the questionnaire data yielded two optimal solutions with five and seven factors spanning the perceptual dimensions. 
The resulting psychometrically validated instruments, in the form of 15-item and 21-item questionnaires, can be used for the perceptual assessment and for the further development of technical parameters to predict not only annoyance but also other salient qualities of traffic-related soundscapes.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"157 3","pages":"1587-1597"},"PeriodicalIF":2.1,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143542422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unimodal speech perception predicts stable individual differences in audiovisual benefit for phonemes, words and sentencesa).","authors":"Jacqueline von Seth, Máté Aller, Matthew H Davis","doi":"10.1121/10.0034846","DOIUrl":"https://doi.org/10.1121/10.0034846","url":null,"abstract":"<p><p>There are substantial individual differences in the benefit that can be obtained from visual cues during speech perception. Here, 113 normally hearing participants between the ages of 18 and 60 years old completed a three-part experiment investigating the reliability and predictors of individual audiovisual benefit for acoustically degraded speech. Audiovisual benefit was calculated as the relative intelligibility (at the individual-level) of approximately matched (at the group-level) auditory-only and audiovisual speech for materials at three levels of linguistic structure: meaningful sentences, monosyllabic words, and consonants in minimal syllables. This measure of audiovisual benefit was stable across sessions and materials, suggesting that a shared mechanism of audiovisual integration operates across levels of linguistic structure. Information transmission analyses suggested that this may be related to simple phonetic cue extraction: sentence-level audiovisual benefit was reliably predicted by the relative ability to discriminate place of articulation at the consonant-level. Finally, whereas unimodal speech perception was related to cognitive measures (matrix reasoning and vocabulary) and demographics (age and gender), audiovisual benefit was predicted only by unimodal speech perceptual abilities: Better lipreading ability and subclinically poorer hearing (speech reception thresholds) independently predicted enhanced audiovisual benefit. 
This work has implications for practices in quantifying audiovisual benefit and research identifying strategies to enhance multimodal communication in hearing loss.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"157 3","pages":"1554-1576"},"PeriodicalIF":2.1,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143542430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The machine learning-based prediction of the sound pressure level from pathological and healthy speech signals.","authors":"Manila Kodali, Sudarsana Reddy Kadiri, Shrikanth Narayanan, Paavo Alku","doi":"10.1121/10.0036123","DOIUrl":"https://doi.org/10.1121/10.0036123","url":null,"abstract":"<p><p>Vocal intensity is quantified by sound pressure level (SPL). The SPL can be measured by either using a sound level meter or by comparing the energy of the recorded speech signal with the energy of the recorded calibration tone of a known SPL. Neither of these approaches can be used if speech is recorded in real-life conditions using a device that is not calibrated for SPL measurements. To measure the SPL from non-calibrated recordings, where speech is presented on a normalized amplitude scale, this study investigates the use of the machine learning (ML)-based estimation of the SPL. Several ML-based systems consisting of a feature extraction stage and a regression stage were built. For the former, four conventional acoustic features, two state-of-the-art pre-trained features, and their combined feature set were compared. For the latter, three regression models were compared. The systems were trained using the healthy speech of an open repository. The systems were evaluated using both pathological speech produced by patients suffering from heart failure and using speech produced by healthy controls. 
The results showed that the best combination of the feature and regression model provided a mean absolute error of about 2 dB in the SPL estimation task.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":"157 3","pages":"1726-1741"},"PeriodicalIF":2.1,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143623486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
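For contrast with the ML approach, the calibration-tone method mentioned in the abstract reduces to one line of algebra: SPL_speech = SPL_cal + 10 log10(E_speech / E_cal). The signals and level below are invented for illustration; any unknown recording gain cancels in the energy ratio.

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs
cal_spl = 80.0                               # known SPL of the calibration tone (dB)
cal = 0.1 * np.sin(2 * np.pi * 1000 * t)     # recorded calibration tone (normalized units)
speech = 0.05 * np.sin(2 * np.pi * 200 * t)  # stand-in for the recorded speech

def mean_energy(x):
    return float(np.mean(x ** 2))

# The energy ratio on the normalized amplitude scale maps directly onto a
# level difference in dB relative to the known calibration SPL.
speech_spl = cal_spl + 10.0 * np.log10(mean_energy(speech) / mean_energy(cal))
```

Halving the amplitude quarters the energy, so the stand-in speech comes out about 6 dB below the 80 dB calibration tone; the ML systems in the paper aim for roughly 2 dB mean absolute error without any calibration tone at all.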