{"title":"A unified beamforming and source separation model for static and dynamic human-robot interaction.","authors":"Jorge Wuth, Rodrigo Mahu, Israel Cohen, Richard M Stern, Néstor Becerra Yoma","doi":"10.1121/10.0025238","DOIUrl":"https://doi.org/10.1121/10.0025238","url":null,"abstract":"This paper presents a unified model for combining beamforming and blind source separation (BSS). The validity of the model's assumptions is confirmed by recovering target speech information in noise accurately using Oracle information. Using real static human-robot interaction (HRI) data, the proposed combination of BSS with the minimum-variance distortionless response beamformer provides a greater signal-to-noise ratio (SNR) than previous parallel and cascade systems that combine BSS and beamforming. In the difficult-to-model HRI dynamic environment, the system provides a SNR gain that was 2.8 dB greater than the results obtained with the cascade combination, where the parallel combination is infeasible.","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140029710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling and simulation of underwater acoustic propagation through a random distribution of ice blocks.","authors":"Nicholas P Chotiros, Sverre Holm","doi":"10.1121/10.0025395","DOIUrl":"https://doi.org/10.1121/10.0025395","url":null,"abstract":"Acoustic propagation through a random distribution of 1 m ice cubes, from 100 to 1000 Hz, was simulated in a 3D finite element model. The effective sound speed and attenuation as functions of frequency were calculated from the simulated signals. Attempts were made to fit a number of models to the wave speed and attenuation, including single scattering, lossy water, and Biot approximations. An extended Biot model, developed for acoustic propagation in granular seabed sediments, was able to fit the simulation up to 300 Hz. Beyond this frequency, the simulation shows that multiple scattering dominates.","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140186476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Directional sound radiation from a rectangular panel and the high frequency limit.","authors":"Lu Han, Ming Wu, Xiangning Liao, Xiaoyi Gao, Jun Yang, Xiaochun Yin","doi":"10.1121/10.0024757","DOIUrl":"https://doi.org/10.1121/10.0024757","url":null,"abstract":"Directional sound radiation focuses sound in a specific direction and reduces sound radiation in other directions. This study uses a flat panel driven by an actuator array to realize two-dimensional directional sound radiation by the acoustic contrast control algorithm. The aliasing effect at higher frequencies is analyzed based on the modal vibration of the panel, and a method for estimating the high frequency limit is proposed. Actuator arrays with different parameters are simulated to verify the efficacy of the proposed method and compare the acoustic contrast response with the conventional loudspeaker arrays.","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139682069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tonguedness in speech: Lateral bias in lingual bracing.","authors":"Yadong Liu, Jahurul Islam, Kate Radford, Oksana Tkachman, Bryan Gick","doi":"10.1121/10.0024756","DOIUrl":"https://doi.org/10.1121/10.0024756","url":null,"abstract":"This study examines the lateral biases in tongue movements during speech production. It builds on previous research on asymmetry in various aspects of human biology and behavior, focusing on the tongue's asymmetric behavior during speech. The findings reveal that speakers have a pronounced preference toward one side of the tongue during lateral releases with a majority displaying the left-side bias. This lateral bias in tongue speech movements is referred to as tonguedness. This research contributes to our understanding of the articulatory mechanisms involved in tongue movements and underscores the importance of considering lateral biases in speech production research.","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10848656/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139718105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ecological sound loudness in environmental sound representations.","authors":"Urszula Oszczapinska, Laurie M Heller, Seojun Jang, Bridget Nance","doi":"10.1121/10.0024995","DOIUrl":"https://doi.org/10.1121/10.0024995","url":null,"abstract":"Listeners recognizing environmental sounds must contend with variations in level due to the source level and the environment. Nonetheless, variations in level disrupt short-term sound recognition [Susini, Houix, Seropian, and Lemaitre (2019). J. Acoust. Soc. Am. 146(2), EL172-EL176] suggesting that loudness is encoded. We asked whether the experimental custom of setting sounds to equal levels disrupts long-term recognition, especially if it creates a mismatch with ecological loudness. Environmental sounds were played at equalized or ecological levels. Although recognition improved with increased loudness and familiarity, this relationship was unaffected by equalization or real-life experience with the source. However, sound pleasantness was altered by deviations from the ecological level.","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139907086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scattering measurements of rocky seafloors using a split-beam echosounder.","authors":"Jen A Gruber, Derek R Olson","doi":"10.1121/10.0024755","DOIUrl":"https://doi.org/10.1121/10.0024755","url":null,"abstract":"Scattering measurements were made off the coast of Pacific Grove, CA at 200 kHz, in an exposed fractured granite seafloor. Using inertial sensors and a split-beam transducer, data were processed to obtain a range of grazing angles corresponding to scattering strength, and signal processing techniques were used to extract the relevant portion of each ping. The ensonified angular width from a circular aperture is presented. Scattering strength measurements using different assumptions regarding the grazing angle were compared. The empirical Lommel-Seeliger model provided a good fit to measured data with a parameter of -18.4 dB.","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139718104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Platform motion estimation in multi-band synthetic aperture sonar with coupled variational autoencoders.","authors":"Angeliki Xenaki, Yan Pailhas, Alessandro Monti","doi":"10.1121/10.0024998","DOIUrl":"https://doi.org/10.1121/10.0024998","url":null,"abstract":"Coherent processing in synthetic aperture sonar (SAS) requires platform motion estimation and compensation with sub-wavelength accuracy for high-resolution imaging. Micronavigation, i.e., through-the-sensor platform motion estimation, is essential when positioning information from navigational instruments is absent or inadequately accurate. A machine learning method based on variational Bayesian inference has been proposed for unsupervised data-driven micronavigation. Herein, the multiple-input multiple-output arrangement of a multi-band SAS system is exploited and combined with a hierarchical variational inference scheme, which self-supervises the learning of platform motion and results in improved micronavigation accuracy.","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139907087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating OpenAI's Whisper ASR: Performance analysis across diverse accents and speaker traits.","authors":"Calbert Graham, Nathan Roll","doi":"10.1121/10.0024876","DOIUrl":"https://doi.org/10.1121/10.0024876","url":null,"abstract":"This study investigates Whisper's automatic speech recognition (ASR) system performance across diverse native and non-native English accents. Results reveal superior recognition in American compared to British and Australian English accents with similar performance in Canadian English. Overall, native English accents demonstrate higher accuracy than non-native accents. Exploring connections between speaker traits [sex, native language (L1) typology, and second language (L2) proficiency] and word error rate uncovers notable associations. Furthermore, Whisper exhibits enhanced performance in read speech over conversational speech with modifications based on speaker gender. The implications of these findings are discussed.","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139934516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extension of Doak's momentum potential theory for multi-species and reacting flows.","authors":"Raffaele D'Aniello, Mario Casel, Karsten Knobloch","doi":"10.1121/10.0024994","DOIUrl":"https://doi.org/10.1121/10.0024994","url":null,"abstract":"This work extends Doak's momentum potential theory to multi-chemical-component and reactive, time-stationary fluctuating flows. Additional mixture-related components are found to be superimposed on the canonical vortical, acoustic, and thermal parts of momentum fluctuations and total fluctuating enthalpy. These extended relations are used to develop a time-averaged model that relates the acoustic power radiated to the far-field with clearly defined vortical, acoustic, thermal, and compositional near-field sources. The resulting model is designed to offer a more general and comprehensive way to describe the noise generated within combustion chambers.","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139974848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating acoustic representations and normalization for rhoticity classification in children with speech sound disorders.","authors":"Nina R Benway, Jonathan L Preston, Asif Salekin, Elaine Hitchcock, Tara McAllister","doi":"10.1121/10.0024632","DOIUrl":"https://doi.org/10.1121/10.0024632","url":null,"abstract":"The effects of different acoustic representations and normalizations were compared for classifiers predicting perception of children's rhotic versus derhotic /ɹ/. Formant and Mel frequency cepstral coefficient (MFCC) representations for 350 speakers were z-standardized, either relative to values in the same utterance or age-and-sex data for typical /ɹ/. Statistical modeling indicated age-and-sex normalization significantly increased classifier performances. Clinically interpretable formants performed similarly to MFCCs and were endorsed for deep neural network engineering, achieving mean test-participant-specific F1-score = 0.81 after personalization and replication (σx = 0.10, med = 0.83, n = 48). Shapley additive explanations analysis indicated the third formant most influenced fully rhotic predictions.","PeriodicalId":73538,"journal":{"name":"JASA express letters","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11522988/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139652385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}