{"title":"Decoding the dancing of the tongue: A model-based learning approach to phonetic targets in coarticulation.","authors":"Jianguo Wei, Guochen Bai, Wenhuan Lu, Jianwu Dang","doi":"10.1121/10.0032362","DOIUrl":"https://doi.org/10.1121/10.0032362","url":null,"abstract":"<p><p>A model synthesizing average frequency components from select sentences in an electromagnetic articulography database has been crafted. This revealed the dual roles of the tongue: its dorsum acts like a carrier wave, and the tip acts as a modulation signal within the articulatory realm. This model illuminates anticipatory coarticulation's subtleties during speech planning. It undergoes rigorous, two-stage optimization: statistical estimation and refinement to depict carryover and anticipation. The model's base, rooted in physiological insights, deciphers carryover targets while its upper layer captures anticipation. Optimization has pinpointed unique phonetic targets for each phoneme, providing deep insights into virtual target formation during speech planning. These simulations, aligning closely with empirical data and marked by a mere 0.18 cm average error, along with extensive listening tests attest to the model's accuracy and enhanced speech synthesis quality.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142468567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Physics-informed neural networks in support of modal wavenumber estimation.","authors":"Seunghyun Yoon, Yongsung Park, Keunhwa Lee, Woojae Seong","doi":"10.1121/10.0030461","DOIUrl":"https://doi.org/10.1121/10.0030461","url":null,"abstract":"<p><p>A physics-informed neural network (PINN) enables the estimation of horizontal modal wavenumbers using ocean pressure data measured at multiple ranges. Mode representations for the ocean acoustic pressure field are derived from the Hankel transform relationship between the depth-dependent Green's function in the horizontal wavenumber domain and the field in the range domain. We obtain wavenumbers by transforming the range samples to the wavenumber domain, and maintaining range coherence of the data is crucial for accurate wavenumber estimation. In the ocean environment, the sensitivity of phase variations in range often leads to degradation in range coherence. To address this, we propose using OceanPINN [Yoon, Park, Gerstoft, and Seong, J. Acoust. Soc. Am. 155(3), 2037-2049 (2024)] to manage spatially non-coherent data. OceanPINN is trained using the magnitude of the data and predicts phase-refined data. Modal wavenumber estimation methods are then applied to this refined data, where the enhanced range coherence results in improved accuracy. Additionally, sparse Bayesian learning, with its high-resolution capability, further improves the modal wavenumber estimation. The effectiveness of the proposed approach is validated through its application to both simulated and SWellEx-96 experimental data.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142391473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reliable underwater multi-target direction of arrival estimation with optimal transport using deep models.","authors":"Zehui Yang, Weihang Nie, Lingxuan Ye, Gaofeng Cheng, Yonghong Yan","doi":"10.1121/10.0030398","DOIUrl":"10.1121/10.0030398","url":null,"abstract":"<p><p>Multi-target direction of arrival (DoA) estimation is an important and challenging task for sonar signal processing. In this study, we propose a method called learning direction of arrival with optimal transport (LOT) to accurately estimate the DoAs of multiple sources with a single deep model. We model the DoA estimation problem as a multi-label classification task and introduce an optimal transport (OT) loss based on the OT theory to capture the intrinsic continuity within the angular categories. We design a cost matrix for the OT loss in LOT approach to characterize the order and periodicity of the angular grid. The LOT approach encourages reliable predictions closer to the ground truth and suppresses spurious targets. We also propose a lightweight channel mask data augmentation module for deep models that use items related to the covariance matrix as input. The proposed methods can be seamlessly integrated with different model architectures and we indicate the portability with experiments on several typical network backbones. Experiments across various scenarios using different measurements show the effectiveness and robustness of our approaches. Results on SwellEx-96 experimental data demonstrate the practicality in real applications.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142365678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A feasibility study on active sound reduction across an acoustic plenum window by cancelling source clusters on internal periphery of the window cavity.","authors":"P Y Chan, S K Tang, Chi-Chung Cheung, K W Mui, S C Fu","doi":"10.1121/10.0030407","DOIUrl":"10.1121/10.0030407","url":null,"abstract":"<p><p>The possibility of applying active control to reduce sound transmission across a practical plenum window is examined experimentally in the present study using measured transfer functions of all related sound transmission paths. As a result of the limited space within the window, the error microphones are located at the indoor window opening while the secondary cancelling sources are mounted along the periphery of the window void. Results show that the cancelling sources near the outdoor window opening corners and within the overlapping region of the window play more useful roles in the control. Also, the highest sound reduction is around 6 dB with six error microphones positioned either at the central region or along the periphery of the indoor window opening. However, the results with the central error microphones suggest the possibility of adopting a dual control system to enhance the low frequency performance. Control systems with fewer error microphones result in lower sound reduction. Besides, it is found that four cancelling sources, located around the outdoor opening of the window, will be enough to achieve meaningful active sound transmission reduction between 100 and 1000 Hz. Involving more cancelling sources does not result in better performance despite the added complexity.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142372165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A cross-linguistic review of citation tone production studies: Methodology and recommendations.","authors":"Chenzi Xu, Cong Zhang","doi":"10.1121/10.0032356","DOIUrl":"10.1121/10.0032356","url":null,"abstract":"<p><p>The study of citation tones, lexical tones produced in isolation, is one of the first steps towards understanding speech prosody in tone languages. However, methodologies for investigating citation tones vary significantly, often leading to limited comparability of tone inventories, both within and across languages. This paper presents a systematic review of research methods and practices in 136 citation tone studies on 129 tonal language varieties in China, including 99 studies published in Chinese, which are therefore not easily available to an international scientific readership. The review provides an overview of possible analytical decisions along the research pipeline, and unveils considerable variation in data collection, analysis, and reporting conventions, particularly in how f0, the primary acoustic correlate for tone, is operationalised and reported across studies. Key methodological issues are identified, including small sample sizes and inadequate transparency in communicating methodological decisions and procedure. This paper offers a clear road map for citation tone production research and proposes a range of recommendations on speaker sampling, experimental design, acoustic processing techniques, f0 analysis, and result reporting, with the goal of facilitating future tonal research and enhancing resources for underrepresented tonal varieties.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142468558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards an optimal design of acoustic Luneburg lenses.","authors":"Andrey Ricardo da Silva, Victor Mosimann Duarte","doi":"10.1121/10.0030405","DOIUrl":"https://doi.org/10.1121/10.0030405","url":null,"abstract":"<p><p>Although the concept of acoustic Luneburg lenses was first proposed more than 50 years ago, its physical realization became feasible only in the last decade, owing to advancements in metamaterials research. Since then, numerous studies have explored the potential of these devices from the acoustic perspective. However, a comprehensive understanding of the mechanisms associated with the optimal performance of these lenses remains underexplored in the literature. This study conducts numerical investigations to identify parameters enhancing acoustic gain in Luneburg lenses. The analyses are conducted with the results obtained from a flattened Luneburg lens model based on the lattice Boltzmann method. Results, scaled with the Helmholtz number, He, indicate that the maximum acoustic gain occurs at He = 1.3, with performance sustained over a wide range of Helmholtz values. Analysis of surface impedance reveals underperformance for Helmholtz values below 0.5 due to viscous dissipation and above 2.0 due to Bragg reflections. These results provide a basis for evaluating the Helmholtz parameters that optimize the acoustic gain of Luneburg lenses.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142381102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Localization of partial electrical discharges using compressive spherical frequency-difference beamforming.","authors":"Jeung-Hoon Lee, Yongsung Park, Peter Gerstoft, Yonghyun Kim","doi":"10.1121/10.0032361","DOIUrl":"https://doi.org/10.1121/10.0032361","url":null,"abstract":"<p><p>Accurate localization of partial electrical discharges is essential for the diagnosis of high-voltage systems. The current study achieves this by employing an acoustic sensor array and a beamforming approach. The occurrence of a partial discharge is accompanied by the emission of high-frequency sounds in the ultrasonic range, making localization a challenging task requiring many sensors to avoid spatial aliasing. Compressive frequency-difference beamforming, as previously proposed, can be effective in addressing this issue. We expand the method to include near-field localization by utilizing a spherical wave and propose a two-step normalization process. This eliminates the bias associated with nonplanar waves and standardizes the field variables, thereby preserving only the phase and relative amplitude information. A distributed algorithm based on the alternating direction multiplier method is used to solve the associated convex optimization problem. The proposed method is demonstrated using simulated and experimental data.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142502773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Which to select?: Analysis of speaker representation with graph attention networks.","authors":"Hye-Jin Shim, Jee-Weon Jung, Ha-Jin Yu","doi":"10.1121/10.0032393","DOIUrl":"https://doi.org/10.1121/10.0032393","url":null,"abstract":"<p><p>Although the recent state-of-the-art systems show almost perfect performance, analysis of speaker embeddings has been lacking thus far. An in-depth analysis of speaker representation will be performed by looking into which features are selected. To this end, various intermediate representations of the trained model are observed using graph attentive feature aggregation, which includes a graph attention layer and graph pooling layer followed by a readout operation. To do so, the TIMIT dataset, which has comparably restricted conditions (e.g., the region and phoneme) is used after pre-training the model on the VoxCeleb dataset and then freezing the weight parameters. Through extensive experiments, there is a consistent trend in speaker representation in that the models learn to exploit sequence and phoneme information despite no supervision in that direction. The results shed light to help understand speaker embedding, which is yet considered to be a black box.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142468607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Influence of atmospheric state on variability of long-term residual ambient sound level measurements in a subalpine valley.","authors":"Davyd H Betchkal, Andrew W Hug","doi":"10.1121/10.0030300","DOIUrl":"https://doi.org/10.1121/10.0030300","url":null,"abstract":"<p><p>Two natural influences on the acoustic environments of mountainous parks and communities are flowing water and shifting weather. A central purpose of the acoustic measurement design used by the United States National Park Service is to provide spectral estimates of residual ambient sound level metrics at a seasonal time scale. Acoustic monitoring sampling methodologies are often designed using a sequence of similar measurements. When source and residual ambient spectra overlap, an estimate of variability in the latter is beneficial to successful monitoring design. The observed and modelled effects of atmospheric state on sound level are analyzed to reveal variability due to these effects at a long-term monitoring site in Denali National Park, Alaska. The analysis of variability incorporates a covariate that is otherwise challenging to estimate in remote settings: vertical temperature gradients in the atmospheric boundary layer. Results reveal inversions (positive gradients) in the atmosphere ≥30% between 19:00 and 09:00. Inversion strengths above 0.06 °C/m are associated with 10-15 dB increases in sound level over hourly time scales. Because inversions tend to occur during otherwise quiescent times of day, they ultimately reduce seasonal variability at the site and corresponding uncertainty in noise metrics for transportation noise arriving from varied directions.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142502772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High variability phonetic training facilitates perception-to-production transfer in Mandarin-speaking children with cochlear implants: An acoustic investigation.","authors":"Hao Zhang, Lele Xu, Wen Ma, Junning Han, Yanxiang Wang, Hongwei Ding, Yang Zhang","doi":"10.1121/10.0030466","DOIUrl":"https://doi.org/10.1121/10.0030466","url":null,"abstract":"<p><p>This study primarily aimed to evaluate the effectiveness of high variability phonetic training (HVPT) for children with cochlear implants (CIs) via the cross-modal transfer of perceptual learning to lexical tone production, a scope that has been largely neglected by previous training research. Sixteen CI participants received a five-session HVPT within a period of three weeks, whereas another 16 CI children were recruited without receiving any formal training. Lexical tone production was assessed with a picture naming task before the provision (pretest) and immediately after (posttest) and ten weeks after (follow-up test) the completion of the training protocol. The production samples were coded and analyzed acoustically. Despite considerable distinctions from the typical baselines of normal-hearing peers, the trained CI children exhibited significant improvements in Mandarin tone production from pretest to posttest in pitch height of T1, pitch slope of T2, and pitch curvature of T3. Moreover, the training-induced acoustic changes in the concave characteristic of the T3 contour was retained ten weeks after training termination. This study represents an initial acoustic investigation on HVPT-induced benefits in lexical tone production for the pediatric CI population, which provides valuable insights into applying this perceptual training technique as a viable tool in clinical practices.</p>","PeriodicalId":17168,"journal":{"name":"Journal of the Acoustical Society of America","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142391471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}