{"title":"Microphone utility estimation in acoustic sensor networks using single-channel signal features","authors":"M. Gunther, Andreas Brendel, Walter Kellermann","doi":"10.1186/s13636-023-00294-7","DOIUrl":"https://doi.org/10.1186/s13636-023-00294-7","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2022-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46826683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling","authors":"Siqing Qin, Longbiao Wang, Sheng Li, J. Dang, Lixin Pan","doi":"10.1186/s13636-021-00233-4","DOIUrl":"https://doi.org/10.1186/s13636-021-00233-4","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":"2022 1","pages":"1-10"},"PeriodicalIF":2.4,"publicationDate":"2022-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43185179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Auxiliary function-based algorithm for blind extraction of a moving speaker","authors":"Jakub Janský, Zbyněk Koldovský, J. Málek, Tomás Kounovský, Jaroslav Cmejla","doi":"10.1186/s13636-021-00231-6","DOIUrl":"https://doi.org/10.1186/s13636-021-00231-6","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":"2022 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2022-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"65688142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the selection of the number of beamformers in beamforming-based binaural reproduction.","authors":"Itay Ifergan, Boaz Rafaely","doi":"10.1186/s13636-022-00238-7","DOIUrl":"10.1186/s13636-022-00238-7","url":null,"abstract":"In recent years, spatial audio reproduction has been widely researched with many studies focusing on headphone-based spatial reproduction. A popular format for spatial audio is higher order Ambisonics (HOA), where a spherical microphone array is typically used to obtain the HOA signals. When a spherical array is not available, beamforming-based binaural reproduction (BFBR) can be used, where signals are captured with arrays of a general configuration. While shown to be useful, no comprehensive studies of BFBR have been presented and so its limitations and other design aspects are not well understood. This paper takes an initial step towards developing a theory for BFBR and develops guidelines for selecting the number of beamformers. In particular, the average directivity factor of the microphone array is proposed as a measure for supporting this selection. The effect of head-related transfer function (HRTF) order truncation that occurs when using too many beamformer directions is presented and studied. In addition, the relation between HOA-based binaural reproduction and BFBR is discussed through analysis based on a spherical array. A simulation study is then presented, based on both a spherical and a planar array, demonstrating the proposed guidelines. A listening test verifies the perceptual attributes of the methods presented in this study. These results can be used for more informed beamformer design for BFBR.","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":"2022 1","pages":"6"},"PeriodicalIF":2.4,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8965231/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"65688237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation","authors":"Zolzaya Byambadorj, Ryota Nishimura, Altangerel Ayush, Kengo Ohta, N. Kitaoka","doi":"10.1186/s13636-021-00225-4","DOIUrl":"https://doi.org/10.1186/s13636-021-00225-4","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":"2021 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"65687751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anchor voiceprint recognition in live streaming via RawNet-SA and gated recurrent unit","authors":"Jiacheng Yao, J. Zhang, Jiafeng Li, L. Zhuo","doi":"10.1186/s13636-021-00234-3","DOIUrl":"https://doi.org/10.1186/s13636-021-00234-3","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":"2021 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"65688213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spherical harmonic covariance and magnitude function encodings for beamformer design","authors":"Yuancheng Luo","doi":"10.1186/s13636-021-00230-7","DOIUrl":"https://doi.org/10.1186/s13636-021-00230-7","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":"2021 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"65688119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"U2-VC: one-shot voice conversion using two-level nested U-structure","authors":"Fangkun Liu, Hui Wang, Renhua Peng, C. Zheng, Xiaodong Li","doi":"10.1186/s13636-021-00226-3","DOIUrl":"https://doi.org/10.1186/s13636-021-00226-3","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":" ","pages":"1-15"},"PeriodicalIF":2.4,"publicationDate":"2021-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47351244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"dEchorate: a calibrated room impulse response dataset for echo-aware signal processing","authors":"D. Carlo, Pinchas Tandeitnik, C. Foy, N. Bertin, Antoine Deleforge, S. Gannot","doi":"10.1186/s13636-021-00229-0","DOIUrl":"https://doi.org/10.1186/s13636-021-00229-0","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2021-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49413624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust single- and multi-loudspeaker least-squares-based equalization for hearing devices","authors":"H. Schepker, Florian Denk, B. Kollmeier, S. Doclo","doi":"10.1186/s13636-022-00247-6","DOIUrl":"https://doi.org/10.1186/s13636-022-00247-6","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":"2022 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2021-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41772564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}