Eurasip Journal on Audio Speech and Music Processing最新文献_第7页

Musical note onset detection based on a spectral sparsity measure. 基于频谱稀疏度量的音乐音符起音检测。

IF 2.4 3区计算机科学

Eurasip Journal on Audio Speech and Music Processing Pub Date : 2021-01-01 Epub Date: 2021-07-28 DOI: 10.1186/s13636-021-00214-7

Mina Mounir, Peter Karsmakers, Toon van Waterschoot

{"title":"Musical note onset detection based on a spectral sparsity measure.","authors":"Mina Mounir, Peter Karsmakers, Toon van Waterschoot","doi":"10.1186/s13636-021-00214-7","DOIUrl":"10.1186/s13636-021-00214-7","url":null,"abstract":"If music is the language of the universe, musical note onsets may be the syllables for this language. Not only do note onsets define the temporal pattern of a musical piece, but their time-frequency characteristics also contain rich information about the identity of the musical instrument producing the notes. Note onset detection (NOD) is the basic component for many music information retrieval tasks and has attracted significant interest in audio signal processing research. In this paper, we propose an NOD method based on a novel feature coined as Normalized Identification of Note Onset based on Spectral Sparsity (NINOS2). The NINOS2 feature can be thought of as a spectral sparsity measure, aiming to exploit the difference in spectral sparsity between the different parts of a musical note. This spectral structure is revealed when focusing on low-magnitude spectral components that are traditionally filtered out when computing note onset features. We present an extensive set of NOD simulation results covering a wide range of instruments, playing styles, and mixing options. The proposed algorithm consistently outperforms the baseline Logarithmic Spectral Flux (LSF) feature for the most difficult group of instruments which are the sustained-strings instruments. It also shows better performance for challenging scenarios including polyphonic music and vibrato performances.","PeriodicalId":49202,"journal":{"name":"Eurasip Journal on Audio Speech and Music Processing","volume":"2021 1","pages":"30"},"PeriodicalIF":2.4,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8550344/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39683581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices. 基于深度多实例学习的可穿戴设备环境音频前景语音定位。

IF 2.4 3区计算机科学

Eurasip Journal on Audio Speech and Music Processing Pub Date : 2021-01-01 Epub Date: 2021-02-03 DOI: 10.1186/s13636-020-00194-0

Rajat Hebbar, Pavlos Papadopoulos, Ramon Reyes, Alexander F Danvers, Angelina J Polsinelli, Suzanne A Moseley, David A Sbarra, Matthias R Mehl, Shrikanth Narayanan

{"title":"Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices.","authors":"Rajat Hebbar, Pavlos Papadopoulos, Ramon Reyes, Alexander F Danvers, Angelina J Polsinelli, Suzanne A Moseley, David A Sbarra, Matthias R Mehl, Shrikanth Narayanan","doi":"10.1186/s13636-020-00194-0","DOIUrl":"https://doi.org/10.1186/s13636-020-00194-0","url":null,"abstract":"Over the recent years, machine learning techniques have been employed to produce state-of-the-art results in several audio related tasks. The success of these approaches has been largely due to access to large amounts of open-source datasets and enhancement of computational resources. However, a shortcoming of these methods is that they often fail to generalize well to tasks from real life scenarios, due to domain mismatch. One such task is foreground speech detection from wearable audio devices. Several interfering factors such as dynamically varying environmental conditions, including background speakers, TV, or radio audio, render foreground speech detection to be a challenging task. Moreover, obtaining precise moment-to-moment annotations of audio streams for analysis and model training is also time-consuming and costly. In this work, we use multiple instance learning (MIL) to facilitate development of such models using annotations available at a lower time-resolution (coarsely labeled). We show how MIL can be applied to localize foreground speech in coarsely labeled audio and show both bag-level and instance-level results. We also study different pooling methods and how they can be adapted to densely distributed events as observed in our application. Finally, we show improvements using speech activity detection embeddings as features for foreground detection.","PeriodicalId":49202,"journal":{"name":"Eurasip Journal on Audio Speech and Music Processing","volume":"2021 1","pages":"7"},"PeriodicalIF":2.4,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13636-020-00194-0","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25367001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

Articulation constrained learning with application to speech emotion recognition. 发音约束学习在语音情感识别中的应用。

IF 2.4 3区计算机科学

Eurasip Journal on Audio Speech and Music Processing Pub Date : 2019-01-01 Epub Date: 2019-08-20 DOI: 10.1186/s13636-019-0157-9

Mohit Shah, Ming Tu, Visar Berisha, Chaitali Chakrabarti, Andreas Spanias

{"title":"Articulation constrained learning with application to speech emotion recognition.","authors":"Mohit Shah, Ming Tu, Visar Berisha, Chaitali Chakrabarti, Andreas Spanias","doi":"10.1186/s13636-019-0157-9","DOIUrl":"https://doi.org/10.1186/s13636-019-0157-9","url":null,"abstract":"Speech emotion recognition methods combining articulatory information with acoustic features have been previously shown to improve recognition performance. Collection of articulatory data on a large scale may not be feasible in many scenarios, thus restricting the scope and applicability of such methods. In this paper, a discriminative learning method for emotion recognition using both articulatory and acoustic information is proposed. A traditional ℓ 1-regularized logistic regression cost function is extended to include additional constraints that enforce the model to reconstruct articulatory data. This leads to sparse and interpretable representations jointly optimized for both tasks simultaneously. Furthermore, the model only requires articulatory features during training; only speech features are required for inference on out-of-sample data. Experiments are conducted to evaluate emotion recognition performance over vowels /AA/,/AE/,/IY/,/UW/ and complete utterances. Incorporating articulatory information is shown to significantly improve the performance for valence-based classification. Results obtained for within-corpus and cross-corpus categorical emotion recognition indicate that the proposed method is more effective at distinguishing happiness from other emotions.","PeriodicalId":49202,"journal":{"name":"Eurasip Journal on Audio Speech and Music Processing","volume":"2019 ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13636-019-0157-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37471483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

From raw audio to a seamless mix: creating an automated DJ system for Drum and Bass 从原始音频到无缝混合:为鼓和贝斯创建一个自动化的DJ系统

IF 2.4 3区计算机科学

Eurasip Journal on Audio Speech and Music Processing Pub Date : 2018-09-24 DOI: 10.1186/s13636-018-0134-8

Len Vande Veire, Tijl De Bie

引用次数: 5

Biomimetic spectro-temporal features for music instrument recognition in isolated notes and solo phrases. 在孤立音符和独奏乐句中识别乐器的仿生光谱-时间特征。

IF 2.4 3区计算机科学

Eurasip Journal on Audio Speech and Music Processing Pub Date : 2015-01-01 DOI: 10.1186/s13636-015-0070-9

Kailash Patil, Mounya Elhilali

{"title":"Biomimetic spectro-temporal features for music instrument recognition in isolated notes and solo phrases.","authors":"Kailash Patil, Mounya Elhilali","doi":"10.1186/s13636-015-0070-9","DOIUrl":"https://doi.org/10.1186/s13636-015-0070-9","url":null,"abstract":"The identity of musical instruments is reflected in the acoustic attributes of musical notes played with them. Recently, it has been argued that these characteristics of musical identity (or timbre) can be best captured through an analysis that encompasses both time and frequency domains; with a focus on the modulations or changes in the signal in the spectrotemporal space. This representation mimics the spectrotemporal receptive field (STRF) analysis believed to underlie processing in the central mammalian auditory system, particularly at the level of primary auditory cortex. How well does this STRF representation capture timbral identity of musical instruments in continuous solo recordings remains unclear. The current work investigates the applicability of the STRF feature space for instrument recognition in solo musical phrases and explores best approaches to leveraging knowledge from isolated musical notes for instrument recognition in solo recordings. The study presents an approach for parsing solo performances into their individual note constituents and adapting back-end classifiers using support vector machines to achieve a generalization of instrument recognition to off-the-shelf, commercially available solo music.","PeriodicalId":49202,"journal":{"name":"Eurasip Journal on Audio Speech and Music Processing","volume":"2015 ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13636-015-0070-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36776486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 18

Biomimetic multi-resolution analysis for robust speaker recognition. 鲁棒说话人识别的仿生多分辨率分析。

IF 2.4 3区计算机科学

Eurasip Journal on Audio Speech and Music Processing Pub Date : 2012-01-01 Epub Date: 2012-09-07 DOI: 10.1186/1687-4722-2012-22

Sridhar Krishna Nemala, Dmitry N Zotkin, Ramani Duraiswami, Mounya Elhilali

{"title":"Biomimetic multi-resolution analysis for robust speaker recognition.","authors":"Sridhar Krishna Nemala, Dmitry N Zotkin, Ramani Duraiswami, Mounya Elhilali","doi":"10.1186/1687-4722-2012-22","DOIUrl":"https://doi.org/10.1186/1687-4722-2012-22","url":null,"abstract":"Humans exhibit a remarkable ability to reliably classify sound sources in the environment even in presence of high levels of noise. In contrast, most engineering systems suffer a drastic drop in performance when speech signals are corrupted with channel or background distortions. Our brains are equipped with elaborate machinery for speech analysis and feature extraction, which hold great lessons for improving the performance of automatic speech processing systems under adverse conditions. The work presented here explores a biologically-motivated multi-resolution speaker information representation obtained by performing an intricate yet computationally-efficient analysis of the information-rich spectro-temporal attributes of the speech signal. We evaluate the proposed features in a speaker verification task performed on NIST SRE 2010 data. The biomimetic approach yields significant robustness in presence of non-stationary noise and reverberation, offering a new framework for deriving reliable features for speaker recognition and speech processing.","PeriodicalId":49202,"journal":{"name":"Eurasip Journal on Audio Speech and Music Processing","volume":"2012 ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/1687-4722-2012-22","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36781151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5