Musical Instrument Recognition in Polyphonic Audio Using Missing Feature Approach
Authors: D. Giannoulis, Anssi Klapuri
Journal: IEEE Transactions on Audio Speech and Language Processing
Publication date: 2013-09-01
DOI: https://doi.org/10.1109/TASL.2013.2248720
A method is described for musical instrument recognition in polyphonic audio signals where several sound sources are active at the same time. The proposed method is based on local spectral features and missing-feature techniques. A novel mask estimation algorithm is described that identifies spectral regions that contain reliable information for each sound source, and bounded marginalization is then used to treat the feature vector elements that are determined to be unreliable. The mask estimation technique is based on the assumption that the spectral envelopes of musical sounds tend to be slowly-varying as a function of log-frequency and unreliable spectral components can therefore be detected as positive deviations from an estimated smooth spectral envelope. A computationally efficient algorithm is proposed for marginalizing the mask in the classification process. In simulations, the proposed method clearly outperforms reference methods for mixture signals. The proposed mask estimation technique leads to a recognition accuracy that is approximately half-way between a trivial all-one mask (all features are assumed reliable) and an ideal “oracle” mask.
About the journal:
The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.