Generic Audio Classification Using a Hybrid Model Based on GMMs and HMMs

11th International Multimedia Modelling Conference, Pub Date: 2005-01-12, DOI: 10.1109/MMMC.2005.44

Menaka Rajapakse, L. Wyse


Citations: 23

Abstract



A hybrid model composed of Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs) is used to model generic sounds with large intra-class perceptual variations. Each class has a variable number of mixture components in its GMM, derived using the Minimum Description Length (MDL) criterion. The overall performance of the hybrid model was compared against models based on HMMs and on GMMs with a fixed number of mixture components across all classes. We show that the hybrid model outperforms class-based GMMs, HMMs, and GMMs with a fixed number of components. Further, our experiments revealed that transitions between states in HMMs have no significant effect on the overall classification performance for generic sounds when large intra-class perceptual variations are present among sounds in the training and test datasets. Sounds with a multi-event structure whose events tend to be similar (repetitive) showed improved performance when modeled with HMMs, which can be attributed to the HMM's state-transition property. Conversely, GMMs perform better when the sound samples show subtle or no repetitive behavior. These results were validated using the MuscleFish sound database.
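The idea of giving each class its own number of mixture components via MDL can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scalar (1-D) features, a plain EM fit with a variance floor, and the common two-part form of the criterion, MDL = -log L + (k/2) log N, where k is the number of free parameters and N the number of training samples. All function names and the toy training data are hypothetical and chosen only for illustration.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_likelihood(xs, gmm):
    """Total log-likelihood of xs under gmm = (weights, means, variances)."""
    total = 0.0
    for x in xs:
        logs = [math.log(w) + log_gauss(x, mu, v) for w, mu, v in zip(*gmm)]
        top = max(logs)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total

def fit_gmm(xs, m, iters=100, var_floor=1e-3):
    """Fit an m-component 1-D GMM with EM (assumed setup, deterministic init)."""
    n = len(xs)
    srt = sorted(xs)
    # Initial means at evenly spaced quantiles of the data.
    means = [srt[int((i + 0.5) * n / m)] for i in range(m)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    weights, variances = [1.0 / m] * m, [spread] * m
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in xs:
            logs = [math.log(w) + log_gauss(x, mu, v)
                    for w, mu, v in zip(weights, means, variances)]
            top = max(logs)
            p = [math.exp(l - top) for l in logs]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights, means, and (floored) variances.
        for k in range(m):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                var_floor)
    return weights, means, variances

def mdl_score(xs, gmm):
    """Two-part MDL: negative log-likelihood plus a model-size penalty."""
    m = len(gmm[0])
    n_params = 3 * m - 1  # m means + m variances + (m - 1) free weights
    return -log_likelihood(xs, gmm) + 0.5 * n_params * math.log(len(xs))

def select_gmm(xs, max_components=3):
    """Fit GMMs with 1..max_components components and keep the lowest MDL."""
    candidates = [fit_gmm(xs, m) for m in range(1, max_components + 1)]
    return min(candidates, key=lambda g: mdl_score(xs, g))

def classify(xs, class_models):
    """Assign a sequence of features to the class with the highest likelihood."""
    return max(class_models, key=lambda c: log_likelihood(xs, class_models[c]))

# Toy training data (invented): one class with two distinct modes, one with one.
train = {
    "two_events": [0.1 * i - 0.5 for i in range(11)]
                  + [0.1 * i + 9.5 for i in range(11)],
    "steady": [0.1 * i + 4.5 for i in range(11)],
}
models = {label: select_gmm(xs) for label, xs in train.items()}
```

In this sketch each class ends up with a different model size because the penalty term only admits an extra component when it buys a sufficiently large gain in likelihood. A GMM scored this way treats frames as independent, so it ignores ordering; an HMM would additionally weight each step by a state-transition probability, which is the property the abstract credits for the gains on repetitive multi-event sounds.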

