Generic Audio Classification Using a Hybrid Model Based on GMMs and HMMs

Menaka Rajapakse, L. Wyse
DOI: 10.1109/MMMC.2005.44 (https://doi.org/10.1109/MMMC.2005.44)
Venue: 11th International Multimedia Modelling Conference
Publication date: 2005-01-12
Citations: 23

Abstract

A hybrid model comprising Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs) is used to model generic sounds with large intra-class perceptual variations. Each class has a variable number of mixture components in its GMM, derived using the Minimum Description Length (MDL) criterion. The overall performance of the hybrid model was compared against models based on HMMs and on GMMs with a fixed number of mixture components across all classes. We show that the hybrid model outperforms class-based GMMs, HMMs, and GMMs with a fixed number of components. Further, our experiments revealed that transitions between states in HMMs have no significant effect on the overall classification performance for generic sounds when large intra-class perceptual variations are present among sounds in the training and test datasets. Sounds with a multi-event structure whose events tend to be similar (repetitive) were classified better when modeled with HMMs, which can be attributed to the HMM's state-transition property. Conversely, GMMs perform better when the sound samples show subtle or no repetitive behavior. These results were validated using the MuscleFish sound database.
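The per-class choice of mixture size described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scalar features instead of real audio features, and it assumes the common two-part MDL score, -log L + (p/2) log N, because the abstract does not state the exact form used. All function names (`fit_gmm`, `mdl`, `select_gmm`, `classify`) and the synthetic classes are hypothetical. It covers only the GMM side and does not model HMM state transitions.

```python
import math
import random

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def frame_logs(x, model):
    """Per-component log(weight * density) for one feature value."""
    return [math.log(w) + log_gauss(x, m, v) for w, m, v in zip(*model)]

def log_likelihood(data, model):
    """Total log-likelihood of the data under a GMM (weights, means, variances)."""
    return sum(logsumexp(frame_logs(x, model)) for x in data)

def fit_gmm(data, k, iters=50):
    """Fit a 1-D GMM with k components by EM (illustrative, not optimized)."""
    n = len(data)
    ordered = sorted(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    floor = 1e-3 * var_all + 1e-9  # variance floor to avoid collapse
    # Deterministic start: means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [max(var_all, floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        model = (weights, means, variances)
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            logs = frame_logs(x, model)
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, floor)
            weights[j] = nj / n
        total = sum(weights)
        weights = [w / total for w in weights]
    return (weights, means, variances)

def mdl(data, model):
    """Assumed MDL score: -log L + (p / 2) * log N, with p free parameters."""
    k = len(model[0])
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    return -log_likelihood(data, model) + 0.5 * n_params * math.log(len(data))

def select_gmm(data, max_k=4):
    """Fit GMMs with 1..max_k components and keep the lowest-MDL one."""
    candidates = [fit_gmm(data, k) for k in range(1, max_k + 1)]
    return min(candidates, key=lambda model: mdl(data, model))

def classify(clip, class_models):
    """Return the class whose GMM gives the clip the highest log-likelihood."""
    return max(class_models, key=lambda name: log_likelihood(clip, class_models[name]))

# Synthetic stand-ins for two sound classes (purely illustrative data).
rng = random.Random(0)
train = {
    "two_mode": [rng.gauss(rng.choice([-3.0, 3.0]), 0.5) for _ in range(300)],
    "one_mode": [rng.gauss(0.0, 1.0) for _ in range(300)],
}
class_models = {name: select_gmm(data) for name, data in train.items()}
clip = [rng.gauss(3.0, 0.5) for _ in range(20)]
print(classify(clip, class_models), {n: len(m[0]) for n, m in class_models.items()})
```

Each class ends up with its own component count, which is the "variable number of mixture components" idea. A fixed-size baseline, like the one the hybrid model is compared against, would correspond to calling `fit_gmm` with the same `k` for every class.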