Unsupervised audio segmentation based on Restricted Boltzmann Machines

IISA 2014, The 5th International Conference on Information, Intelligence, Systems and Applications
Pub Date: 2014-07-07
DOI: 10.1109/IISA.2014.6878838

A. Pikrakis


Citations: 2

Abstract

In this paper the Conditional Restricted Boltzmann Machine (CRBM) is employed in the context of unsupervised audio segmentation. The CRBM acts as a temporal modeling method and learns, from a maximum likelihood perspective, the temporal relationships of the feature vectors that have been extracted from a large corpus of training data. After the CRBM has been trained, we quantify the correlation of the activations of the hidden-layer neurons for successive feature vectors by means of an appropriately defined similarity function. A simple thresholding scheme is then applied to the output of the similarity function to segment the audio recording automatically. Our experiments have been carried out on a large corpus of documentaries. We provide an interpretation of the segmentation results and comment on the segmentation efficiency of the method.
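The pipeline described in the abstract (hidden-layer activations for successive feature vectors, a similarity function, then a threshold) can be illustrated with a minimal sketch. The abstract does not specify the similarity function, the thresholding rule, or the CRBM parameterization, so everything below is an assumption made for illustration: cosine similarity stands in for the paper's similarity function, a fixed threshold stands in for its thresholding scheme, and hidden_activations uses a commonly seen CRBM-style form, sigmoid(W v_t + B history + b), with hypothetical parameter names W, B and b. It is not the author's implementation.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hidden_activations(
    v_t: Sequence[float],
    history: Sequence[float],
    W: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
    b: Sequence[float],
) -> List[float]:
    """Assumed CRBM-style hidden activation probabilities.

    v_t is the current feature vector, history is the concatenation of
    past feature vectors; W, B, b are hypothetical trained parameters.
    """
    acts = []
    for j in range(len(b)):
        total = b[j]
        total += sum(W[j][i] * v_t[i] for i in range(len(v_t)))
        total += sum(B[j][k] * history[k] for k in range(len(history)))
        acts.append(sigmoid(total))
    return acts


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; an assumed stand-in for the paper's function."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_boundaries(
    activations: Sequence[Sequence[float]], threshold: float
) -> List[int]:
    """Return frame indices t where the similarity between the hidden
    activations of frames t-1 and t drops below the threshold."""
    boundaries = []
    for t in range(1, len(activations)):
        if cosine_similarity(activations[t - 1], activations[t]) < threshold:
            boundaries.append(t)
    return boundaries


if __name__ == "__main__":
    # Toy activations: frames 0-1 resemble each other, as do frames 2-3.
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(segment_boundaries(toy, threshold=0.5))
```

In this toy input the only low-similarity transition is between frames 1 and 2, so a boundary is reported at index 2. In practice the threshold value would have to be tuned on the data; the abstract gives no value for it.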

