Comparison of skewness-based salient event detector algorithms in speech

2015 6th IEEE International Conference on Cognitive Infocommunications (CogInfoCom) Pub Date : 2015-10-01 DOI:10.1109/COGINFOCOM.2015.7390605

A. Kovács, G. Kiss, K. Vicsi, I. Winkler, M. Coath


Citations: 1

Abstract

In this work, we compare two skewness-based salient event detector algorithms that detect transients in human speech signals. Speech transients are characterized by rapid changes in signal energy. The purpose of this study was to compare how two different skewness-based methods identify transients, in order to develop a method for studying the processing of speech transients in the human brain. The first method, skewness in variable time (SKV), finds transients using a cochlear model; the skewness of the energy distribution over a variable time window is implemented on artificial neural networks. The second method, the automatic segmentation method for transient detection (RoT), is more speech-segmentation-based and was developed for detecting the transient-speech segment ratio in spoken records. The test corpus included Hungarian and English speech recorded from different speakers (2 male and 2 female for each language). Results were compared using the F-measure, the Jaccard similarity index, and the Hamming distance. The results of the two algorithms were also tested against a hand-labeled corpus annotated by linguistic experts for an absolute assessment of their performance. Transient detection was tested once for onset events alone and, separately, for onset and offset events together. The results show that in most cases the RoT method works better on the expert-labeled databases. Using the F-measure with a ±25 ms window length, the following results were obtained when all types of transient events were evaluated: 0.664 on English and 0.834 on Hungarian. Otherwise, the two methods identify the same stimulus features as transients, which also coincide with those hand-labeled by experts.
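The abstract names three comparison measures but does not specify how detected and reference events were matched or binarized. The following Python sketch illustrates one plausible way to compute them, under stated assumptions: event times are integer milliseconds, matching is greedy and one-to-one within the ±25 ms tolerance, and the Jaccard index and Hamming distance are computed on binary frame vectors with a hypothetical 50 ms frame length. All function names and the example event times are illustrative and are not taken from the paper.

```python
from typing import List, Sequence


def count_matches(detected: Sequence[int], reference: Sequence[int], tol_ms: int = 25) -> int:
    """Greedy one-to-one matching of event times (ms) within +/- tol_ms (assumed scheme)."""
    det, ref = sorted(detected), sorted(reference)
    i = j = hits = 0
    while i < len(det) and j < len(ref):
        diff = det[i] - ref[j]
        if abs(diff) <= tol_ms:
            hits += 1          # both events are consumed by the match
            i += 1
            j += 1
        elif diff < 0:
            i += 1             # detected event is too early to match any reference
        else:
            j += 1             # reference event was missed
    return hits


def f_measure(detected: Sequence[int], reference: Sequence[int], tol_ms: int = 25) -> float:
    """Harmonic mean of precision and recall over tolerance-matched events."""
    hits = count_matches(detected, reference, tol_ms)
    if hits == 0:
        return 0.0
    precision = hits / len(detected)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


def to_frames(events: Sequence[int], duration_ms: int, frame_ms: int = 50) -> List[int]:
    """Binary vector with 1 in every frame that contains an event (frame length is an assumption)."""
    n_frames = duration_ms // frame_ms
    frames = [0] * n_frames
    for t in events:
        idx = t // frame_ms
        if 0 <= idx < n_frames:
            frames[idx] = 1
    return frames


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Size of the intersection over size of the union of the active frames."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of frames in which the two binary vectors disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Made-up event times in milliseconds, for illustration only.
detected = [100, 520, 900]
reference = [110, 500, 1200]
det_frames = to_frames(detected, duration_ms=1500)
ref_frames = to_frames(reference, duration_ms=1500)
scores = {
    "f_measure": f_measure(detected, reference),
    "jaccard": jaccard(det_frames, ref_frames),
    "hamming": hamming(det_frames, ref_frames),
}
```

In this toy case two of the three detected events fall within 25 ms of a reference event, so precision and recall are both 2/3. The frame-based measures depend strongly on the chosen frame length, which is why it is flagged here as an assumption rather than a setting from the study.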
