Robust voice activity detection for social sensing

Proceedings of the 2013 ACM conference on Pervasive and ubiquitous computing adjunct publication. Pub Date: 2013-09-08. DOI: 10.1145/2494091.2497347

S. Feese, G. Tröster

Citations: 2

Abstract

The speech modality is a rich source of personal information. As such, speech detection is a fundamental function of many social sensing applications. The amount of speech present in our surroundings alone can give indications about our sociability and communication patterns. In this work, we present and evaluate a speech detection approach utilizing dictionary learning and sparse signal representation. By transforming the noisy audio data to a sparse representation with a dictionary learned from clean speech data, we show that speech and non-speech can be discriminated even in low signal-to-noise conditions with up to 92% accuracy. In addition to an evaluation with simulated data, we evaluate the algorithm on a real-world data set recorded during firefighting missions. We show that the speech activity of firefighters can be detected with 85% accuracy when using a smartphone placed in the firefighting jacket.
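The sparse-representation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features, the dictionary-learning algorithm, or the classifier. Here the dictionary is a fixed toy set of sinusoidal atoms standing in for one learned from clean speech, plain matching pursuit stands in for the sparse coder, and the names `matching_pursuit`, `explained_energy`, and the 0.5 threshold are hypothetical. The intuition shown is that a frame resembling the dictionary's atoms is well explained by a few of them, while other audio leaves most of its energy in the residual.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def sinusoid(freq_bin, length):
    return [math.sin(2 * math.pi * freq_bin * n / length) for n in range(length)]

def matching_pursuit(frame, atoms, n_nonzero):
    """Greedy sparse coding; atoms must be unit-norm. Returns (coeffs, residual)."""
    residual = list(frame)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        corrs = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corrs[i]))
        coeffs[k] += corrs[k]
        residual = [r - corrs[k] * x for r, x in zip(residual, atoms[k])]
    return coeffs, residual

def explained_energy(frame, atoms, n_nonzero=2):
    """Fraction of the frame's energy captured by the sparse approximation."""
    total = dot(frame, frame)
    if total == 0:
        return 0.0
    _, residual = matching_pursuit(frame, atoms, n_nonzero)
    return 1.0 - dot(residual, residual) / total

FRAME_LEN = 32
# Assumption: toy stand-in for a dictionary learned from clean speech.
atoms = [normalize(sinusoid(k, FRAME_LEN)) for k in (2, 3, 5)]

# A frame built from two dictionary atoms ("speech-like" in this toy setup).
speech_like = [0.8 * a + 0.5 * b for a, b in zip(atoms[0], atoms[2])]
# A frame orthogonal to every atom ("non-speech" in this toy setup).
noise_like = sinusoid(9, FRAME_LEN)

THRESHOLD = 0.5  # hypothetical decision threshold, not taken from the paper
for name, frame in (("speech_like", speech_like), ("noise_like", noise_like)):
    score = explained_energy(frame, atoms)
    print(name, round(score, 3), "speech" if score > THRESHOLD else "non-speech")
```

In a real system the dictionary would be learned from clean speech features and the sparse codes would typically feed a trained classifier rather than a fixed threshold.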


