An SVD-based scheme for MFCC compression in distributed speech recognition system

2013 IEEE Workshop on Automatic Speech Recognition and Understanding. Pub Date: 2013-12-01. DOI: 10.1109/ASRU.2013.6707738

A. Touazi, M. Debyeche

Citations: 1

Abstract

This paper proposes a new scheme for low bit-rate source coding of Mel Frequency Cepstral Coefficients (MFCCs) in a Distributed Speech Recognition (DSR) system. The method uses the compressed ETSI Advanced Front-End (ETSI-AFE) features, factorized into SVD components. Exploiting the correlation between successive MFCC frames, the odd frames are encoded using ETSI-AFE, while for the even frames only the singular values and the index of the nearest left singular vectors are encoded and transmitted. At the server side, the non-transmitted MFCCs are estimated from their quantized singular values and the nearest left singular vectors. The system achieves a bit-rate of 2.7 kbps. Recognition experiments were carried out on the Aurora-2 database in clean and multi-condition training modes. The simulation results show good recognition performance, without significant degradation relative to the ETSI-AFE encoder.
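The even-frame idea in the abstract can be illustrated with a small sketch. The abstract does not say how a frame is arranged into a matrix, how "nearest" is measured, or how the singular values are quantized, so everything specific here is an assumption made for illustration: a 14-coefficient frame reshaped to 2 x 7, the candidates for "nearest left singular vectors" taken to be the two neighbouring odd frames, a sign-insensitive distance between singular-vector bases, and no quantization at all. The names `frame_svd`, `encode_even_frame`, and `decode_even_frame` are hypothetical and are not taken from the paper.

```python
import numpy as np

# Assumed layout: one 14-coefficient frame viewed as a 2 x 7 matrix.
ROWS, COLS = 2, 7


def frame_svd(frame):
    """Thin SVD of a single frame under the assumed 2 x 7 layout."""
    return np.linalg.svd(frame.reshape(ROWS, COLS), full_matrices=False)


def encode_even_frame(even, prev_odd, next_odd):
    """Return the even frame's singular values and the index (0 or 1)
    of the neighbouring odd frame with the nearest left singular vectors."""
    u, s, _ = frame_svd(even)
    dists = []
    for neighbour in (prev_odd, next_odd):
        u_nb, _, _ = frame_svd(neighbour)
        # Singular vectors are defined up to sign, so compare |U^T U_nb|
        # with the identity instead of comparing U and U_nb directly.
        dists.append(np.linalg.norm(np.abs(u.T @ u_nb) - np.eye(ROWS)))
    return s, int(np.argmin(dists))


def decode_even_frame(s, index, prev_odd, next_odd):
    """Rebuild the even frame from its singular values and the singular
    vectors of the selected, already decoded, odd neighbour."""
    u_nb, _, vt_nb = frame_svd((prev_odd, next_odd)[index])
    # (u_nb * s) scales each column of u_nb by its singular value.
    return ((u_nb * s) @ vt_nb).reshape(-1)
```

In this sketch the odd frames are assumed to be available at the decoder through the ETSI-AFE codec, which is not modelled here. A real encoder would also quantize `s` before transmission; that step is omitted because the abstract gives no detail on it.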


