Analysis of an MFCC-based audio indexing system for efficient coding of multimedia sources
O. Mubarak, E. Ambikairajah, J. Epps
Proceedings of the Eighth International Symposium on Signal Processing and Its Applications, 2005. Published 2005-08-28.
DOI: https://doi.org/10.1109/ISSPA.2005.1581014
Citations: 35
Abstract
Discriminating between speech and music signals is an important problem in efficient digital radio broadcasting, particularly for variable-bit-rate applications such as Internet radio. This paper presents a speech/music discrimination system based on a Mel-frequency cepstral coefficient (MFCC) front end and a Gaussian mixture model (GMM) classifier. The system can select the optimum coding scheme for the current frame of an input signal without knowing a priori whether the frame has speech-like or music-like characteristics. Speech and music error rates are analyzed for different numbers of MFCCs (from 8 to 28). On the 46-minute evaluation database used in this experiment, the system attains an accuracy of up to 97.14% for music and 93.87% for speech.
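
The decision step described in the abstract can be illustrated with a minimal sketch, assuming one GMM per class and a per-frame maximum-likelihood rule. The abstract does not give the model sizes, covariance structure, or decision rule, so the diagonal covariances, the two-dimensional toy feature vectors (stand-ins for the 8 to 28 MFCCs), the parameter values, and the names `classify_frame`, `select_coder`, `SPEECH_GMM`, and `MUSIC_GMM` are all hypothetical. In practice the GMM parameters would be estimated from training data rather than written by hand.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    # Log-sum-exp keeps the mixture sum numerically stable.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def classify_frame(x, speech_gmm, music_gmm):
    """Label a feature vector by the class model with the higher likelihood."""
    speech_score = gmm_log_likelihood(x, speech_gmm)
    music_score = gmm_log_likelihood(x, music_gmm)
    return "speech" if speech_score >= music_score else "music"

def select_coder(label):
    """Map a class label to a coder name (hypothetical names, for illustration)."""
    return {"speech": "speech_coder", "music": "music_coder"}[label]

# Toy, hand-written models over 2-dimensional features (assumed values).
SPEECH_GMM = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])]
MUSIC_GMM = [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])]

if __name__ == "__main__":
    for frame in ([0.2, 0.1], [5.5, 5.4]):
        label = classify_frame(frame, SPEECH_GMM, MUSIC_GMM)
        print(frame, label, select_coder(label))
```

Comparing log-likelihoods rather than raw likelihoods avoids underflow when the feature dimension grows, which matters at the larger MFCC counts the paper analyzes.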