Automatic language identification in broadcast news
G. Backfried, R. Rainoldi, J. Riedler
Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No.02CH37290)
Published: 2002-08-07
DOI: 10.1109/IJCNN.2002.1007722 (https://doi.org/10.1109/IJCNN.2002.1007722)
Citations: 2
Abstract
We present experiments on automatic language identification in the broadcast news domain. Because news broadcasts are inherently diverse, speech is first extracted from the raw audio by phone-level decoding with broad phoneme classes. Training and testing were performed on recordings of German, English, Spanish, and French news shows from a variety of European TV channels. Each language is characterized by a Gaussian mixture model built solely from the corresponding acoustic features. The overall average error rate on speech segments is 16.32%. The current system disregards almost all linguistic information; in return, it is easily extensible to new languages.
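The abstract does not state the feature type, the number of mixture components, or the decision rule, so the following is only a minimal sketch of the general idea of one Gaussian mixture model per language. It assumes diagonal covariances, a plain EM fit, and a maximum average log-likelihood decision. The function names, the two-component setting, the language labels, and the synthetic 2-D "features" are all hypothetical stand-ins, not the authors' setup.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def frame_log_likelihood(x, gmm):
    """log p(x | gmm), where gmm is a list of (weight, mean, var) tuples."""
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm])


def fit_gmm(frames, n_components=2, n_iter=20, var_floor=1e-3):
    """Fit a diagonal GMM with EM (assumed training procedure)."""
    n, dim = len(frames), len(frames[0])
    # Deterministic initialisation: means taken at evenly spaced positions
    # of the sorted frames, unit variances, uniform weights.
    ordered = sorted(frames)
    gmm = [(1.0 / n_components,
            list(ordered[(2 * k + 1) * n // (2 * n_components)]),
            [1.0] * dim)
           for k in range(n_components)]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        resp = []
        for x in frames:
            logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means, and (floored) variances.
        updated = []
        for k in range(n_components):
            nk = max(sum(r[k] for r in resp), 1e-8)  # guard against empty components
            mean = [sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                    for d in range(dim)]
            var = [max(var_floor,
                       sum(r[k] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, frames)) / nk)
                   for d in range(dim)]
            updated.append((nk / n, mean, var))
        gmm = updated
    return gmm


def identify(frames, models):
    """Return the label whose GMM gives the highest average log-likelihood."""
    scores = {lang: sum(frame_log_likelihood(x, g) for x in frames) / len(frames)
              for lang, g in models.items()}
    return max(scores, key=scores.get), scores


def synthetic_frames(centers, n, rng):
    """Toy stand-in for acoustic features: points scattered around cluster centres."""
    return [[rng.gauss(c, 0.5) for c in rng.choice(centers)] for _ in range(n)]


rng = random.Random(0)
# Hypothetical cluster centres; real systems would use acoustic feature vectors.
centers = {"de": [(0.0, 0.0), (3.0, 3.0)], "en": [(0.0, 3.0), (3.0, 0.0)]}
models = {lang: fit_gmm(synthetic_frames(c, 300, rng)) for lang, c in centers.items()}

de_segment = synthetic_frames(centers["de"], 50, rng)
en_segment = synthetic_frames(centers["en"], 50, rng)
print(identify(de_segment, models)[0])
print(identify(en_segment, models)[0])
```

Because each model is trained only on its own language's data, adding a language in this sketch means fitting one more mixture and adding it to `models`; nothing else changes, which mirrors the extensibility claim in the abstract.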