Title: Unsupervised phone segmentation method using delta spectral function
Authors: Dac-Thang Hoang, Hsiao-Chuan Wang
Venue: 2011 International Conference on Speech Database and Assessments (Oriental COCOSDA)
Published: 2011-11-28
DOI: 10.1109/ICSDA.2011.6085998 (https://doi.org/10.1109/ICSDA.2011.6085998)
Citations: 3
Abstract
Unsupervised phone segmentation detects the phone boundaries in an utterance without prior knowledge of the text content. A spectral change in the speech signal usually implies the existence of a phone boundary. In this paper, the Delta Spectral Function (DSF) is defined for each frame to represent the variation of band energy in a specific band. A number of bands that give the highest DSF values in a frame are then chosen to define a measure of spectral change. The chosen bands are not fixed; they are selected dynamically, frame by frame. The peaks of the spectral change curve are treated as candidate boundaries, and a fine-tuning procedure then selects the peaks that become the detected boundaries. The proposed method achieves an F-value of 75.3% under the condition of near-zero over-segmentation, where the recall rate is also 75.3%. This result is better than many previously reported results. In addition, the computation is simple and the method is easy to implement.
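The pipeline described in the abstract (per-band energy variation, dynamic selection of the strongest bands in each frame, peaks of the resulting curve) can be illustrated with a minimal sketch. The abstract does not give the DSF formula, the frame and band layout, or the fine-tuning step, so everything below is an assumption made for illustration: the DSF is taken to be the absolute frame-to-frame difference of log band energy, the bands are equal-width FFT bands, and the names `band_log_energies`, `spectral_change`, `candidate_boundaries`, `top_k`, and `min_height` are hypothetical. The fine-tuning procedure is not reproduced.

```python
import numpy as np


def band_log_energies(signal, frame_len=400, hop=160, n_bands=16):
    """Log energy per frame in n_bands equal-width FFT bands (assumed layout)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack(
        [signal[i * hop: i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_bins)
    bands = np.array_split(power, n_bands, axis=1)        # equal-width bands
    energy = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return np.log(energy + 1e-10)                         # floor avoids log(0)


def spectral_change(log_energy, top_k=4):
    """Sum, per frame, of the top_k largest band-wise changes.

    Assumed DSF: absolute difference of log band energy between
    adjacent frames. The selected bands can differ from frame to frame.
    """
    dsf = np.abs(np.diff(log_energy, axis=0))             # (n_frames - 1, n_bands)
    strongest = np.sort(dsf, axis=1)[:, -top_k:]          # top_k bands per frame
    return strongest.sum(axis=1)


def candidate_boundaries(curve, min_height=0.0):
    """Indices of local maxima of the curve that exceed min_height."""
    return [
        t
        for t in range(1, len(curve) - 1)
        if curve[t] > curve[t - 1]
        and curve[t] >= curve[t + 1]
        and curve[t] > min_height
    ]
```

Calling `candidate_boundaries(spectral_change(band_log_energies(x)), min_height=h)` on a waveform `x` returns frame indices where the assumed measure peaks; index `t` refers to the transition between frames `t` and `t + 1`. The threshold `h` stands in for the paper's peak-selection step and would need tuning on real speech.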