Blind Phone Segmentation Using Contrast Function
Dac-Thang Hoang, Van-Thuy Mai, Tung-Lam Phi
2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA), published 2020-11-05
DOI: 10.1109/O-COCOSDA50338.2020.9295035
Citations: 0
Abstract
Phone segmentation is the process of detecting the boundaries between phones in a spoken utterance. In this paper, phone boundaries are detected blindly, that is, without knowledge of the speech content. Contrast, a concept from image processing, is investigated for phone segmentation. The speech signal is first transformed into the frequency domain. Band energy is then extracted and treated as luminance in an image. A contrast function is defined on the energy of each frequency band, and peaks in the contrast curve indicate phone boundaries. The boundaries detected in eight bands are combined using a probability mass function. Experiments on the TIMIT corpus give promising results, and the method also yields good results on a Vietnamese corpus.
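The abstract names the pipeline stages but not the exact contrast function or combination rule, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. Assumed here: a Hamming-windowed short-time power spectrum; eight equal-width bands from 0 to sr/2; a Michelson-style contrast |R - L| / (R + L + floor) between the mean band energy L of the w frames before frame t and R of the w frames from t on; per-band boundaries taken as local maxima above a threshold; and the combination done by smoothing the per-band votes and normalizing them into a probability mass function over frames, then keeping its strongest, well-separated peaks. All function names (`band_energies`, `contrast_curves`, `pick_peaks`, `segment_phones`, `three_tone_signal`) and parameter defaults (25 ms frames, 10 ms hop, w = 5, the thresholds) are hypothetical, chosen for 16 kHz audio such as TIMIT.

```python
import numpy as np


def band_energies(x, sr, frame_len=400, hop=160, n_fft=512, n_bands=8):
    """Hamming-windowed power spectrum per frame, summed into n_bands
    equal-width bands between 0 and sr/2 (equal widths are an assumption)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = x[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    band_of_bin = np.minimum((freqs / (sr / 2) * n_bands).astype(int), n_bands - 1)
    # Band energy is treated as "luminance": one row per frame, one column per band.
    return np.stack([power[:, band_of_bin == b].sum(axis=1)
                     for b in range(n_bands)], axis=1)


def contrast_curves(energy, w=5, floor_ratio=1e-2):
    """Assumed contrast: Michelson-style |R - L| / (R + L + floor), where L and R
    are the mean band energies of the w frames before frame t and from t on."""
    n_frames, n_bands = energy.shape
    # A global floor keeps near-silent bands from turning tiny changes into high contrast.
    floor = floor_ratio * energy.max() + 1e-12
    contrast = np.zeros((n_frames, n_bands))
    for t in range(w, n_frames - w + 1):
        left = energy[t - w:t].mean(axis=0)
        right = energy[t:t + w].mean(axis=0)
        contrast[t] = np.abs(right - left) / (right + left + floor)
    return contrast


def pick_peaks(curve, threshold):
    """Indices of local maxima of a 1-D curve that reach the threshold."""
    i = np.arange(1, len(curve) - 1)
    keep = (curve[i] >= threshold) & (curve[i] > curve[i - 1]) & (curve[i] >= curve[i + 1])
    return i[keep]


def segment_phones(x, sr, frame_len=400, hop=160, n_bands=8, w=5, floor_ratio=1e-2,
                   contrast_threshold=0.5, rel_threshold=0.3, min_gap=5):
    """Estimated boundary times in seconds (hypothetical defaults for 16 kHz audio)."""
    energy = band_energies(x, sr, frame_len, hop, n_bands=n_bands)
    contrast = contrast_curves(energy, w, floor_ratio)
    # Each band votes for the frames where its contrast curve peaks.
    votes = np.zeros(energy.shape[0])
    for b in range(n_bands):
        votes[pick_peaks(contrast[:, b], contrast_threshold)] += 1
    # Spread each vote over +/-2 frames, then normalize into a PMF over frames.
    kernel = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
    smoothed = np.convolve(votes, kernel / kernel.sum(), mode="same")
    if smoothed.sum() == 0:
        return []
    pmf = smoothed / smoothed.sum()
    # Greedy selection: most probable frames first, kept at least min_gap frames apart.
    candidates = np.flatnonzero(pmf >= rel_threshold * pmf.max())
    chosen = []
    for t in candidates[np.argsort(-pmf[candidates], kind="stable")]:
        if all(abs(t - c) >= min_gap for c in chosen):
            chosen.append(t)
    # Frame index t maps to the start time of the first frame in the right-hand window.
    return sorted(float(t * hop / sr) for t in chosen)


def three_tone_signal(sr=16000, seg_s=0.3, tones=(300.0, 2500.0, 5500.0)):
    """Synthetic test 'utterance': three steady tones joined with continuous phase."""
    inst_freq = np.repeat(tones, round(seg_s * sr))
    phase = 2 * np.pi * np.concatenate(([0.0], np.cumsum(inst_freq)[:-1])) / sr
    return np.cos(phase)


if __name__ == "__main__":
    print(segment_phones(three_tone_signal(), 16000))
```

Running the module prints the boundaries found for a synthetic signal whose tone changes at 0.3 s and 0.6 s stand in for phone boundaries. One design consequence worth noting: this contrast saturates near 1 whenever one side of a band is nearly silent, so in this sketch it is the energy floor and the cross-band PMF that keep weak or isolated band detections from dominating; on real speech the thresholds would need tuning on labeled data.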