Speaker-independent voiced-stop-consonant recognition using a block-windowed neural network architecture
Authors: B.D. Bryant, J. Gowdy
Published in: 1993 (25th) Southeastern Symposium on System Theory
Publication date: 1993-03-07
DOI: 10.1109/SSST.1993.522811 (https://doi.org/10.1109/SSST.1993.522811)
Citation count: 1
Abstract
The authors study several of the better-known connectionist models and how they address the time and frequency variability of the multispeaker voiced-stop-consonant recognition task. The network architectures reviewed or tested include the self-organizing feature map (SOFM) architecture and various derivatives of it, the time-delay neural network (TDNN) architecture and various derivatives of it, and two frequency-and-time-shift-invariant architectures: the frequency-shift-invariant TDNN (FTDNN) and the block-windowed neural network (BWNN). Voiced-stop speech was extracted from up to four dialect regions of the TIMIT continuous speech corpus for preprocessing and for training and testing network instances. Various feature representations were tested for their robustness in representing the voiced-stop consonants.