Using articulatory feature detectors in progressive networks for multilingual low-resource phone recognition
Mahir Morshed, Mark Hasegawa-Johnson
Journal of the Acoustical Society of America, 156(5), 3411-3421. Published 2024-11-01.
DOI: https://doi.org/10.1121/10.0034415
Citations: 0
Abstract
Systems inspired by progressive neural networks, transferring information from end-to-end articulatory feature detectors to similarly structured phone recognizers, are described. These networks, connecting the corresponding recurrent layers of pre-trained feature detector stacks and newly introduced phone recognizer stacks, were trained on data from four Asian languages, with experiments testing the system on those languages and four African languages. Later adjustments of these networks include the use of contrastive predictive coding layers at the inputs to those networks' recurrent portions. Such adjustments allow for performance differences to be attributed to the presence or absence of individual feature detectors (for consonant place/manner and vowel height/backness). Some of these differences manifest after feature-level comparisons of recognizer outputs, as well as through considering variations and ablations in architecture and training setup. These differences encourage further exploration of methods to reduce errors with phones having specific articulatory features as well as further architectural modifications.
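To make the idea of a lateral connection between recurrent stacks concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give layer types, sizes, or the exact wiring, so the tanh recurrent layer, the dimensions, the random stand-in weights, and the names `RecurrentLayer` and `run_columns` are all hypothetical. It shows a frozen "detector" layer whose hidden state is fed, frame by frame, into a new "recognizer" layer through extra lateral weights.

```python
import math
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; these stand in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class RecurrentLayer:
    """Plain tanh recurrent layer with an optional lateral input (assumed design)."""

    def __init__(self, in_dim, hid_dim, rng, lateral_dim=0):
        self.hid_dim = hid_dim
        self.w_in = random_matrix(hid_dim, in_dim, rng)
        self.w_rec = random_matrix(hid_dim, hid_dim, rng)
        # Lateral weights exist only in the newly introduced (recognizer) column.
        self.w_lat = random_matrix(hid_dim, lateral_dim, rng) if lateral_dim else None

    def step(self, x, h_prev, lateral=None):
        """Advance one frame; add the lateral contribution when it is available."""
        pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, h_prev))]
        if self.w_lat is not None and lateral is not None:
            pre = [p + q for p, q in zip(pre, matvec(self.w_lat, lateral))]
        return [math.tanh(p) for p in pre]


def run_columns(frames, detector, recognizer, use_lateral=True):
    """Run both columns over a frame sequence and return the recognizer states."""
    h_det = [0.0] * detector.hid_dim
    h_rec = [0.0] * recognizer.hid_dim
    outputs = []
    for x in frames:
        # The detector column is treated as pre-trained and is only read from.
        h_det = detector.step(x, h_det)
        lateral = h_det if use_lateral else None
        h_rec = recognizer.step(x, h_rec, lateral=lateral)
        outputs.append(h_rec)
    return outputs


rng = random.Random(0)
detector = RecurrentLayer(in_dim=3, hid_dim=4, rng=rng)
recognizer = RecurrentLayer(in_dim=3, hid_dim=5, rng=rng, lateral_dim=4)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(6)]

states = run_columns(frames, detector, recognizer)
states_ablated = run_columns(frames, detector, recognizer, use_lateral=False)
```

Passing `use_lateral=False` is a rough stand-in for removing a feature detector: the recognizer then sees only the input frames, and comparing `states` with `states_ablated` mirrors, in toy form, the presence-or-absence comparisons the abstract mentions.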
About the journal:
Since 1929 The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes: linear and nonlinear acoustics; aeroacoustics, underwater sound and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music and noise; psychology and physiology of hearing; engineering acoustics, transduction; bioacoustics, animal bioacoustics.