Finite-state super transducers for compact language resource representation in edge voice-AI
Authors: S. Dobrišek, Ziga Golob, Jerneja Žganec Gros
Journal: Systems Science & Control Engineering (JCR Q2, Automation & Control Systems; impact factor 3.2)
Published: 2022-06-23
DOI: https://doi.org/10.1080/21642583.2022.2089930
Finite-state transducers have been shown to yield compact representations of pronunciation dictionaries used for grapheme-to-phoneme conversion in speech engines running on low-resource embedded platforms. However, highly inflected languages require even more efficient methods for reducing language resources. In this paper, we demonstrate that the size of a finite-state transducer tends to decrease once the number of word forms in the modelled pronunciation dictionary reaches a certain threshold. Motivated by this finding, we propose and evaluate a new type of finite-state transducer, called the ‘finite-state super transducer’, which represents a pronunciation dictionary with fewer states and transitions and thereby reduces the size of the language resource representation by up to 25% compared with minimal deterministic finite-state transducers. Further, we demonstrate that finite-state super transducers exhibit a generalization capability: they may accept, and thereby phonetically transform, inflected word forms that were not represented in the pronunciation dictionary used to build the super transducer. The method is suitable for speech engines operating on platforms at the edge of an AI system, where memory and processing power are restricted and efficient speech processing methods based on compact language resources must be implemented.
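To make the baseline in the abstract concrete, the following is a minimal sketch of why a minimal deterministic transducer is more compact than a plain prefix tree: states with identical continuations (for example, shared inflectional endings) are merged. It is not the authors' super-transducer construction. The word list, the one-to-one grapheme/phoneme alignment in `toy_align`, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (grapheme, phoneme) label on one transition


class Node:
    """One state of an acyclic transducer."""

    def __init__(self) -> None:
        self.final = False
        self.edges: Dict[Pair, "Node"] = {}


def toy_align(word: str) -> List[Pair]:
    # Assumption: one phoneme per grapheme, written with the same symbol.
    return [(ch, ch) for ch in word]


def build_trie(entries: List[List[Pair]]) -> Node:
    """Build a prefix tree over grapheme:phoneme pairs."""
    root = Node()
    for entry in entries:
        node = root
        for pair in entry:
            node = node.edges.setdefault(pair, Node())
        node.final = True
    return root


def count_states(root: Node) -> int:
    """Count distinct states reachable from the root."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)


def minimize(root: Node) -> Node:
    """Merge states with identical continuations (bottom-up, in place)."""
    registry: Dict[tuple, Node] = {}

    def canon(node: Node) -> Node:
        for label, child in list(node.edges.items()):
            node.edges[label] = canon(child)
        signature = tuple(sorted((lbl, id(ch)) for lbl, ch in node.edges.items()))
        return registry.setdefault((node.final, signature), node)

    return canon(root)


def transduce(root: Node, word: str) -> Optional[str]:
    """Return the phoneme string for a word, or None if it is not accepted."""
    node, out = root, []
    for ch in word:
        nxt = [(lbl, n) for lbl, n in node.edges.items() if lbl[0] == ch]
        if not nxt:
            return None
        (_, phoneme), node = nxt[0]
        out.append(phoneme)
    return "".join(out) if node.final else None


# Hypothetical inflected forms: three stems, three shared endings.
words = [stem + ending for stem in ("mal", "bel", "nov") for ending in ("a", "e", "i")]
trie = build_trie([toy_align(w) for w in words])
before = count_states(trie)   # counted before minimize() mutates the trie
minimal = minimize(trie)
after = count_states(minimal)
print(before, after)  # prints "19 8"
```

In this toy dictionary the three stems share the same set of endings, so the states that follow each stem collapse into one. The super transducers described above go beyond this baseline by allowing a smaller machine that may also accept some word forms absent from the dictionary, which this sketch does not attempt.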
About the journal:
Systems Science & Control Engineering is a world-leading fully open access journal covering all areas of theoretical and applied systems science and control engineering. The journal encourages the submission of original articles, reviews and short communications in areas including, but not limited to: · artificial intelligence · complex systems · complex networks · control theory · control applications · cybernetics · dynamical systems theory · operations research · systems biology · systems dynamics · systems ecology · systems engineering · systems psychology · systems theory