Krsto Prorokovic, Michael Wand, Tanja Schultz, J. Schmidhuber
2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP), published 2019-11-01. DOI: 10.1109/GlobalSIP45357.2019.8969231 (https://doi.org/10.1109/GlobalSIP45357.2019.8969231)
Adaptation of an EMG-Based Speech Recognizer via Meta-Learning
In nonacoustic speech recognition based on electromyography (EMG), i.e. on electrical muscle activity captured by noninvasive surface electrodes, differences between recording sessions are known to degrade system accuracy. Efficient adaptation of an existing system to an unseen recording session is therefore imperative for practical use. We report on a meta-learning approach that pretrains a deep neural network frontend for a myoelectric speech recognizer so that it can be easily adapted to a new session. Fine-tuning this specially pretrained network yields lower word error rates and higher frame accuracies than fine-tuning a conventionally pretrained network, without increasing the computational burden on a possibly mobile device.
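The abstract does not say which meta-learning algorithm is used, so the following is only an illustrative sketch of the general idea: pretrain an initialization over several sessions so that a few fine-tuning steps suffice on an unseen one. It assumes a Reptile-style first-order update as a stand-in and replaces the deep network and EMG data with a toy linear model. Every name here (`make_session`, `fine_tune`, `reptile_pretrain`, the learning rates) is hypothetical and not taken from the paper.

```python
import random


def mse(params, data):
    """Mean squared error of the linear model y = w*x + b on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fine_tune(params, data, lr=0.1, steps=5):
    """Adapt params to one session with a few full-batch gradient steps."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return [w, b]


def reptile_pretrain(sessions, meta_lr=0.5, epochs=50, seed=0):
    """Reptile-style meta-pretraining (an assumed stand-in algorithm):
    move the shared initialization toward the parameters obtained by
    fine-tuning on a randomly chosen training session."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(epochs):
        session = rng.choice(sessions)
        adapted = fine_tune(theta, session)
        theta = [t + meta_lr * (a - t) for t, a in zip(theta, adapted)]
    return theta


def make_session(w, b, n=20, seed=0):
    """Hypothetical toy 'session': noiseless samples of y = w*x + b."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, w * x + b) for x in xs]


if __name__ == "__main__":
    # Training sessions differ slightly, mimicking session-to-session drift.
    train_sessions = [
        make_session(2.0 + 0.1 * i, 1.0 - 0.1 * i, seed=i) for i in range(5)
    ]
    new_session = make_session(2.2, 0.8, seed=99)

    theta = reptile_pretrain(train_sessions)
    adapted = fine_tune(theta, new_session)
    print("loss before adaptation:", mse(theta, new_session))
    print("loss after adaptation: ", mse(adapted, new_session))
```

The adaptation step on the new session is the same small number of gradient updates as ordinary fine-tuning. Only the pretraining differs, which is consistent with the abstract's remark that the approach adds no computational burden at adaptation time.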