{"title":"使用多渠道迁移学习改进基于语音的生物识别技术","authors":"Youssouf Ismail Cherif, Abdelhakim Dahimene","doi":"10.33965/ijcsis_2020150108","DOIUrl":null,"url":null,"abstract":"Identifying the speaker has become more of an imperative thing to do in the modern age. Especially since most personal and professional appliances rely on voice commands or speech in general terms to operate. These systems need to discern the identity of the speaker rather than just the words that have been said to be both smart and safe. Especially if we consider the numerous advanced methods that have been developed to generate fake speech segments. The objective of this paper is to improve upon the existing voice-based biometrics to keep up with these synthesizers. The proposed method focuses on defining a novel and more speaker adapted features by implying artificial neural networks and transfer learning. The approach uses pre-trained networks to define a mapping from two complementary acoustic features to a speaker adapted phonetic features. The complementary acoustics features are paired to provide both information about how the speech segments are perceived (type 1 feature) and produced (type 2 feature). The approach was evaluated using both a small and large closed-speaker data set. Primary results are encouraging and confirm the usefulness of such an approach to extract speaker adapted features whether for classical machine learning algorithms or advanced neural structures such as LSTM or CNN.","PeriodicalId":41878,"journal":{"name":"IADIS-International Journal on Computer Science and Information Systems","volume":"35 1","pages":""},"PeriodicalIF":0.2000,"publicationDate":"2020-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"IMPROVED VOICE-BASED BIOMETRICS USING MULTI-CHANNEL TRANSFER LEARNING\",\"authors\":\"Youssouf Ismail Cherif, Abdelhakim Dahimene\",\"doi\":\"10.33965/ijcsis_2020150108\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Identifying the speaker has become more of an imperative thing to do in the modern age. Especially since most personal and professional appliances rely on voice commands or speech in general terms to operate. These systems need to discern the identity of the speaker rather than just the words that have been said to be both smart and safe. Especially if we consider the numerous advanced methods that have been developed to generate fake speech segments. The objective of this paper is to improve upon the existing voice-based biometrics to keep up with these synthesizers. The proposed method focuses on defining a novel and more speaker adapted features by implying artificial neural networks and transfer learning. The approach uses pre-trained networks to define a mapping from two complementary acoustic features to a speaker adapted phonetic features. The complementary acoustics features are paired to provide both information about how the speech segments are perceived (type 1 feature) and produced (type 2 feature). The approach was evaluated using both a small and large closed-speaker data set. 
Primary results are encouraging and confirm the usefulness of such an approach to extract speaker adapted features whether for classical machine learning algorithms or advanced neural structures such as LSTM or CNN.\",\"PeriodicalId\":41878,\"journal\":{\"name\":\"IADIS-International Journal on Computer Science and Information Systems\",\"volume\":\"35 1\",\"pages\":\"\"},\"PeriodicalIF\":0.2000,\"publicationDate\":\"2020-10-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IADIS-International Journal on Computer Science and Information Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.33965/ijcsis_2020150108\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IADIS-International Journal on Computer Science and Information Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.33965/ijcsis_2020150108","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
IMPROVED VOICE-BASED BIOMETRICS USING MULTI-CHANNEL TRANSFER LEARNING
Youssouf Ismail Cherif, Abdelhakim Dahimene
IADIS-International Journal on Computer Science and Information Systems, 2020-10-07
DOI: 10.33965/ijcsis_2020150108
Identifying the speaker has become imperative in the modern age, especially since most personal and professional appliances rely on voice commands, or speech more generally, to operate. To be both smart and safe, these systems need to discern the identity of the speaker rather than just the words that were said, particularly given the numerous advanced methods that have been developed to generate fake speech segments. The objective of this paper is to improve upon existing voice-based biometrics to keep pace with these synthesizers. The proposed method focuses on defining novel, more speaker-adapted features by applying artificial neural networks and transfer learning. The approach uses pre-trained networks to define a mapping from two complementary acoustic features to speaker-adapted phonetic features. The complementary acoustic features are paired to provide information both about how the speech segments are perceived (type 1 feature) and about how they are produced (type 2 feature). The approach was evaluated on both a small and a large closed-set speaker data set. Preliminary results are encouraging and confirm the usefulness of this approach for extracting speaker-adapted features, whether for classical machine learning algorithms or for advanced neural architectures such as LSTMs or CNNs.
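To make the multi-channel idea concrete, the sketch below pairs a perceptual channel with a production channel, passes each through a frozen pre-trained encoder, and fuses the two embeddings into a speaker-adapted feature vector that feeds a classifier. This is a minimal illustration under stated assumptions, not the authors' exact design: the choice of MFCCs as the type 1 (perceived) feature, LPC coefficients as the type 2 (produced) feature, and all layer sizes and class names are illustrative.

import torch
import torch.nn as nn

class ChannelEncoder(nn.Module):
    # One branch mapping raw acoustic features to an embedding. In the
    # transfer-learning setting described in the abstract, its weights
    # would be loaded from a network pre-trained on a phonetic task;
    # here they are randomly initialized for illustration.
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class MultiChannelSpeakerNet(nn.Module):
    # Fuses the two complementary channels into speaker-adapted features.
    def __init__(self, n_speakers: int, mfcc_dim: int = 40, lpc_dim: int = 12):
        super().__init__()
        self.perceptual = ChannelEncoder(mfcc_dim)  # type 1: how speech is perceived
        self.production = ChannelEncoder(lpc_dim)   # type 2: how speech is produced
        # Transfer learning: freeze the pre-trained branches, train only the head.
        for p in list(self.perceptual.parameters()) + list(self.production.parameters()):
            p.requires_grad = False
        self.head = nn.Linear(128 * 2, n_speakers)

    def forward(self, mfcc: torch.Tensor, lpc: torch.Tensor) -> torch.Tensor:
        # z is the fused speaker-adapted feature; it could equally feed a
        # classical classifier or an LSTM/CNN, as the abstract suggests.
        z = torch.cat([self.perceptual(mfcc), self.production(lpc)], dim=-1)
        return self.head(z)

# Usage with dummy per-frame feature batches (batch of 8 frames):
model = MultiChannelSpeakerNet(n_speakers=10)
logits = model(torch.randn(8, 40), torch.randn(8, 12))
print(logits.shape)  # torch.Size([8, 10])

Freezing the pre-trained branches and training only the fusion head is one common way to realize the transfer; fine-tuning the branches end-to-end is an equally plausible variant of the approach described.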