Ming-jun Jiao, Yue Li, Pengbo Dang, Wei Cao, Lingli Wang
A High Performance FPGA-Based Accelerator Design for End-to-End Speaker Recognition System
2019 International Conference on Field-Programmable Technology (ICFPT), published 2019-12-01
DOI: https://doi.org/10.1109/ICFPT47387.2019.00033
Citations: 1
Abstract
Speaker recognition is an important technique for identification applications. X-vectors, a robust text-independent speaker recognition system, spends much of its run time on two tasks: extracting voiceprint features, which requires massive neural network computation, and scoring an utterance against every person registered in the database to find the best match. This paper proposes a high-performance FPGA-based accelerator for this end-to-end speaker recognition system. The accelerator comprises three parts: Mel Frequency Cepstral Coefficients (MFCC), a time delay neural network (TDNN), and a probabilistic linear discriminant analysis (PLDA) classifier. A quantitative analysis is presented to balance bit width against recognition accuracy, and an optimization strategy is introduced to trade off system parallelism against FPGA resource utilization. Running at 200 MHz on the Xilinx XCVU9P FPGA of an UltraScale+ VCU118 board, the proposed accelerator achieves a peak performance of 1.067 TOP/s and 1.30×10^5 voice frames per second (vFPS). This is a 1296× speedup over the X-vectors software implementation running on a 2.5 GHz Intel Xeon E5-2620 processor, and 6.42× the energy efficiency of an Nvidia TITAN Xp GPU solution.
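
The bit-width trade-off mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' analysis: the signed fixed-point format, the helper names `quantize` and `mean_abs_error`, the choice of two integer bits, and the synthetic uniform weights are all assumptions made for illustration. The sketch only shows the general trend that fewer bits give a larger rounding error, which is the quantity a designer would weigh against recognition accuracy and FPGA resources.

```python
import random


def quantize(x, total_bits, frac_bits):
    """Round x to signed fixed point (hypothetical format), saturating on overflow."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))      # most negative representable integer code
    hi = (1 << (total_bits - 1)) - 1   # most positive representable integer code
    code = max(lo, min(hi, round(x * scale)))
    return code / scale


def mean_abs_error(values, total_bits, frac_bits):
    """Average absolute rounding error over a list of real values."""
    total = sum(abs(v - quantize(v, total_bits, frac_bits)) for v in values)
    return total / len(values)


# Synthetic stand-in for network weights; real TDNN weights would be used in practice.
random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

errors = {}
for bits in (4, 8, 12, 16):
    # Assumption: 2 integer bits (including sign), the rest fractional.
    errors[bits] = mean_abs_error(weights, bits, bits - 2)
    print(f"{bits:2d}-bit fixed point: mean abs error = {errors[bits]:.6f}")
```

In a real evaluation, the error metric would be the recognition accuracy of the quantized network rather than the raw rounding error, which is only a proxy here.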