Segment unit shuffling layer in deep neural networks for text-independent speaker verification

Journal of the Acoustical Society of Korea, vol. 40, pp. 148-154. Published 2021-03-01. DOI: 10.7776/ASK.2021.40.2.148

Ju-Sung Heo, Hye-jin Shim, Ju-ho Kim, Ha-jin Yu

Abstract

Text-independent speaker verification requires speaker embeddings that do not depend on the spoken text in order to generalize well. However, because deep neural networks learn directly from their training data, repeatedly training on identical time series can cause them to overfit to text information instead of learning speaker information. To prevent this overfitting, we propose a segment unit shuffling layer that divides the input layer or a hidden layer into segments along the time axis and rearranges them, thereby mixing the time-series information. Because the segment unit shuffling layer can be applied not only to the input layer but also to hidden layers, it can serve as a hidden-layer generalization technique, which is known to be more effective than generalization techniques applied at the input layer, and it can be used together with data augmentation. In addition, the degree of distortion can be controlled by adjusting the segment unit size. We observe that applying the proposed segment unit shuffling layer improves text-independent speaker verification performance compared to the baseline.
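The shuffling operation described in the abstract can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the authors' implementation: it assumes a time-major input (a sequence of frames, where a frame may be an input feature vector or a hidden-layer activation at one time step), treats a trailing partial segment as an ordinary segment, and makes the layer an identity at evaluation time, as dropout does. The function name `segment_shuffle` and its parameters are illustrative assumptions.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def segment_shuffle(
    frames: Sequence[Frame],
    segment_len: int,
    rng: Optional[random.Random] = None,
    training: bool = True,
) -> List[Frame]:
    """Hypothetical segment unit shuffling along the time axis.

    Splits a time-major sequence into contiguous segments of `segment_len`
    frames and returns the frames with the segment order randomly permuted.
    Assumptions (not from the paper): a trailing partial segment is shuffled
    like any other segment, and the layer does nothing when training=False.
    """
    if segment_len < 1:
        raise ValueError("segment_len must be >= 1")
    frames = list(frames)
    if not training:
        return frames  # assumed: no distortion at evaluation time
    if rng is None:
        rng = random.Random()
    # Cut the time axis into contiguous segments; the frame order inside
    # each segment is preserved.
    segments = [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]
    # Rearrange the segments along the time axis.
    rng.shuffle(segments)
    return [frame for segment in segments for frame in segment]


if __name__ == "__main__":
    # Ten frames with two feature values each (toy stand-in for a feature map).
    x = [[float(t), float(t) * 0.1] for t in range(10)]
    y = segment_shuffle(x, segment_len=3, rng=random.Random(0))
    print([int(frame[0]) for frame in y])  # frame indices after shuffling
```

In this sketch a smaller `segment_len` cuts the sequence into more pieces and therefore disturbs the temporal order more, while a `segment_len` at least as long as the sequence leaves it unchanged; this is one plausible reading of the statement that the degree of distortion is set by the segment unit size.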
