An acoustic analysis of fluctuations for inter- and intra-speaker variability in speech sounds

Q3 Social Sciences

Journal of Forensic Science and Medicine, vol. 9, no. 1, pp. 38-43. Pub Date: 2023-01-01. DOI: 10.2139/ssrn.3960502

J. Kaur, K. Juglan, Kush Sharma, Vishal Sharma



Abstract

Background: Variation in speakers' speech is a crucial issue for the forensic system. The main cause of incorrect speaker identification is greater intra-speaker fluctuation. A great deal of forensic research has addressed speaker identification; however, inter-speaker variation and intra-speaker fluctuation in the Punjabi language remain a grey area.

Aims and Objectives: Our aim is to carry out an acoustic analysis of fluctuations for inter- and intra-speaker variability in speech sounds. The study considers Punjabi vowels combined with consonants. Statistical methods are applied to analyze the data: first the Shapiro-Wilk test is used to check normality, and then Levene's test is used to assess the equality of variances.

Materials and Method: Five vowels were selected and combined with different consonants to make meaningful words, which were then embedded in sentences. Ten speakers participated voluntarily. All are students of A.S College at Khanna in Punjab, aged 20-22 years, with no hearing or speech disorder. The voice samples were recorded in a soundproof lab with a good-quality microphone and Goldwave software. Samples were introduced directly into PRAAT software using a Sony microphone at a sampling rate of 44100 Hz. Acoustic analysis was carried out with Goldwave software in the form of spectrograms.

Results and Conclusion: Each formant shows a different value for inter-speaker variation and intra-speaker fluctuation. F1 and F2 show less speaker variation than the high-frequency region of F3 and F4, so we can say that the high-frequency regions are more valuable than the lower part. The assumptions of two-way ANOVA were violated; hence, we used the non-parametric Friedman test and performed its post hoc analysis. From the post hoc analysis, we can say that F1 and F2 (p > 0.05) and F2 and F3 (p > 0.05) gave the same type of results. From the results of these statistical tests, we conclude that F1 is recommended over F2, F3, and F4, because the frequency of F1 is high and this is in line with the results of the statistical tests. We prefer more variation among frequencies because it makes different speakers easier to distinguish and is more beneficial for studying inter-speaker variation and intra-speaker fluctuation.
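The abstract names the Friedman test but does not describe how it was set up. As an illustration only, the following minimal Python sketch computes the Friedman statistic and its chi-square p-value, assuming each speaker is a block and each formant (F1 to F4) is a treatment. The function names and the toy numbers are hypothetical and are not the study's data; the sketch applies no tie correction, so a statistics package that does correct for ties may give slightly different values when ties occur.

```python
import math

def average_ranks(values):
    """Rank values from 1..k; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    x = stat / 2.0
    if df % 2:  # odd df: start from Q(1/2, x) = erfc(sqrt(x))
        q, a, steps = math.erfc(math.sqrt(x)), 0.5, (df - 1) // 2
    else:       # even df: start from Q(1, x) = exp(-x)
        q, a, steps = math.exp(-x), 1.0, df // 2 - 1
    for _ in range(steps):
        # Recurrence Q(a + 1, x) = Q(a, x) + x^a * exp(-x) / Gamma(a + 1)
        q += x ** a * math.exp(-x) / math.gamma(a + 1)
        a += 1
    return q

def friedman_test(blocks):
    """blocks: one row per speaker, one column per formant (no tie correction)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return stat, chi2_sf(stat, k - 1)

if __name__ == "__main__":
    # Hypothetical variability scores (arbitrary units), one row per speaker,
    # columns F1..F4. These numbers are made up for illustration.
    toy = [
        [4.1, 3.2, 5.0, 6.3],
        [3.8, 4.0, 5.5, 5.9],
        [4.5, 3.1, 4.9, 6.8],
        [3.9, 3.6, 6.1, 5.7],
        [4.2, 3.3, 5.2, 6.0],
    ]
    stat, p = friedman_test(toy)
    print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")
```

A small p-value would indicate that at least one formant differs from the others across speakers, which is the point at which pairwise post hoc comparisons such as those reported above become relevant.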
