Glottal fry and voice disguise: a case study in forensic phonetics

Journal of Biomedical Engineering, vol. 15, no. 3, pp. 193-200. Pub Date: 1993-05-01. DOI: 10.1016/0141-5425(93)90115-F

A. Hirson, M. Duckworth


Citations: 27

Abstract

In recent legal proceedings, forensic phoneticians were called upon to analyse a tape-recorded message intended for the blackmail of a bank manager following the kidnap of his wife. The brief was to establish the likelihood that the tape recording had been made by any one of three suspects, samples of whose speech were also made available. The comparison was greatly complicated by voice disguise employed by the speaker who recorded the kidnap tape. This disguise comprised a form of phonation described phonetically as ‘glottal fry’ or vocal ‘creak’. This form of phonation occurs naturally in normal speech, but it has received most attention in relation to voice pathologies. By contrast, there are few references to its use as a form of voice disguise. This paper discusses the nature of the creak and examines its effectiveness as voice disguise. In addition, a method is described for speaker identification regardless of the disguise. Results indicate that trained listeners, without repeated presentations or instrumentation, are able to match speakers with 65% accuracy when one voice is creaky, compared with 90% accuracy for undisguised voices. Using a Euclidean metric to compare the power spectra of the [s] sound, we find that creaky disguised voices may be correctly matched with the undisguised voice of the same speaker (among 9 distractors) in 5 cases out of 10. However, when the computer's task is made more similar to the perceptual task, selecting one speaker out of two, it achieves an accuracy of 81%. Implications for forensic phonetics are discussed.
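The spectral comparison summarised in the abstract can be illustrated with a minimal sketch. The abstract does not say how the [s] spectra were computed, binned, or normalised, so everything below is an assumption made for illustration: the four-bin spectra are invented toy values, the function names are hypothetical, and the two-speaker task is modelled here as the true speaker against each single distractor in turn, which may differ from the authors' procedure.

```python
import math


def euclidean_distance(spec_a, spec_b):
    """Euclidean distance between two equal-length power spectra."""
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must have the same number of bins")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec_a, spec_b)))


def closest_speaker(query, references):
    """Label of the reference spectrum nearest to the query spectrum."""
    return min(references, key=lambda label: euclidean_distance(query, references[label]))


def closed_set_accuracy(queries, references):
    """Fraction of disguised samples whose nearest reference is the same speaker."""
    hits = sum(closest_speaker(spec, references) == label
               for label, spec in queries.items())
    return hits / len(queries)


def pairwise_accuracy(queries, references):
    """Two-alternative task: true speaker versus one distractor at a time."""
    trials = 0
    correct = 0
    for label, spec in queries.items():
        d_true = euclidean_distance(spec, references[label])
        for other, ref in references.items():
            if other == label:
                continue
            trials += 1
            if d_true < euclidean_distance(spec, ref):
                correct += 1
    return correct / trials


# Invented toy [s] spectra (arbitrary dB values per frequency bin), NOT data from the paper.
references = {  # undisguised voices
    "A": [20.0, 30.0, 25.0, 15.0],
    "B": [12.0, 25.0, 30.0, 22.0],
    "C": [8.0, 15.0, 28.0, 34.0],
}
disguised = {  # creaky voices from the same three speakers
    "A": [21.0, 31.0, 23.0, 14.0],
    "B": [10.0, 19.0, 29.0, 29.0],  # drifts toward speaker C
    "C": [8.5, 16.0, 27.0, 33.0],
}

print(closed_set_accuracy(disguised, references))
print(pairwise_accuracy(disguised, references))
```

On these toy values the closed-set task matches 2 of 3 speakers while the pairwise task gets 5 of 6 trials right, which mirrors the direction of the reported results: choosing between two candidates is easier than picking one speaker from a larger set, where chance with 9 distractors is only 10% against 50% for a two-way choice.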
