Fusing linguistic and acoustic information for automated forensic speaker comparison

IF 1.9 · Zone 4 (Medicine) · Q2 MEDICINE, LEGAL

Science & Justice · Pub date: 2024-07-09 · DOI: 10.1016/j.scijus.2024.07.001

E.K. Sergidou, Rolf Ypma, Johan Rohdin, Marcel Worring, Zeno Geradts, Wauter Bosma

Volume 64, Issue 5, Pages 485-497

Citations: 0

Abstract

Verifying the speaker of a speech fragment can be crucial in attributing a crime to a suspect. The question can be addressed given disputed and reference speech material, adopting the recommended and scientifically accepted likelihood ratio framework for reporting evidential strength in court. In forensic practice, auditory and acoustic analyses are usually performed for such a verification task, considering a diversity of features such as language competence, pronunciation, or other linguistic features. Automated speaker comparison systems can also be used alongside these manual analyses. State-of-the-art automatic speaker comparison systems are based on deep neural networks that take acoustic features as input. Additional information, though, may be obtained from linguistic analysis. In this paper, we aim to answer whether, when, and how modern acoustic-based systems can be complemented by an authorship technique based on frequent words, within the likelihood ratio framework. We consider three different approaches to derive a combined likelihood ratio: using a support vector machine algorithm, fitting bivariate normal distributions, and passing the score of the acoustic system as additional input to the frequent-word analysis. We apply our method to the forensically relevant FRIDA dataset and the FISHER corpus, and we explore under which conditions fusion is valuable. We evaluate our results in terms of log-likelihood-ratio cost ($C_{\mathrm{llr}}$) and equal error rate (EER). We show that fusion can be beneficial, especially for intercepted phone calls with background noise.
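The bivariate-normal fusion and the two evaluation metrics can be illustrated with a small sketch. It assumes that each comparison yields two scores, one from the acoustic system and one from the frequent-word analysis, and it covers only the bivariate-normal approach: one 2D Gaussian is fitted to same-speaker score pairs, another to different-speaker score pairs, and the fused log10 likelihood ratio is the log ratio of the two densities. $C_{\mathrm{llr}}$ is computed from log10 LRs using its standard definition, and the EER is approximated by a threshold sweep. All function names and the synthetic Gaussian scores are hypothetical and chosen for illustration; this is not the authors' implementation, and their score distributions, calibration, and data splits may differ.

```python
import math
import random


def fit_bivariate_normal(points):
    """Maximum-likelihood mean and covariance (sxx, sxy, syy) of (x, y) score pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def bivariate_normal_logpdf(point, params):
    """Natural-log density of a 2D normal, using the closed-form 2x2 inverse."""
    (mx, my), (sxx, sxy, syy) = params
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2 * math.pi)


def fused_log10_lr(point, same_params, diff_params):
    """log10 LR = log10 p(scores | same speaker) - log10 p(scores | different speakers)."""
    log_ratio = (bivariate_normal_logpdf(point, same_params)
                 - bivariate_normal_logpdf(point, diff_params))
    return log_ratio / math.log(10)


def _log2_1p_pow10(x):
    """Overflow-safe log2(1 + 10**x)."""
    if x > 0:
        return x * math.log2(10) + math.log2(1 + 10 ** (-x))
    return math.log2(1 + 10 ** x)


def cllr(log10_lrs_same, log10_lrs_diff):
    """Log-likelihood-ratio cost; a system that always outputs LR = 1 scores exactly 1.0."""
    pen_same = sum(_log2_1p_pow10(-v) for v in log10_lrs_same) / len(log10_lrs_same)
    pen_diff = sum(_log2_1p_pow10(v) for v in log10_lrs_diff) / len(log10_lrs_diff)
    return 0.5 * (pen_same + pen_diff)


def eer(scores_same, scores_diff):
    """Approximate equal error rate by sweeping the threshold over observed scores."""
    best_gap, best_rate = float("inf"), None
    for t in sorted(set(scores_same) | set(scores_diff)):
        frr = sum(s < t for s in scores_same) / len(scores_same)   # false rejections
        far = sum(s >= t for s in scores_diff) / len(scores_diff)  # false acceptances
        if abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return best_rate


if __name__ == "__main__":
    random.seed(0)

    # Hypothetical synthetic (acoustic score, frequent-word score) pairs.
    def draw(center, n):
        return [(random.gauss(center[0], 1.0), random.gauss(center[1], 1.0))
                for _ in range(n)]

    same_params = fit_bivariate_normal(draw((2.0, 1.0), 200))
    diff_params = fit_bivariate_normal(draw((-2.0, -1.0), 200))
    test_same, test_diff = draw((2.0, 1.0), 200), draw((-2.0, -1.0), 200)

    llr_same = [fused_log10_lr(p, same_params, diff_params) for p in test_same]
    llr_diff = [fused_log10_lr(p, same_params, diff_params) for p in test_diff]
    print("fused Cllr:", round(cllr(llr_same, llr_diff), 3))
    print("fused EER:", eer(llr_same, llr_diff))
    print("acoustic-only EER:", eer([p[0] for p in test_same], [p[0] for p in test_diff]))
```

Running the script prints the fused $C_{\mathrm{llr}}$ and EER next to an acoustic-only EER; because the scores are synthetic, the numbers say nothing about FRIDA or FISHER. Fitting full covariance matrices lets the fusion account for correlation between the two scores, which simply adding the two systems' log LRs would ignore.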

Source journal

Science & Justice (Medicine: Pathology)

CiteScore: 4.20

Self-citation rate: 15.80%

Review time: 81 days

About the journal: Science & Justice provides a forum to promote communication and publication of original articles, reviews and correspondence on subjects that spark debate within the forensic science community and the criminal justice sector. The journal provides a medium whereby all aspects of applying science to legal proceedings can be debated and progressed. Science & Justice is published six times a year and will be of interest primarily to practising forensic scientists and their colleagues in related fields. It is chiefly concerned with the publication of formal scientific papers, in keeping with its international learned status, but will not accept any article describing experimentation on animals that does not meet strict ethical standards. Its aims are: to promote communication and informed debate within the forensic science community and the criminal justice sector; to promote the publication of learned and original research findings from all areas of the forensic sciences and, by so doing, to advance the profession; to promote the publication of case-based material by way of case reviews; to promote the publication of conference proceedings of interest to the forensic science community; to provide a medium whereby all aspects of applying science to legal proceedings can be debated and progressed; and to appeal to all those with an interest in the forensic sciences.