Fusing linguistic and acoustic information for automated forensic speaker comparison

Impact factor 1.9; Zone 4 (Medicine); JCR Q2 (MEDICINE, LEGAL)
E.K. Sergidou, Rolf Ypma, Johan Rohdin, Marcel Worring, Zeno Geradts, Wauter Bosma
Journal: Science & Justice. Published: 2024-07-09. DOI: 10.1016/j.scijus.2024.07.001. URL: https://www.sciencedirect.com/science/article/pii/S135503062400056X
Citation count: 0

Abstract

Verifying the speaker of a speech fragment can be crucial in attributing a crime to a suspect. The question can be addressed given disputed and reference speech material, adopting the recommended and scientifically accepted likelihood ratio framework for reporting evidential strength in court. In forensic practice, usually, auditory and acoustic analyses are performed to carry out such a verification task considering a diversity of features, such as language competence, pronunciation, or other linguistic features. Automated speaker comparison systems can also be used alongside those manual analyses. State-of-the-art automatic speaker comparison systems are based on deep neural networks that take acoustic features as input. Additional information, though, may be obtained from linguistic analysis. In this paper, we aim to answer if, when and how modern acoustic-based systems can be complemented by an authorship technique based on frequent words, within the likelihood ratio framework. We consider three different approaches to derive a combined likelihood ratio: using a support vector machine algorithm, fitting bivariate normal distributions, and passing the score of the acoustic system as additional input to the frequent-word analysis. We apply our method to the forensically relevant dataset FRIDA and the FISHER corpus, and we explore under which conditions fusion is valuable. We evaluate our results in terms of log likelihood ratio cost (Cllr) and equal error rate (EER). We show that fusion can be beneficial, especially in the case of intercepted phone calls with noise in the background.
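
The bivariate-normal fusion and the Cllr metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy score pairs, and the choice to fit one bivariate normal per hypothesis (same speaker, different speaker) on pairs of (acoustic score, frequent-word score) are assumptions made for illustration only. The sketch uses the standard definition of Cllr, the average of log2(1 + 1/LR) over same-speaker comparisons and log2(1 + LR) over different-speaker comparisons, divided by two.

```python
import math

def fit_bivariate_normal(points):
    """Estimate (mean_x, mean_y, var_x, var_y, cov_xy) from 2-D score pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return mx, my, sxx, syy, sxy

def log_density(point, params):
    """Log of the bivariate normal density at `point`."""
    mx, my, sxx, syy, sxy = params
    det = sxx * syy - sxy ** 2          # determinant of the covariance matrix
    dx, dy = point[0] - mx, point[1] - my
    # Squared Mahalanobis distance, using the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * maha

def fused_llr(point, same_params, diff_params):
    """Natural-log likelihood ratio: same-speaker vs. different-speaker model."""
    return log_density(point, same_params) - log_density(point, diff_params)

def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def cllr(llrs_same, llrs_diff):
    """Log-likelihood-ratio cost from natural-log LRs; lower is better."""
    pen_same = sum(_softplus(-l) for l in llrs_same) / len(llrs_same)
    pen_diff = sum(_softplus(l) for l in llrs_diff) / len(llrs_diff)
    return 0.5 * (pen_same + pen_diff) / math.log(2)

# Hypothetical (acoustic score, frequent-word score) pairs; not real data.
same_pairs = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5), (3.5, 2.0)]
diff_pairs = [(-2.0, -1.0), (-3.0, -0.5), (-2.5, -1.5), (-1.5, -2.0)]

same_params = fit_bivariate_normal(same_pairs)
diff_params = fit_bivariate_normal(diff_pairs)

# For brevity the models are scored on the pairs they were fitted on;
# a real evaluation would use held-out comparisons.
llrs_same = [fused_llr(p, same_params, diff_params) for p in same_pairs]
llrs_diff = [fused_llr(p, same_params, diff_params) for p in diff_pairs]
print("Cllr on toy data:", cllr(llrs_same, llrs_diff))
```

A system that always reports LR = 1 (a log LR of 0) has Cllr = 1, so values below 1 indicate that the likelihood ratios carry useful information. Working with log likelihood ratios rather than raw ratios avoids overflow when the two fitted distributions are far apart.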

Source journal
Science & Justice (Medicine: Pathology)
CiteScore: 4.20
Self-citation rate: 15.80%
Articles published: 98
Review time: 81 days
About the journal: Science & Justice provides a forum to promote communication and publication of original articles, reviews and correspondence on subjects that spark debates within the Forensic Science Community and the criminal justice sector. The journal provides a medium whereby all aspects of applying science to legal proceedings can be debated and progressed. Science & Justice is published six times a year, and will be of interest primarily to practising forensic scientists and their colleagues in related fields. It is chiefly concerned with the publication of formal scientific papers, in keeping with its international learned status, but will not accept any article describing experimentation on animals which does not meet strict ethical standards. The journal's stated aims are: to promote communication and informed debate within the Forensic Science Community and the criminal justice sector; to promote the publication of learned and original research findings from all areas of the forensic sciences and by so doing to advance the profession; to promote the publication of case based material by way of case reviews; to promote the publication of conference proceedings which are of interest to the forensic science community; to provide a medium whereby all aspects of applying science to legal proceedings can be debated and progressed; and to appeal to all those with an interest in the forensic sciences.