Fusing linguistic and acoustic information for automated forensic speaker comparison
E.K. Sergidou, Rolf Ypma, Johan Rohdin, Marcel Worring, Zeno Geradts, Wauter Bosma
Science & Justice (journal article), published 2024-07-09
DOI: 10.1016/j.scijus.2024.07.001
https://www.sciencedirect.com/science/article/pii/S135503062400056X
Citation count: 0
Abstract
Verifying the speaker of a speech fragment can be crucial in attributing a crime to a suspect. The question can be addressed given disputed and reference speech material, adopting the recommended and scientifically accepted likelihood ratio framework for reporting evidential strength in court. In forensic practice, auditory and acoustic analyses are usually performed to carry out such a verification task, considering a diversity of features such as language competence, pronunciation, or other linguistic features. Automated speaker comparison systems can also be used alongside those manual analyses. State-of-the-art automatic speaker comparison systems are based on deep neural networks that take acoustic features as input. Additional information, though, may be obtained from linguistic analysis. In this paper, we aim to answer whether, when, and how modern acoustic-based systems can be complemented by an authorship technique based on frequent words, within the likelihood ratio framework. We consider three different approaches to derive a combined likelihood ratio: using a support vector machine algorithm, fitting bivariate normal distributions, and passing the score of the acoustic system as additional input to the frequent-word analysis. We apply our method to the forensically relevant dataset FRIDA and the FISHER corpus, and we explore under which conditions fusion is valuable. We evaluate our results in terms of log likelihood ratio cost ($C_{\mathrm{llr}}$) and equal error rate (EER). We show that fusion can be beneficial, especially in the case of intercepted phone calls with noise in the background.
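The abstract names the log likelihood ratio cost as an evaluation metric but does not spell it out. The sketch below uses the commonly cited definition, $C_{\mathrm{llr}} = \frac{1}{2}\left[\frac{1}{N_{ss}}\sum_i \log_2\left(1 + \frac{1}{LR_i}\right) + \frac{1}{N_{ds}}\sum_j \log_2\left(1 + LR_j\right)\right]$, where the first sum runs over same-speaker comparisons and the second over different-speaker comparisons. The function name, argument names, and example values are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
import math


def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log likelihood ratio cost for plain likelihood ratios (not log-LRs).

    Hypothetical helper for illustration only. Lower is better; a system
    that always reports LR = 1 scores exactly 1.0.
    """
    if not same_speaker_lrs or not different_speaker_lrs:
        raise ValueError("need at least one LR under each hypothesis")

    # Same-speaker pairs are penalised when the LR is small.
    ss_penalty = sum(math.log2(1.0 + 1.0 / lr) for lr in same_speaker_lrs)
    ss_penalty /= len(same_speaker_lrs)

    # Different-speaker pairs are penalised when the LR is large.
    ds_penalty = sum(math.log2(1.0 + lr) for lr in different_speaker_lrs)
    ds_penalty /= len(different_speaker_lrs)

    return 0.5 * (ss_penalty + ds_penalty)


# Made-up likelihood ratios, chosen only to show the metric's behaviour.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])
good = cllr([100.0, 50.0], [0.01, 0.02])
misleading = cllr([0.01, 0.02], [100.0, 50.0])
```

With these made-up values, the uninformative system scores 1.0, the well-behaved system scores well below 1, and the misleading system scores well above 1. This is why the metric is useful for comparing an acoustic-only system against a fused one: it penalises both poor discrimination and poorly calibrated likelihood ratios.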
About the journal:
Science & Justice provides a forum to promote communication and publication of original articles, reviews and correspondence on subjects that spark debates within the Forensic Science Community and the criminal justice sector. The journal provides a medium whereby all aspects of applying science to legal proceedings can be debated and progressed. Science & Justice is published six times a year, and will be of interest primarily to practising forensic scientists and their colleagues in related fields. It is chiefly concerned with the publication of formal scientific papers, in keeping with its international learned status, but will not accept any article describing experimentation on animals which does not meet strict ethical standards.
To promote communication and informed debate within the Forensic Science Community and the criminal justice sector.
To promote the publication of learned and original research findings from all areas of the forensic sciences and by so doing to advance the profession.
To promote the publication of case-based material by way of case reviews.
To promote the publication of conference proceedings which are of interest to the forensic science community.
To provide a medium whereby all aspects of applying science to legal proceedings can be debated and progressed.
To appeal to all those with an interest in the forensic sciences.