Antonio Artur Moura, Napoleão Nepomuceno, Vasco Furtado
Forensic Science International: Digital Investigation, July 2024. DOI: 10.1016/j.fsidi.2024.301765
Enhancing speaker identification in criminal investigations through clusterization and rank-based scoring
This paper introduces an approach that supports speaker identification in criminal investigations, specifically addressing the challenges posed by large volumes of audio recordings whose speakers are unknown. Our approach clusters related recordings (potentially from the same person) based on representative voice embeddings extracted with the ECAPA-TDNN speaker recognition model. Grouping recordings from the same person increases the variability and richness of the voice patterns available for comparison, thereby improving confidence in automatic speaker recognition. We propose combining cosine similarity with a rank-based adjustment function to match audio clusters to individuals in an enrollment database. We validated the approach through experiments on a synthesized dataset built from Common Voice and through a real-life application involving cell phones seized in prisons, which contained thousands of conversational audio recordings. Results showed satisfactory performance and stability, consistently reducing the pool of candidate speakers for subsequent analysis by a human investigator.