Open-Set Speaker Identification pipeline in live criminal investigations
Authors: Mael Fabien, P. Motlícek
Published in: 2021 ISCA Symposium on Security and Privacy in Speech Communication
Publication date: 2021-11-10
DOI: 10.21437/spsc.2021-5 (https://doi.org/10.21437/spsc.2021-5)
Citation count: 2
Abstract
Speaker recognition has many applications in conversational data, including forensic science, where Law Enforcement Agencies (LEAs) aim to assess the identity of a speaker on a specific recorded telephone call. However, speaker identification (SID) systems require initial enrollment data, whereas LEAs might start a case with text or video evidence and little to no enrollment data. In this paper, we introduce the ROXANNE simulated dataset, a multilingual corpus of acted telephone calls following a screenplay prepared by LEAs. We also present a process for building criminal networks from SID that addresses the practical constraints of these investigations. Our process reaches a speaker accuracy of 92.4% on the simulated data and a conversation accuracy of 84.9%. We conclude with some future directions for this work.
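
To make the open-set setting concrete, the following is a minimal sketch of how a network could be built when a case starts with no enrollment data. It is not the pipeline described in the paper: the cosine-similarity scoring, the 0.7 threshold, the `unknown_N` naming, and the function names `identify_open_set` and `build_network` are all assumptions made for illustration, and the speaker embeddings are assumed to come from some upstream model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_open_set(embedding, enrolled, threshold=0.7):
    """Return (speaker_id, is_new).

    Hypothetical open-set rule: accept the best-scoring enrolled speaker if its
    score reaches the threshold, otherwise enroll the voice as a new speaker.
    """
    best_id, best_score = None, -1.0
    for speaker_id, reference in enrolled.items():
        score = cosine(embedding, reference)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, False
    new_id = f"unknown_{len(enrolled) + 1}"  # placeholder label (assumption)
    enrolled[new_id] = list(embedding)
    return new_id, True


def build_network(calls, threshold=0.7):
    """Build an undirected call graph from (embedding_a, embedding_b) pairs.

    Returns the enrolled speakers and a dict mapping speaker pairs to call counts.
    """
    enrolled = {}
    edges = {}
    for emb_a, emb_b in calls:
        a, _ = identify_open_set(emb_a, enrolled, threshold)
        b, _ = identify_open_set(emb_b, enrolled, threshold)
        if a != b:
            key = tuple(sorted((a, b)))
            edges[key] = edges.get(key, 0) + 1
    return enrolled, edges


# Toy 3-dimensional embeddings, purely illustrative.
calls = [
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
    ([0.9, 0.1, 0.0], [0.0, 0.0, 1.0]),
    ([0.0, 0.95, 0.05], [0.05, 0.0, 1.0]),
]
enrolled, edges = build_network(calls)
```

In this toy run the enrolled set starts empty and grows as unmatched voices appear, which mirrors the constraint that investigators may begin with little to no enrollment data. A real system would also need to handle threshold calibration and updating references as more audio per speaker arrives; neither is covered here.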