Open-Set Speaker Identification pipeline in live criminal investigations
Mael Fabien, P. Motlícek
2021 ISCA Symposium on Security and Privacy in Speech Communication, published 2021-11-10
DOI: 10.21437/spsc.2021-5 (https://doi.org/10.21437/spsc.2021-5)
Speaker recognition has many applications in conversational data, including forensic science, where Law Enforcement Agencies (LEAs) aim to assess the identity of a speaker on a specific recorded telephone call. However, speaker identification (SID) systems require initial enrollment data, whereas LEAs might start a case with only text or video evidence and little to no enrollment data. In this paper, we introduce the ROXANNE simulated dataset, a multilingual corpus of acted telephone calls following a screenplay prepared by LEAs. We also present a process for building criminal networks from SID that addresses the practical constraints of these investigations. On the simulated data, our process reaches a speaker accuracy of 92.4% and a conversation accuracy of 84.9%. Finally, we outline some future directions for this work.
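
The abstract does not spell out how the process works, so the following is only an illustrative sketch of the general idea of open-set SID with little enrollment data, not the authors' pipeline. It assumes each call side has already been turned into a fixed-length speaker embedding, scores it against enrolled speakers with cosine similarity, enrolls a new "unknown" speaker when no score clears a threshold, and counts calls between speaker pairs as network edges. The function names, the threshold of 0.7, and the toy 3-dimensional embeddings are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def identify_or_enroll(embedding, enrolled, threshold=0.7):
    """Return the best-matching enrolled ID, or enroll a new unknown speaker.

    `threshold` is an assumed value; a real system would tune it on held-out data.
    """
    best_id, best_score = None, -1.0
    for speaker_id, reference in enrolled.items():
        score = cosine(embedding, reference)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_id is not None and best_score >= threshold:
        return best_id
    # Open-set case: no enrolled speaker is close enough, so add a new one.
    n_unknown = sum(1 for key in enrolled if key.startswith("unknown_"))
    new_id = f"unknown_{n_unknown + 1}"
    enrolled[new_id] = embedding
    return new_id


def build_network(calls, enrolled, threshold=0.7):
    """Count calls between each pair of identified speakers."""
    edges = Counter()
    for emb_a, emb_b in calls:
        a = identify_or_enroll(emb_a, enrolled, threshold)
        b = identify_or_enroll(emb_b, enrolled, threshold)
        if a != b:
            edges[tuple(sorted((a, b)))] += 1
    return edges


# Toy data: one enrolled suspect and two calls with an initially unknown speaker.
enrolled = {"suspect_A": [1.0, 0.0, 0.0]}
calls = [
    ([0.9, 0.1, 0.0], [0.0, 1.0, 0.0]),
    ([0.0, 0.95, 0.05], [1.0, 0.05, 0.0]),
]
network = build_network(calls, enrolled)
print(network)
```

In this toy run the second speaker is enrolled on the first call and recognized again on the second, so the network ends up with a single edge between the two speakers that has been seen twice.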