On the Detection of Pitch-Shifted Voice: Machines and Human Listeners
D. Looney, N. Gaubitch
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021-06-06
DOI: https://doi.org/10.1109/ICASSP39728.2021.9414890
Citations: 0
Abstract
We present a performance comparison between human listeners and a simple algorithm for the task of speech anomaly detection. The algorithm utilises an intentionally small set of features derived from the source-filter model, with the aim of validating that key components of source-filter theory characterise how humans perceive anomalies. We furthermore recognise that humans are adept at detecting anomalies without prior exposure to a given anomaly class. To that end, we also consider the algorithm performance when operating via the principle of unsupervised learning where a null model is derived from normal speech recordings. We evaluate both the algorithm and human listeners for pitch-shift detection where the pitch of a speech sample is intentionally modified using software, a phenomenon of relevance to the fields of fraud detection and forensics. Our results show that humans can only detect pitch-shift reliably at more extreme levels, and that the performance of the algorithm matches closely with that of humans.
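The unsupervised principle described in the abstract, where a null model is derived from normal speech only and anything that deviates strongly from it is flagged, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the features or the model, so the sketch assumes a two-dimensional feature vector (a hypothetical mean pitch and first-formant estimate, both in Hz), a Gaussian null model scored by Mahalanobis distance, and synthetic data in place of real recordings.

```python
import numpy as np


def fit_null_model(normal_features):
    """Estimate mean and inverse covariance from normal-speech features only."""
    x = np.asarray(normal_features, dtype=float)
    mean = x.mean(axis=0)
    # A small ridge keeps the covariance invertible.
    cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(x.shape[1])
    return mean, np.linalg.inv(cov)


def anomaly_score(features, mean, inv_cov):
    """Mahalanobis distance of each feature vector from the null model."""
    d = np.atleast_2d(np.asarray(features, dtype=float)) - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", d, inv_cov, d))


# Synthetic stand-in for features of unmodified speech (assumed values):
# column 0 = mean pitch in Hz, column 1 = first-formant estimate in Hz.
rng = np.random.default_rng(0)
normal = rng.normal(loc=[120.0, 500.0], scale=[10.0, 30.0], size=(500, 2))

mean, inv_cov = fit_null_model(normal)

# Threshold set from normal data alone: no anomalous examples are needed.
threshold = float(np.quantile(anomaly_score(normal, mean, inv_cov), 0.99))

# Two hypothetical test samples: a mild shift and an extreme shift.
candidates = np.array([[125.0, 510.0], [200.0, 800.0]])
scores = anomaly_score(candidates, mean, inv_cov)
flags = scores > threshold

for name, score, flag in zip(["mild", "extreme"], scores, flags):
    print(f"{name}: score={score:.2f}, flagged={bool(flag)}")
```

In this toy setup only the extreme sample exceeds the threshold, because the mild one lies within the spread of normal speech. That behaviour follows from the synthetic values chosen here and should not be read as a reproduction of the paper's results.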