Semi-automatic approach utilizing Siamese Neural Network for forensic voice comparison

S.G. Kruthika, Trisiladevi C. Nagavi, P. Mahesha, H.T. Chethana, Vinayakumar Ravi, Alanoud Al Mazroa

Franklin Open, Volume 14, Article 100527. DOI: 10.1016/j.fraope.2026.100527. Published online 7 February 2026; issue date March 2026. URL: https://www.sciencedirect.com/science/article/pii/S2773186326000435
Citations: 0
Abstract
Forensic Voice Comparison (FVC) remains a critical yet challenging task in digital forensics, often hindered by manual subjectivity, background noise, and speaker variability. This paper presents a novel semi-automatic FVC framework based on a Siamese Neural Network (SNN), a discriminative metric-learning architecture, combined with stationary noise reduction for robust voice similarity assessment. The proposed framework leverages the SNN's ability to learn a shared embedding space in which Euclidean distance reflects speaker identity. Using a jurisdiction-specific dataset of 3899 Australian English speech samples (FLAC format), the framework achieves 96.02% accuracy, 94.00% precision, and 92.10% recall in distinguishing same-speaker from different-speaker pairs. It is evaluated against strong baselines, including Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory (BiLSTM), and a Gaussian Mixture Model-Universal Background Model (GMM-UBM), and validated via 5-fold cross-validation (mean ± standard deviation) to ensure statistical robustness. The framework fills a critical gap in forensic phonetics by demonstrating that lightweight, interpretable, pairwise deep learning models can outperform complex generative or ensemble systems in real-world FVC scenarios. All preprocessing, training protocols, and hyperparameters are documented for reproducibility.
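The abstract states that the SNN learns a shared embedding space in which Euclidean distance reflects speaker identity. The paper's actual architecture, loss function, and decision threshold are not reproduced here, so the following is only a minimal numpy sketch of that decision rule, assuming a standard contrastive loss; all embedding values, the margin, and the threshold are illustrative, not taken from the paper.

```python
import numpy as np

def euclidean_distance(a, b):
    """Distance between two embedding vectors from the twin networks."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def contrastive_loss(dist, same_speaker, margin=1.0):
    """Contrastive loss: pull same-speaker pairs together and push
    different-speaker pairs at least `margin` apart."""
    if same_speaker:
        return 0.5 * dist ** 2
    return 0.5 * max(margin - dist, 0.0) ** 2

# Toy embeddings standing in for the shared-weight encoder's outputs.
emb_a = np.array([0.10, 0.90, 0.20])
emb_b = np.array([0.12, 0.88, 0.21])  # close to emb_a: likely same speaker
emb_c = np.array([0.90, 0.10, 0.70])  # far from emb_a: likely different speaker

d_same = euclidean_distance(emb_a, emb_b)
d_diff = euclidean_distance(emb_a, emb_c)

# A threshold on the distance yields the same/different-speaker verdict.
threshold = 0.5
print(d_same < threshold)  # True  -> same speaker
print(d_diff < threshold)  # False -> different speakers
```

In this formulation the "pairwise" character the abstract emphasizes is explicit: the model is trained on pairs of samples, and the same/different decision reduces to comparing one scalar distance against a calibrated threshold, which is part of what makes such systems comparatively interpretable.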