Semi-automatic approach utilizing Siamese Neural Network for forensic voice comparison

Franklin Open, Vol. 14, Article 100527. Pub Date: 2026-03-01; Epub Date: 2026-02-07. DOI: 10.1016/j.fraope.2026.100527
S.G. Kruthika, Trisiladevi C. Nagavi, P. Mahesha, H.T. Chethana, Vinayakumar Ravi, Alanoud Al Mazroa

Abstract

Forensic Voice Comparison (FVC) remains a critical yet challenging task in digital forensics, often hindered by manual subjectivity, background noise, and speaker variability. This paper presents a novel semi-automatic FVC framework based on Siamese Neural Networks (SNN): a discriminative metric-learning architecture combined with stationary noise reduction for robust voice similarity assessment. The proposed framework leverages the SNN's ability to learn a shared embedding space in which Euclidean distance reflects speaker identity. On a jurisdiction-specific dataset of 3899 Australian English speech samples (FLAC format), the framework achieves 96.02% accuracy, 94.00% precision, and 92.10% recall in distinguishing same-speaker from different-speaker pairs. It is evaluated against strong baselines, including Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory (BiLSTM), and the Gaussian Mixture Model-Universal Background Model (GMM-UBM), and validated via 5-fold cross-validation (mean ± std. dev.) to ensure statistical robustness. The proposed framework fills a critical gap in forensic phonetics by demonstrating that lightweight, interpretable, pairwise deep learning models can outperform complex generative or ensemble systems in real-world FVC scenarios. All preprocessing, training protocols, and hyperparameters are documented for reproducibility.
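The abstract describes the core comparison step (Euclidean distance between paired speaker embeddings, thresholded into a same/different decision) and the reported pairwise metrics. The sketch below is a minimal, hypothetical illustration of that idea only; the function names, threshold value, and toy labels are assumptions, not the paper's implementation.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two speaker embeddings."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def same_speaker(emb1, emb2, threshold=1.0):
    """Decide 'same speaker' when embeddings lie within the threshold
    (threshold is an illustrative value, not from the paper)."""
    return euclidean_distance(emb1, emb2) < threshold

def pair_metrics(y_true, y_pred):
    """Accuracy, precision, recall for same(1) / different(0) pair labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # correctly matched pairs
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false matches
    fn = np.sum((y_true == 1) & (y_pred == 0))  # missed matches
    acc = float(np.mean(y_true == y_pred))
    prec = float(tp / (tp + fp)) if tp + fp else 0.0
    rec = float(tp / (tp + fn)) if tp + fn else 0.0
    return acc, prec, rec
```

In a trained SNN, the embeddings would come from a shared-weight encoder applied to each utterance; here they are treated as given vectors.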