Analysis of Machine Learning Classifiers for Speaker Identification: A Study on SVM, Random Forest, KNN, and Decision Tree

Journal Of Computer Networks, Architecture and High Performance Computing. Published: 2024-01-31. DOI: 10.47709/cnahpc.v6i1.3487

Gregorius Airlangga


Citation count: 0

Abstract

This study investigates the performance of machine learning classifiers for speaker identification, a key component of modern digital security systems. As voice-activated interfaces spread through everyday technology, the need for accurate and reliable speaker identification keeps growing. This research compares four widely used classifiers: Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), and Decision Tree (DT). Using the LibriSpeech dataset, known for its diversity of speakers and recording conditions, we extracted Mel-frequency cepstral coefficients (MFCCs) as the features for training and evaluating the classifiers. Each model was assessed on precision, recall, F1-score, and accuracy. RF outperformed all other classifiers, achieving near-perfect scores on every metric, which indicates its robustness and generalizability for speaker identification. KNN also performed strongly, suggesting that it suits applications where rapid execution and interpretability are critical. SVM and DT achieved moderate and lower performance, respectively, indicating that both need further optimization. These findings underscore the effectiveness of ensemble and distance-based classifiers at capturing the complex patterns that distinguish one speaker from another. Beyond guiding the choice of classifier for speaker identification, the study lays the groundwork for future research on hybrid models and on how dataset variability affects performance. The analysis thereby provides a benchmark for developing advanced speaker identification systems.
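To make the evaluation protocol concrete, here is a minimal pure-Python sketch of one of the four classifiers (KNN) together with the four reported metrics. It is not the author's code. It assumes that each utterance has already been reduced to a fixed-length feature vector (for example by averaging MFCC frames, which the abstract does not specify), that KNN uses Euclidean distance with k=3, and that precision, recall, and F1 are macro-averaged over speakers. The helper names, speaker labels, and toy vectors are hypothetical and are not LibriSpeech data, so the numbers it produces say nothing about the paper's results.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Hypothetical helper: label `query` by majority vote among its k nearest training vectors."""
    # Sort training pairs by distance to the query and keep the k closest.
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # Counter.most_common orders equal counts by first occurrence, so ties go to the nearer neighbour.
    return votes.most_common(1)[0][0]


def macro_metrics(y_true, y_pred):
    """Hypothetical helper: accuracy plus macro-averaged precision, recall and F1 over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,  # unweighted mean over speakers
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,  # mean of per-speaker F1 scores
    }


# Toy 2-D vectors for three hypothetical speakers (illustrative only, not real MFCCs).
train_X = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [9.8, 0.1], [10.1, 0.2]]
train_y = ["spk_a", "spk_a", "spk_b", "spk_b", "spk_c", "spk_c"]
test_X = [[0.1, 0.1], [5.1, 5.0], [9.9, 0.0], [4.0, 4.0]]
test_y = ["spk_a", "spk_b", "spk_c", "spk_c"]  # the last test vector is deliberately hard

y_pred = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print(y_pred)  # ['spk_a', 'spk_b', 'spk_c', 'spk_b']
print(macro_metrics(test_y, y_pred))
```

Macro averaging weights every speaker equally regardless of how many test utterances each one has, which is a common choice when comparing classifiers across many speakers; a weighted average would be an equally defensible assumption. In practice, scikit-learn's `KNeighborsClassifier` and `precision_recall_fscore_support(..., average="macro")` provide the same computations without hand-written loops.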

