Hyper‐Parameter Optimization of Kernel Functions on Multi‐Class Text Categorization: A Comparative Evaluation

Michael Loki, Agnes Mindila, Wilson Cheruiyot
WIREs Data Mining and Knowledge Discovery · Published 2024-11-28 · DOI: 10.1002/widm.1572 · Citations: 0

Abstract

In recent years, machine learning (ML) has witnessed a paradigm shift in kernel function selection, which is pivotal to optimizing many ML models. Although several studies have noted its significance, a comprehensive understanding of kernel function selection, particularly its effect on model performance, is still lacking, and challenges remain in selecting and optimizing kernel functions to improve both model performance and efficiency. This study investigates how the gamma and cost (regularization) parameters influence performance metrics in multi-class classification tasks across several kernel-based algorithms. Through sensitivity analysis, the impact of these parameters on classification performance and computational efficiency is assessed. The experimental setup deploys ML models using four kernel-based configurations: a Support Vector Machine with default settings and the Radial Basis Function, polynomial, and sigmoid kernels. Data preparation includes text processing, categorization, and feature extraction with TfidfVectorizer, followed by model training and validation. Results indicate that the Support Vector Machine with default settings and the Radial Basis Function kernel consistently outperforms the polynomial and sigmoid kernels. Tuning gamma improves model accuracy and precision, highlighting its role in capturing complex relationships, whereas the regularization cost parameter shows minimal impact on performance. The study also reveals that configurations with moderate gamma values achieve a better balance between performance and computational time than higher gamma values or no gamma adjustment. These findings underscore the delicate trade-off between model complexity, performance, and computational efficiency.
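The pipeline the abstract describes (TF-IDF features, an SVM swept over kernels and gamma values, accuracy comparison) can be sketched as follows. This is a minimal illustration, not the paper's actual experiment: scikit-learn's `SVC`, the toy three-class corpus, and the gamma/C grid here are all assumptions, since the paper's dataset and exact grid are not given in the abstract.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Hypothetical stand-in corpus (finance / sport / tech); the paper's
# dataset is not specified in the abstract.
docs = [
    "stock markets rallied on strong earnings news",
    "central bank raises interest rates again",
    "quarterly profits beat analyst forecasts",
    "investors weigh inflation and bond yields",
    "the team won the championship final",
    "striker scores twice in the derby match",
    "coach praises the defence after a clean sheet",
    "fans celebrate a dramatic penalty shootout",
    "new processor doubles machine learning throughput",
    "open source framework simplifies model training",
    "researchers release a faster text classifier",
    "software update patches a critical security flaw",
]
labels = [0] * 4 + [1] * 4 + [2] * 4

# Feature extraction with TfidfVectorizer, as in the paper's setup.
X = TfidfVectorizer().fit_transform(docs)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=3, random_state=0, stratify=labels
)

# Sensitivity sweep over kernel and gamma; C (the cost parameter)
# is held at its default of 1.0 here for brevity.
results = {}
for kernel in ("rbf", "poly", "sigmoid"):
    for gamma in (0.01, 0.1, 1.0, "scale"):
        clf = SVC(kernel=kernel, gamma=gamma, C=1.0).fit(X_tr, y_tr)
        results[(kernel, gamma)] = accuracy_score(y_te, clf.predict(X_te))

best = max(results, key=results.get)
print("best configuration:", best, "accuracy:", results[best])
```

In a full sensitivity analysis one would also sweep C and record wall-clock training time per configuration, which is how the abstract's performance-versus-time trade-off for moderate gamma values would surface.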