Efficient parameter selection for SVM: The case of business intelligence categorization

Hsin-Hsiung Huang, Zijing Wang, Wingyan Chung
2017 IEEE International Conference on Intelligence and Security Informatics (ISI)
Published: 2017-07-22
DOI: 10.1109/ISI.2017.8004897 (https://doi.org/10.1109/ISI.2017.8004897)
Citations: 9

Abstract

The Support Vector Machine (SVM) is a widely used technique for classifying high-dimensional data, especially in security and intelligence categorization. However, the performance of an SVM can be adversely affected by poorly selected parameter values. Current approaches to SVM parameter selection rely mainly on extensive cross-validation or anecdotal information, which can be inefficient and ineffective. In this research, we propose an efficient algorithm called Percentile-SVM (P-SVM) for selecting the parameter pair (γ, C) of an SVM with a Gaussian kernel on metric data. P-SVM searches only a handful of percentiles of the squared Euclidean distances between data points to select the best pair of parameter values. To validate the algorithm, we applied P-SVM to categorizing business intelligence factors extracted from 6,859 sentences in 231 online news articles about four major companies in the information technology sector. The results show that P-SVM achieved a significant improvement in precision, recall, F-measure, and AUC over the LibSVM package (with default parameter values) used in WEKA, a widely used data mining software package. These findings provide useful implications for related research and security informatics applications.
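The percentile idea can be illustrated with a minimal sketch. The abstract does not state which percentiles P-SVM uses, how a percentile is mapped to γ, or how C is searched, so everything below is an assumption for illustration rather than the authors' algorithm: the function names are hypothetical, γ is taken as the reciprocal of a percentile (a natural scale for a Gaussian kernel of the form exp(−γ‖x − y‖²)), and the quality of a (γ, C) pair is delegated to a caller-supplied `score` function such as a cross-validated F-measure.

```python
from itertools import combinations, product


def squared_distances(points):
    """Squared Euclidean distance for every unordered pair of points."""
    return [
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in combinations(points, 2)
    ]


def percentile(sorted_values, pct):
    """Percentile of a sorted list, by linear interpolation between ranks."""
    pos = (len(sorted_values) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def candidate_gammas(points, pcts=(10, 50, 90)):
    """One candidate gamma per percentile of the nonzero squared distances.

    Assumption: gamma = 1 / percentile. The percentiles in `pcts` are
    placeholders, not values taken from the paper.
    """
    dists = sorted(d for d in squared_distances(points) if d > 0)
    if not dists:
        raise ValueError("need at least two distinct points")
    return [1.0 / percentile(dists, p) for p in pcts]


def select_pair(gammas, costs, score):
    """Return the (gamma, C) pair that maximizes score(gamma, C)."""
    return max(product(gammas, costs), key=lambda pair: score(*pair))
```

With three percentiles and, say, three values of C, only nine candidate pairs are scored, which is the source of the efficiency gain over a dense grid search. In practice `score` would train and validate an SVM for each pair.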