基于证据参数权重的有监督多类问题标称特征的数值化

Eftim Zdravevski, Petre Lameski, A. Kulakov, S. Kalajdziski
{"title":"基于证据参数权重的有监督多类问题标称特征的数值化","authors":"Eftim Zdravevski, Petre Lameski, A. Kulakov, S. Kalajdziski","doi":"10.15439/2015F90","DOIUrl":null,"url":null,"abstract":"Machine learning has received increased interest by both the scientific community and the industry. Most of the machine learning algorithms rely on certain distance metrics that can only be applied to numeric data. This becomes a problem in complex datasets that contain heterogeneous data consisted of numeric and nominal (i.e. categorical) features. Thus the need of transformation from nominal to numeric data. Weight of evidence (WoE) is one of the parameters that can be used for transformation of the nominal features to numeric. In this paper we describe a method that uses WoE to transform the features. Although the applicability of this method is researched to some extent, in this paper we extend its applicability for multi-class problems, which is a novelty. We compared it with the method that generates dummy features. We test both methods on binary and multi-class classification problems with different machine learning algorithms. Our experiments show that the WoE based transformation generates smaller number of features compared to the technique based on generation of dummy features while also improving the classification accuracy, reducing memory complexity and shortening the execution time. Be that as it may, we also point out some of its weaknesses and make some recommendations when to use the method based on dummy features generation instead.","PeriodicalId":276884,"journal":{"name":"2015 Federated Conference on Computer Science and Information Systems (FedCSIS)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Transformation of nominal features into numeric in supervised multi-class problems based on the weight of evidence parameter\",\"authors\":\"Eftim Zdravevski, Petre Lameski, A. Kulakov, S. Kalajdziski\",\"doi\":\"10.15439/2015F90\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning has received increased interest by both the scientific community and the industry. Most of the machine learning algorithms rely on certain distance metrics that can only be applied to numeric data. This becomes a problem in complex datasets that contain heterogeneous data consisted of numeric and nominal (i.e. categorical) features. Thus the need of transformation from nominal to numeric data. Weight of evidence (WoE) is one of the parameters that can be used for transformation of the nominal features to numeric. In this paper we describe a method that uses WoE to transform the features. Although the applicability of this method is researched to some extent, in this paper we extend its applicability for multi-class problems, which is a novelty. We compared it with the method that generates dummy features. We test both methods on binary and multi-class classification problems with different machine learning algorithms. Our experiments show that the WoE based transformation generates smaller number of features compared to the technique based on generation of dummy features while also improving the classification accuracy, reducing memory complexity and shortening the execution time. Be that as it may, we also point out some of its weaknesses and make some recommendations when to use the method based on dummy features generation instead.\",\"PeriodicalId\":276884,\"journal\":{\"name\":\"2015 Federated Conference on Computer Science and Information Systems (FedCSIS)\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-10-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 Federated Conference on Computer Science and Information Systems (FedCSIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.15439/2015F90\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 Federated Conference on Computer Science and Information Systems (FedCSIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15439/2015F90","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20

摘要

机器学习已经引起了科学界和工业界越来越多的兴趣。大多数机器学习算法依赖于某些只能应用于数字数据的距离度量。这在包含由数字和名义(即分类)特征组成的异构数据的复杂数据集中成为一个问题。因此,需要将标称数据转换为数值数据。证据权(WoE)是将标称特征转换为数字特征的参数之一。本文描述了一种利用WoE对特征进行变换的方法。虽然对该方法的适用性进行了一定程度的研究,但本文将其扩展到多类问题,这是一种新颖的方法。我们将其与生成虚拟特征的方法进行了比较。我们用不同的机器学习算法对这两种方法在二分类和多分类问题上进行了测试。我们的实验表明,与基于虚拟特征生成的技术相比,基于WoE的转换生成的特征数量更少,同时提高了分类精度,降低了记忆复杂度,缩短了执行时间。尽管如此,我们也指出了它的一些弱点,并在使用基于虚拟特征生成的方法时提出了一些建议。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Transformation of nominal features into numeric in supervised multi-class problems based on the weight of evidence parameter
Machine learning has received increased interest by both the scientific community and the industry. Most of the machine learning algorithms rely on certain distance metrics that can only be applied to numeric data. This becomes a problem in complex datasets that contain heterogeneous data consisted of numeric and nominal (i.e. categorical) features. Thus the need of transformation from nominal to numeric data. Weight of evidence (WoE) is one of the parameters that can be used for transformation of the nominal features to numeric. In this paper we describe a method that uses WoE to transform the features. Although the applicability of this method is researched to some extent, in this paper we extend its applicability for multi-class problems, which is a novelty. We compared it with the method that generates dummy features. We test both methods on binary and multi-class classification problems with different machine learning algorithms. Our experiments show that the WoE based transformation generates smaller number of features compared to the technique based on generation of dummy features while also improving the classification accuracy, reducing memory complexity and shortening the execution time. Be that as it may, we also point out some of its weaknesses and make some recommendations when to use the method based on dummy features generation instead.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信