A binarization strategy for modelling mixed data in multigroup classification

Youssef Masmoudi, M. Turkay, H. Chabchoub
Published in: 2013 International Conference on Advanced Logistics and Transport, 2013-05-29. DOI: 10.1109/ICADLT.2013.6568483
Citations: 6

Abstract

This paper presents a binarization pre-processing strategy for mixed datasets. We propose that representing nominal and integer data with binary attributes benefits classification accuracy, and we describe a procedure for converting integer and nominal data into binary attributes. An Expectation-Maximization (EM) clustering algorithm was applied to group the values of attributes with a wide range, so that only a small number of binary attributes is needed. Once the dataset is pre-processed, we use a Support Vector Machine (LibSVM) for classification. The proposed method was tested on datasets from the literature. We demonstrate the improved accuracy and efficiency of the presented binarization strategy for modelling mixed and complex data in comparison to classification of the original dataset, the nominal dataset and the binary dataset.
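
The abstract does not give the details of the conversion, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that nominal values are one-hot encoded and that a wide-range integer attribute is first grouped by a one-dimensional Gaussian mixture fitted with EM, after which the cluster labels are one-hot encoded. The function names (one_hot, em_cluster_1d), the even initialisation of the means and the sample data are all hypothetical. The SVM step is omitted; the resulting 0/1 vectors would be the input to a classifier such as LibSVM.

```python
import math


def one_hot(values):
    """Encode nominal values as 0/1 vectors, one position per distinct category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [[1 if index[v] == j else 0 for j in range(len(categories))]
            for v in values]


def em_cluster_1d(xs, k, iters=50):
    """Fit a k-component 1-D Gaussian mixture with EM; return one label per value.

    Assumption: means start evenly spread over the data range. This is an
    illustrative choice, not something stated in the abstract.
    """
    lo, hi = min(xs), max(xs)
    means = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    variances = [max(((hi - lo) / (2 * k)) ** 2, 1e-6)] * k
    weights = [1.0 / k] * k

    def responsibilities():
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in xs:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        return resp

    for _ in range(iters):
        resp = responsibilities()
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, 1e-6)  # floor keeps the density finite

    # Hard assignment: the most probable component for each value.
    return [max(range(k), key=lambda j: r[j]) for r in responsibilities()]


# Hypothetical mixed record set: one nominal and one wide-range integer attribute.
colours = ["red", "green", "red"]
ages = [18, 19, 21, 22, 60, 62, 65, 66]

colour_bits = one_hot(colours)
age_labels = em_cluster_1d(ages, k=2)
age_bits = one_hot(age_labels)  # two binary attributes instead of one per distinct age
```

With two clusters, the eight distinct ages above need only two binary attributes, whereas one-hot encoding the raw values directly would need eight. That reduction is the motivation the abstract gives for clustering wide-range attributes before binarizing them.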