A Novel Density-Based Approach for Instance Selection

2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI). Pub Date: 2016-11-01. DOI: 10.1109/ICTAI.2016.0090

J. Carbonera, Mara Abel


Citations: 17

Abstract

As datasets grow in size, instance selection techniques have been applied to reduce the data to a manageable volume, which in turn reduces the computational resources needed for the learning process. Instance selection algorithms can also be applied to remove useless, erroneous or noisy instances before learning algorithms are applied. In recent years, several approaches for instance selection have been proposed. However, most of them have high time complexity and therefore cannot be used on large datasets. In this paper, we present an algorithm called CDIS that can be viewed as an improvement of a recently proposed density-based approach for instance selection. The main contribution of this paper is a formal characterization of a novel density function that is adopted by the CDIS algorithm. The CDIS algorithm evaluates the instances of each class separately and keeps only the densest instances in a given (arbitrary) neighborhood. This ensures a reasonably low time complexity. Our approach was evaluated on 20 well-known datasets, and its performance was compared with that of 6 state-of-the-art algorithms on three measures: accuracy, reduction and effectiveness. To evaluate the accuracy achieved using the datasets produced by the algorithms, we applied the KNN algorithm. The results show that our approach achieves a performance (in terms of the balance between accuracy and reduction) that is better than or comparable to that of the other algorithms considered in the evaluation.
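The per-class, keep-the-densest idea described in the abstract can be illustrated with a minimal sketch. This is not the CDIS algorithm itself: the abstract does not give the paper's density function, so the density used here (negative mean distance to the k nearest same-class neighbours), the neighbourhood definition, the parameter `k`, and the names `select_instances` and `reduction` are all assumptions made for illustration. The `reduction` helper assumes the usual reading of that measure as the fraction of instances removed.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_instances(X, y, k=3):
    """Return sorted indices of the instances kept by the sketch.

    Each class is processed separately. An instance is kept when its
    (assumed) density is at least that of every one of its k nearest
    same-class neighbours.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    kept = []
    for idx in by_class.values():
        # k nearest neighbours of each instance, restricted to its own class.
        neighbours = {}
        for i in idx:
            others = sorted(
                (j for j in idx if j != i),
                key=lambda j: euclidean(X[i], X[j]),
            )
            neighbours[i] = others[:k]

        # ASSUMED density: negative mean distance to those neighbours.
        # The paper defines its own density function; this is a stand-in.
        density = {}
        for i in idx:
            nb = neighbours[i]
            if nb:
                density[i] = -sum(euclidean(X[i], X[j]) for j in nb) / len(nb)
            else:
                density[i] = 0.0  # a class with a single instance

        # Keep the instances that are densest within their neighbourhood.
        for i in idx:
            if all(density[i] >= density[j] for j in neighbours[i]):
                kept.append(i)
    return sorted(kept)


def reduction(n_original, n_selected):
    """Fraction of instances removed (assumed definition of 'reduction')."""
    return (n_original - n_selected) / n_original
```

On a toy dataset with two well-separated classes of four points each, where one point per class sits nearest the others, the sketch with `k=3` keeps that one point per class. The neighbour search here is a naive quadratic scan within each class and says nothing about the time complexity the paper reports.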



