INCORPORATING DENSITY IN K-NEAREST NEIGHBORS REGRESSION

International Journal of Advanced Research in Computer Science. Pub Date: 2023-06-20. DOI: 10.26483/ijarcs.v14i3.6989

M. Mahfouz



Abstract

Traditional k-nearest neighbours (kNN) regression suffers from several difficulties when only a limited number of samples is available. This paper proposes two density-based decision models. To reduce testing time, a k-nearest neighbours table (kNN-Table) is maintained. For each object x, it stores the neighbours of x, their weighted Manhattan distances to x, and a binary vector indicating whether each dimension increases or decreases relative to x's values. In the first decision model, if the distance from the unseen sample to one of its neighbours x is less than the distance from x to the farthest of x's own neighbours, the label is estimated by linear interpolation; otherwise, linear extrapolation is used. In the second decision model, for each neighbour x of the unseen sample, the distance from the unseen sample to x and the corresponding binary vector are computed, and the set S of nearest neighbours of x is retrieved from the kNN-Table. For each sample in S, a normalized distance to the unseen sample is computed from the information stored in the kNN-Table, and this distance is used to weight each neighbour of the unseen sample's neighbours. In both models, the unseen sample is assigned a weighted average of the labels computed for each neighbour. The diversity between the two proposed decision models and the traditional kNN regressor motivates an ensemble of the two proposed models together with the traditional kNN regressor. The ensemble is evaluated, and the results show that it achieves a significant increase in performance compared with its base regressors and several related algorithms.
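The kNN-Table described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the per-dimension weights `w`, the bit encoding (1 where the neighbour's value is larger than x's, 0 otherwise), and the tie-breaking by index are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

# One table entry: (neighbour index, weighted Manhattan distance, direction bits).
Entry = Tuple[int, float, Tuple[int, ...]]


def weighted_manhattan(a: Sequence[float], b: Sequence[float],
                       w: Sequence[float]) -> float:
    """Sum of per-dimension absolute differences, each scaled by its weight."""
    return sum(wi * abs(ai - bi) for ai, bi, wi in zip(a, b, w))


def direction_bits(x: Sequence[float], other: Sequence[float]) -> Tuple[int, ...]:
    """Assumed encoding: 1 where `other` exceeds `x` in a dimension, else 0."""
    return tuple(1 if oi > xi else 0 for xi, oi in zip(x, other))


def build_knn_table(X: Sequence[Sequence[float]], w: Sequence[float],
                    k: int) -> List[List[Entry]]:
    """For every sample, store its k nearest other samples with distance and bits."""
    table: List[List[Entry]] = []
    for i, x in enumerate(X):
        candidates = [
            (j, weighted_manhattan(x, X[j], w), direction_bits(x, X[j]))
            for j in range(len(X)) if j != i
        ]
        # Sort by distance; ties are broken by index (an assumption).
        candidates.sort(key=lambda e: (e[1], e[0]))
        table.append(candidates[:k])
    return table


# Hypothetical toy data: four 2-D samples, with the second dimension weighted by 0.5.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
w = [1.0, 0.5]
table = build_knn_table(X, w, k=2)
```

Here `table[0]` holds the two nearest neighbours of the first sample, each with its stored distance and direction bits. Building the table once during training means these values can be looked up rather than recomputed at test time, which is the motivation the abstract gives for maintaining it.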

