使用质量比方差离群因子的无参数砾岩最近邻分类器

Patcharasiri Fuangfoo, Krung Sinapiromsaran
{"title":"使用质量比方差离群因子的无参数砾岩最近邻分类器","authors":"Patcharasiri Fuangfoo, Krung Sinapiromsaran","doi":"10.18178/ijml.2023.13.4.1145","DOIUrl":null,"url":null,"abstract":"Classification is one important area in machine learning that labels the class of an instance via a classifier from known-class historical data. One of the popular classifiers is k-NN, which stands for “k-nearest neighbor” and requires a global parameter k to proceed. This global parameter may not be suitable for all instances. Naturally, each instance may situate on different regions of clusters such as an interior instance placed inside a cluster, a border instance placed on the outskirts, an outer instance placed faraway from any cluster, which requires a different number of neighbors. To automatically assign a different number of neighbors to each instance, the concept of scoring from the anomaly detection research is desired. The Mass-ratio-variance Outlier Factor, MOF, is selected as the scoring scheme for the number of neighbors of each instance. MOF gives the highest score to an instance placed very far from any cluster and the lowest score to an instance surrounded by other instances. This leads to the proposed classifier called the conglomerate nearest neighbor classifier, which does not require any parameter assigning the appropriate number of neighbors to each instance ordered by MOF. Experimental results show that this classifier exhibits similar accuracy to the k-nearest neighbor algorithm with the best k over the synthesized datasets. Six UCI datasets, the QSAR dataset, the German dataset, the Cancer dataset, the Wholesale dataset, the Haberman dataset, and the Glass3 dataset are used in the experiment. This method outperforms two UCI datasets, Wholesale and Glass3, and displays similar performance with respect to these six UCI datasets.","PeriodicalId":91709,"journal":{"name":"International journal of machine learning and computing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Parameter-Free Conglomerate nearest Neighbor Classifier Using Mass-Ratio-Variance Outlier Factors\",\"authors\":\"Patcharasiri Fuangfoo, Krung Sinapiromsaran\",\"doi\":\"10.18178/ijml.2023.13.4.1145\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Classification is one important area in machine learning that labels the class of an instance via a classifier from known-class historical data. One of the popular classifiers is k-NN, which stands for “k-nearest neighbor” and requires a global parameter k to proceed. This global parameter may not be suitable for all instances. Naturally, each instance may situate on different regions of clusters such as an interior instance placed inside a cluster, a border instance placed on the outskirts, an outer instance placed faraway from any cluster, which requires a different number of neighbors. To automatically assign a different number of neighbors to each instance, the concept of scoring from the anomaly detection research is desired. The Mass-ratio-variance Outlier Factor, MOF, is selected as the scoring scheme for the number of neighbors of each instance. MOF gives the highest score to an instance placed very far from any cluster and the lowest score to an instance surrounded by other instances. This leads to the proposed classifier called the conglomerate nearest neighbor classifier, which does not require any parameter assigning the appropriate number of neighbors to each instance ordered by MOF. Experimental results show that this classifier exhibits similar accuracy to the k-nearest neighbor algorithm with the best k over the synthesized datasets. Six UCI datasets, the QSAR dataset, the German dataset, the Cancer dataset, the Wholesale dataset, the Haberman dataset, and the Glass3 dataset are used in the experiment. This method outperforms two UCI datasets, Wholesale and Glass3, and displays similar performance with respect to these six UCI datasets.\",\"PeriodicalId\":91709,\"journal\":{\"name\":\"International journal of machine learning and computing\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International journal of machine learning and computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18178/ijml.2023.13.4.1145\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of machine learning and computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18178/ijml.2023.13.4.1145","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

分类是机器学习中的一个重要领域,它通过分类器从已知类别的历史数据中标记实例的类别。其中一个流行的分类器是k- nn,它代表“k近邻”,需要一个全局参数k来进行分类。此全局参数可能不适合所有实例。当然,每个实例可能位于集群的不同区域,例如内部实例放置在集群内,边界实例放置在外围,外部实例放置在远离任何集群的地方,这需要不同数量的邻居。为了给每个实例自动分配不同数量的邻居,需要异常检测研究中的评分概念。选取质量比方差离群因子(Mass-ratio-variance Outlier Factor, MOF)作为每个实例的邻居数的评分方案。MOF给离集群很远的实例最高分,给被其他实例包围的实例最低分。这导致了所提出的分类器称为组合最近邻分类器,它不需要任何参数为按MOF排序的每个实例分配适当数量的邻居。实验结果表明,该分类器在合成数据集上具有与k近邻算法相似的最佳k值。实验中使用了六个UCI数据集:QSAR数据集、德国数据集、Cancer数据集、Wholesale数据集、Haberman数据集和Glass3数据集。该方法优于两个UCI数据集Wholesale和Glass3,并且在这六个UCI数据集上显示出相似的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Parameter-Free Conglomerate nearest Neighbor Classifier Using Mass-Ratio-Variance Outlier Factors
Classification is one important area in machine learning that labels the class of an instance via a classifier from known-class historical data. One of the popular classifiers is k-NN, which stands for “k-nearest neighbor” and requires a global parameter k to proceed. This global parameter may not be suitable for all instances. Naturally, each instance may situate on different regions of clusters such as an interior instance placed inside a cluster, a border instance placed on the outskirts, an outer instance placed faraway from any cluster, which requires a different number of neighbors. To automatically assign a different number of neighbors to each instance, the concept of scoring from the anomaly detection research is desired. The Mass-ratio-variance Outlier Factor, MOF, is selected as the scoring scheme for the number of neighbors of each instance. MOF gives the highest score to an instance placed very far from any cluster and the lowest score to an instance surrounded by other instances. This leads to the proposed classifier called the conglomerate nearest neighbor classifier, which does not require any parameter assigning the appropriate number of neighbors to each instance ordered by MOF. Experimental results show that this classifier exhibits similar accuracy to the k-nearest neighbor algorithm with the best k over the synthesized datasets. Six UCI datasets, the QSAR dataset, the German dataset, the Cancer dataset, the Wholesale dataset, the Haberman dataset, and the Glass3 dataset are used in the experiment. This method outperforms two UCI datasets, Wholesale and Glass3, and displays similar performance with respect to these six UCI datasets.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信