Feature subset weighting for distance-based supervised learning

IF 7.6 | Zone 1 (Computer Science) | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition | Pub Date: 2025-09-07 | DOI: 10.1016/j.patcog.2025.112424

Adnan Theerens, Yvan Saeys, Chris Cornelis

Pattern Recognition, volume 172, Article 112424. Journal article. https://www.sciencedirect.com/science/article/pii/S0031320325010854

Citations: 0

Abstract



This paper introduces feature subset weighting using monotone measures for distance-based supervised learning. The Choquet integral is used to define a distance function that incorporates these weights. This integration enables the proposed distances to effectively capture non-linear relationships and account for interactions both between conditional and decision attributes and among conditional attributes themselves, resulting in a more flexible distance measure. In particular, we show how this approach ensures that the distances remain unaffected by the addition of duplicate and strongly correlated features. Another key point of this approach is that it makes feature subset weighting computationally feasible, since only $m$ feature subset weights should be calculated each time instead of calculating all feature subset weights ($2^{m}$), where $m$ is the number of attributes. Next, we also examine how the use of the Choquet integral for measuring similarity leads to a non-equivalent definition of distance. The relationship between distance and similarity is further explored through dual measures. Additionally, symmetric Choquet distances and similarities are proposed, preserving the classical symmetry between similarity and distance. Finally, we introduce a concrete feature subset weighting distance, evaluate its performance in a $k$-nearest neighbours (KNN) classification setting, and compare it against Mahalanobis distances and weighted distance methods.
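To make the idea in the abstract concrete, here is a minimal sketch of a Choquet-integral distance. It assumes the standard discrete Choquet integral applied to per-feature absolute differences; the paper's exact definition may differ. The names `choquet_distance`, `additive_measure`, and `group_coverage_measure` are hypothetical, and the group-coverage measure is an illustrative monotone measure, not the concrete weighting proposed by the authors. The loop evaluates the measure on only $m$ nested subsets rather than on all $2^{m}$.

```python
from typing import Callable, FrozenSet, Sequence

# A monotone measure maps a subset of feature indices to a weight in [0, 1].
Measure = Callable[[FrozenSet[int]], float]


def choquet_distance(x: Sequence[float], y: Sequence[float], mu: Measure) -> float:
    """Discrete Choquet integral of the per-feature differences |x_i - y_i|.

    Assumption: this is the textbook discrete Choquet integral, not
    necessarily the exact distance defined in the paper.
    """
    diffs = [abs(a - b) for a, b in zip(x, y)]
    # Sort feature indices by ascending difference.
    order = sorted(range(len(diffs)), key=diffs.__getitem__)
    total, prev = 0.0, 0.0
    for pos, idx in enumerate(order):
        # Features whose difference is at least diffs[idx]; only m such
        # nested subsets are ever passed to mu.
        upper = frozenset(order[pos:])
        total += (diffs[idx] - prev) * mu(upper)
        prev = diffs[idx]
    return total


def additive_measure(weights: Sequence[float]) -> Measure:
    """Additive measure: the Choquet distance reduces to a weighted L1 distance."""
    return lambda subset: sum(weights[i] for i in subset)


def group_coverage_measure(groups: Sequence[int]) -> Measure:
    """Hypothetical measure: fraction of feature groups touched by the subset.

    Duplicate features share a group label, so adding a copy of a feature
    does not change the weight of any subset that already contains it.
    """
    n_groups = len(set(groups))
    return lambda subset: len({groups[i] for i in subset}) / n_groups


# Additive weights: same value as the weighted Manhattan distance.
d_additive = choquet_distance([0, 0], [1, 3], additive_measure([0.25, 0.75]))

# Non-additive measure: duplicating feature 1 leaves the distance unchanged.
d_plain = choquet_distance([0, 0], [1, 3], group_coverage_measure([0, 1]))
d_dup = choquet_distance([0, 0, 0], [1, 3, 3], group_coverage_measure([0, 1, 1]))
```

With an additive measure the sketch coincides with an ordinary weighted distance, which is why non-additive measures are needed to express interactions among features. The group-coverage example shows one way a measure can make the distance insensitive to a duplicated feature.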


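Source journal

Pattern Recognition (Engineering & Technology, Engineering: Electrical & Electronic)

CiteScore: 14.40

Self-citation rate: 16.20%

Articles published: 683

Review time: 5.6 months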

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.