FWLMkNN: Efficient functional K-nearest neighbor based on clustering and functional data analysis

IF 7.5 | Zone 1, Computer Science | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications, Pub Date: 2025-06-11, DOI: 10.1016/j.eswa.2025.128567

Mohammed Sabri, Rosanna Verde, Antonio Balzanella

Volume 292, Article 128567. Source: https://www.sciencedirect.com/science/article/pii/S0957417425021864

Citations: 0

Abstract


FWLMkNN: Efficient functional K-nearest neighbor based on clustering and functional data analysis

The increase in data characterized by continuous time and space-varying sequences of observations, such as curves, surfaces, and trajectories, has established the fundamental role of functional data analysis (FDA) in modern statistical methodology. This paper introduces an innovative classification framework that enhances the accuracy of functional data classifiers. This approach merges the strengths of functional supervised and unsupervised learning techniques. It introduces a unique objective function for the unsupervised learning stage to discover novel patterns that are critical for the successful classification of functional data. The process begins with a clustering phase as a preprocessing step that sets the groundwork for the subsequent classification process, which is guided by the clustering results. A partition of the original classes of the training set into distinct subgroups is provided by optimizing a new objective function. This process is achieved by decreasing the variability within each subgroup of a given class while improving the separation between these subgroups and those of other classes. The algorithm automatically determines representative subgroups and the weights assigned to the variables. The weight optimization technique identifies the most discriminative variables for clustering by dynamically adjusting weights to minimize the influence of noise-inducing features in the classification process. Hence this strategy allows for a more efficient and robust classification. Our proposal employs a weighted local mean k-nearest neighbor (KNN) approach within the classification phase. The proposed methodology leverages the novel augmented label space derived from the initial clustering phase, enhancing the classification process. Specifically, the method entails identifying the k nearest neighbors within each subgroup, computing k distinct local mean vectors, and subsequently utilizing these vectors to determine their weighted distance relative to the query sample. Consequently, the classification of the query sample is achieved by allocating it to the category exhibiting the minimum distance. The proposed methodology was evaluated using both synthetic datasets and established real-world datasets. Experimental results demonstrate significant reductions in classification error rate compared to state-of-the-art methods, highlighting the framework’s robustness across diverse data. Furthermore, we validate our approach through a practical case study on seasonal classification of Italian electricity load curves, demonstrating its effectiveness in real-world energy management applications.
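The classification phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made for illustration only: curves are taken to be sampled on a common grid, the subgroup labels are given rather than learned by the clustering objective, the variable weights `w` are uniform placeholders rather than optimized, the k local means are taken as the means of the first r neighbors for r = 1..k, and the subgroup score is the smallest weighted distance among them. The names `weighted_dist` and `predict` are hypothetical.

```python
import numpy as np


def weighted_dist(a, b, w):
    """Weighted Euclidean distance between each row of `a` and the curve `b`."""
    return np.sqrt((((a - b) ** 2) * w).sum(axis=-1))


def predict(query, curves, sub_labels, sub_to_class, w, k=3):
    """Assign `query` to the class of the subgroup with the closest local mean.

    Assumption: the k local means are cumulative means of the sorted
    neighbors, and the subgroup score is the minimum distance among them.
    """
    best_class, best_dist = None, np.inf
    for s in np.unique(sub_labels):
        members = curves[sub_labels == s]
        kk = min(k, len(members))
        # k nearest neighbors of the query inside this subgroup
        order = np.argsort(weighted_dist(members, query, w))[:kk]
        nearest = members[order]
        # r-th local mean = mean of the r nearest neighbors, r = 1..kk
        local_means = np.cumsum(nearest, axis=0) / np.arange(1, kk + 1)[:, None]
        dist = weighted_dist(local_means, query, w).min()
        if dist < best_dist:
            best_class, best_dist = sub_to_class[int(s)], dist
    return best_class, best_dist


# Toy data: class "A" is split into two subgroups, class "B" has one.
t = np.linspace(0.0, 1.0, 20)
offsets = np.array([0.0, 0.1, -0.1])
shapes = [np.sin(2 * np.pi * t), -np.sin(2 * np.pi * t), np.full_like(t, 3.0)]
curves = np.vstack([shape + o for shape in shapes for o in offsets])
sub_labels = np.repeat([0, 1, 2], len(offsets))
sub_to_class = {0: "A", 1: "A", 2: "B"}
w = np.full(t.size, 1.0 / t.size)  # placeholder uniform variable weights

label_a, dist_a = predict(-np.sin(2 * np.pi * t) + 0.05,
                          curves, sub_labels, sub_to_class, w)
label_b, dist_b = predict(np.full_like(t, 2.9),
                          curves, sub_labels, sub_to_class, w)
print(label_a, label_b)
```

In this toy setup the first query lies between two curves of the second "A" subgroup, so a local mean of two neighbors matches it more closely than any single neighbor does, which is the motivation for comparing against local means rather than individual training curves.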


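Source journal

Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months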

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.