Online Streaming Feature Selection Based on Hierarchical Neighborhood Consistency
Kuangfeng Gong, Guohe Li, Lingyun Guo, Yaojin Lin
Concurrency and Computation: Practice and Experience, vol. 37, no. 23-24 (published 2025-09-03). DOI: 10.1002/cpe.70262
https://onlinelibrary.wiley.com/doi/10.1002/cpe.70262
In a data-driven world, datasets frequently exhibit multiple complexities, such as high dimensionality, dynamic features, and long-tailed distributions. In the label space, samples may also be organized hierarchically. These characteristics increase the complexity of data processing and analysis and make it harder to build efficient and accurate predictive models. To address these issues, this paper proposes an Online Streaming Feature Selection (OSFS) method based on hierarchical neighborhood consistency. The method dynamically selects significant features from the unknown streaming feature space of long-tailed datasets. Specifically, the number of neighbors for each sample is determined by the number of instances in its class, and positive and negative samples within the neighborhood are identified using a sibling strategy. Based on this hierarchical neighborhood relationship, we define hierarchical neighborhood consistency at three levels: individual samples, layers within the hierarchy, and the entire tree structure. We further establish three criteria for evaluating dynamic features (online correlation selection, online importance analysis, and online redundancy update) and design a framework for online streaming feature selection. Extensive experiments on multiple long-tailed datasets show that the proposed algorithm improves the prediction accuracy of tail classes and outperforms the comparison algorithms.
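The abstract does not give the paper's formulas, so the following is only a minimal illustrative sketch of how the described pieces could fit together. It is not the authors' algorithm. Every concrete choice below is an assumption:

- the neighborhood size rule `k = ceil(ratio * class_size)`, capped at `class_size - 1`;
- the sample score `(same-class neighbors - sibling-class neighbors) / k`;
- averaging per hierarchy layer and then across layers;
- the three threshold tests used as stand-ins for correlation, importance, and redundancy.

The function names `adaptive_k`, `sample_consistency`, `tree_consistency`, and `online_select` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def adaptive_k(labels, ratio=0.2, k_min=1):
    """Hypothetical rule: neighborhood size grows with class size, so tail
    classes get small neighborhoods and head classes larger ones."""
    counts = Counter(labels)
    return [max(k_min, min(counts[y] - 1, math.ceil(ratio * counts[y])))
            for y in labels]


def sample_consistency(X, labels, parent, feats, ks):
    """Assumed sample-level score in [-1, 1]: (same-class neighbors minus
    sibling-class neighbors) / k, with Euclidean distance on `feats`."""
    n = len(X)
    scores = []
    for i in range(n):
        dists = sorted(
            (math.dist([X[i][f] for f in feats], [X[j][f] for f in feats]), j)
            for j in range(n) if j != i)
        pos = neg = 0
        for _, j in dists[:ks[i]]:
            if labels[j] == labels[i]:
                pos += 1  # positive: same class
            elif parent[labels[j]] == parent[labels[i]]:
                neg += 1  # negative: sibling class (shares the parent node)
        scores.append((pos - neg) / ks[i])
    return scores


def tree_consistency(X, labels, parent, depth, feats, ks):
    """Assumed aggregation: average sample scores within each hierarchy
    layer (label depth), then average the layer scores over the tree."""
    if not feats:
        return -1.0  # empty subset gets the minimum possible score
    layers = defaultdict(list)
    for y, s in zip(labels, sample_consistency(X, labels, parent, feats, ks)):
        layers[depth[y]].append(s)
    return sum(sum(v) / len(v) for v in layers.values()) / len(layers)


def online_select(X, labels, parent, depth, ratio=0.2, relevance=0.0):
    """Hypothetical online loop: columns of X arrive one at a time."""
    ks = adaptive_k(labels, ratio)

    def score(feats):
        return tree_consistency(X, labels, parent, depth, feats, ks)

    selected, current = [], -1.0
    for f in range(len(X[0])):
        # 1) correlation: drop a feature that is not relevant on its own
        if score([f]) <= relevance:
            continue
        # 2) importance: keep it only if it improves the current subset
        candidate = score(selected + [f])
        if candidate <= current:
            continue
        selected, current = selected + [f], candidate
        # 3) redundancy: drop earlier features whose removal does not hurt
        for g in list(selected[:-1]):
            trial = [h for h in selected if h != g]
            s = score(trial)
            if s >= current:
                selected, current = trial, s
    return selected


if __name__ == "__main__":
    # Toy long-tailed data: 4 "cat", 2 "dog", 1 "car"; cat and dog are siblings.
    labels = ["cat"] * 4 + ["dog"] * 2 + ["car"]
    parent = {"cat": "animal", "dog": "animal", "car": "vehicle"}
    depth = {"cat": 2, "dog": 2, "car": 2}
    # Column 0 is informative, column 1 is noise, column 2 duplicates column 0.
    X = [[0.0, 5, 0.0], [0.1, 1, 0.1], [0.2, 9, 0.2], [0.3, 3, 0.3],
         [10.0, 2, 10.0], [10.1, 8, 10.1], [20.0, 4, 20.0]]
    print(online_select(X, labels, parent, depth, ratio=0.5))  # prints [0]
```

On the toy data, the noisy column fails the relevance test. The duplicate column is rejected because it does not raise the tree-level score. The singleton tail class always scores 0 under this particular scoring rule, which is one reason a real method would need a more careful definition than this sketch.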
About the journal:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.