Yunwei Zhang, Zongkai Shen, Fang Wang, Jinguo You, Xiaoxia Zhao
Title: A multidimensional feature grouping sampling algorithm based on dynamic feedback of prior bias
Journal: Information Sciences, volume 719, Article 122490
Publication date: 2025-07-11
Publication type: Journal Article
DOI: 10.1016/j.ins.2025.122490
URL: https://www.sciencedirect.com/science/article/pii/S002002552500622X
Impact factor: 6.8
JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS
Citations: 0
Abstract
The rapid development of information technology has generated massive amounts of large-scale discrete-variable data. Processing an entire dataset of this kind consumes substantial computing resources and is inefficient. Sampling techniques offer a cost-effective way to reduce computational complexity while preserving the original properties of the data. In pursuit of both efficiency and effectiveness, this article proposes a multidimensional feature grouping sampling algorithm based on dynamic feedback of prior bias (MFGS) for sampling discrete-variable data. The basic idea is iterative sampling driven by dynamic feedback. To this end, we establish a dynamic feedback correction mechanism based on prior bias, which accurately locates the sampling feature channel for each iteration, calculates the sampling size of each subgroup, and thereby achieves accurate, targeted, cyclic optimization of the sample. MFGS also incorporates the idea of smoothing filtering, which removes redundant samples from oversampled areas and accurately limits the overall sample size. In addition, we use the multidimensional Manhattan distance to define a sampling bias evaluation index, which provides the computational basis for feedback and correction. Finally, we design three experiments to verify the effectiveness of the feedback correction mechanism and smoothing filtering, and to evaluate the method's sampling accuracy, its computational efficiency, and its sampling accuracy under additional constraints. The experimental results show that the dynamic feedback correction mechanism and the smoothing filter are effective, that MFGS outperforms the compared state-of-the-art methods in sampling accuracy, and that its computational efficiency is significantly better than that of clustering-based sampling methods.
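The abstract does not give the formulas behind MFGS, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes the bias index is the Manhattan (L1) distance between the per-feature category proportions of the population and of the sample, and it swaps in a simple greedy feedback loop: at each step it finds the feature with the largest bias and draws one row from that feature's most under-represented category. All function names (`marginals`, `feature_bias`, `total_bias`, `feedback_sample`) are hypothetical, and the subgroup-size calculation and smoothing filter described above are not reproduced.

```python
import random
from collections import Counter


def marginals(rows, feature):
    """Proportion of each category of one feature (a column index) in rows."""
    counts = Counter(row[feature] for row in rows)
    total = len(rows)
    return {cat: n / total for cat, n in counts.items()}


def feature_bias(population, sample, feature):
    """Assumed bias index: L1 distance between population and sample proportions."""
    p = marginals(population, feature)
    q = marginals(sample, feature) if sample else {}
    return sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in set(p) | set(q))


def total_bias(population, sample, n_features):
    """Sum of the per-feature biases over all features."""
    return sum(feature_bias(population, sample, f) for f in range(n_features))


def feedback_sample(population, size, n_features, seed=0):
    """Greedy stand-in for feedback sampling (without replacement)."""
    rng = random.Random(seed)
    remaining = list(population)
    rng.shuffle(remaining)
    sample = []
    while len(sample) < size and remaining:
        # Feedback step: locate the feature whose distribution is currently worst.
        worst = max(range(n_features),
                    key=lambda f: feature_bias(population, sample, f))
        p = marginals(population, worst)
        q = marginals(sample, worst) if sample else {}
        # Correction step: pick the most under-represented category of that feature.
        need = max(p, key=lambda c: p[c] - q.get(c, 0.0))
        # Take the first remaining row in that category (fall back to index 0).
        idx = next((i for i, r in enumerate(remaining) if r[worst] == need), 0)
        sample.append(remaining.pop(idx))
    return sample


if __name__ == "__main__":
    # Toy population with two discrete features.
    population = ([("a", "x")] * 60 + [("a", "y")] * 20
                  + [("b", "x")] * 10 + [("b", "y")] * 10)
    sample = feedback_sample(population, size=20, n_features=2)
    print(len(sample), total_bias(population, sample, n_features=2))
```

Recomputing the bias against the full population on every step keeps the sketch short but costs time proportional to the population size per draw; a practical version would cache the population proportions and update the sample counts incrementally.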
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.