Enhancing big data analysis in IoT applications and optimizing the performance of machine learning models using hybrid dimensionality optimization approach
IF 7.6; Zone 3, Computer Science; JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Authors: Ihab Nassra, Juan V. Capella
Journal: Internet of Things, volume 34, Article 101764
Published: 2025-09-18
Publication type: Journal Article
DOI: 10.1016/j.iot.2025.101764
URL: https://www.sciencedirect.com/science/article/pii/S254266052500277X
Citations: 0
Abstract
The proliferation of Internet of Things (IoT) applications generates high-dimensional datasets characterized by substantial velocity, variety, and complexity, imposing severe computational constraints on machine learning systems. The abundance of variables in such data tends to obscure meaningful correlations among features and hinders practical data analysis, particularly with respect to computational resource consumption (e.g., memory usage), processing time, and the training efficiency and performance of machine learning models. Dimensionality reduction techniques address these challenges by decreasing the number of input variables while preserving the intrinsic structure of the data and alleviating computational burdens. Nevertheless, most contemporary methods are optimized for either linear or nonlinear data patterns, but rarely both. Hybrid strategies that integrate linear and nonlinear reduction techniques have increasingly been used to address these constraints. Specifically, the combination of Principal Component Analysis (PCA) as a preprocessing stage with Restricted Boltzmann Machines (RBMs) offers a complementary solution: PCA condenses the feature space into a lower-dimensional representation, which improves training efficiency and enables RBMs to capture complex nonlinear dependencies with enhanced convergence and generalization. While this combination can theoretically exploit both the linear and nonlinear characteristics of the data, conventional PCA-RBM frameworks often struggle to retain essential local manifold structures, limiting their effectiveness in capturing the full complexity of real-world datasets.
This study addresses these challenges by proposing a novel hybrid dimensionality reduction framework that integrates PCA's global linear projection capabilities with RBMs' nonlinear feature learning strengths through an adaptive graph regularization mechanism that preserves critical local manifold properties, thereby addressing the limitations of conventional PCA-RBM combinations. The adaptive regularization mechanism ensures that proximate data points in the input space remain similar in the reduced feature space, effectively bridging global and local structure preservation. Experimental validation demonstrates superior performance over conventional methods across multiple evaluation metrics, including data reduction efficiency, classification accuracy, precision, recall, and F-score. The framework addresses critical limitations in high-dimensional data processing while maintaining model performance, establishing a methodologically significant contribution to dimensionality reduction techniques applicable across scientific disciplines handling complex IoT-generated datasets. Our findings indicate that dimensionality reduction constitutes a viable and efficacious approach to simplifying datasets without significantly compromising performance.
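The abstract does not specify the adaptive graph regularization mechanism, so the following is only a minimal sketch of the conventional PCA-RBM baseline it describes, not the authors' method. It assumes scikit-learn's `PCA` and `BernoulliRBM`, synthetic data in place of IoT readings, and hypothetical helper names (`build_pca_rbm`, `graph_smoothness`). The second helper computes a standard graph-Laplacian smoothness term, tr(HᵀLH), which is one common way to quantify whether neighbouring points in input space stay close in the reduced space; a graph-regularized variant would typically penalize this quantity during training.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import kneighbors_graph
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def build_pca_rbm(n_pca=16, n_hidden=8, seed=0):
    """Hypothetical helper: linear PCA stage followed by a plain (unregularized) RBM."""
    return make_pipeline(
        StandardScaler(),                            # centre and scale each feature
        PCA(n_components=n_pca, random_state=seed),  # global linear projection
        MinMaxScaler(),                              # BernoulliRBM expects inputs in [0, 1]
        BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                     n_iter=20, random_state=seed),  # nonlinear feature learning
    )


def graph_smoothness(X, H, k=5):
    """Hypothetical helper: Laplacian term tr(H^T L H).

    Equals 1/2 * sum_ij W_ij * ||h_i - h_j||^2, where W is a symmetrized
    k-nearest-neighbour graph built on the original inputs X.
    Lower values mean input-space neighbours stay close in the codes H.
    """
    W = kneighbors_graph(X, n_neighbors=k, mode="connectivity", include_self=False)
    W = (0.5 * (W + W.T)).toarray()      # symmetrize the kNN adjacency
    L = np.diag(W.sum(axis=1)) - W       # unnormalized graph Laplacian
    return float(np.trace(H.T @ L @ H))


# Synthetic stand-in for high-dimensional sensor data (assumption, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))

model = build_pca_rbm()
H = model.fit_transform(X)               # reduced representation: 64 -> 16 -> 8
penalty = graph_smoothness(X, H)
print(H.shape, penalty)
```

Comparing `penalty` between reduction pipelines is one rough way to check local-structure preservation; the component counts, `k`, and learning rate here are illustrative defaults rather than values from the study.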
About the journal:
Internet of Things; Engineering Cyber Physical Human Systems is a comprehensive journal encouraging cross-collaboration between researchers, engineers, and practitioners in the field of IoT & Cyber Physical Human Systems. The journal offers a unique platform to exchange scientific information on the entire breadth of technology, science, and societal applications of the IoT.
The journal will place a high priority on timely publication, and provide a home for high quality.
Furthermore, IOT is interested in publishing topical Special Issues on any aspect of IOT.