Enhancing big data analysis in IoT applications and optimizing the performance of machine learning models using hybrid dimensionality optimization approach

Impact factor: 7.6 | Zone 3, Computer Science | JCR Q1, Computer Science, Information Systems
Ihab Nassra, Juan V. Capella
DOI: 10.1016/j.iot.2025.101764
Journal: Internet of Things, volume 34, Article 101764
Publication date: 2025-09-18
Publication type: Journal Article
Source: https://www.sciencedirect.com/science/article/pii/S254266052500277X
Citation count: 0

Abstract

The proliferation of Internet of Things (IoT) applications generates high-dimensional datasets characterized by substantial velocity, variety, and complexity, imposing severe computational constraints on machine learning systems. The abundance of variables in such data tends to obscure meaningful correlations among features and hinders practical analysis, increasing computational resource consumption (e.g., memory usage) and processing time while degrading the training efficiency and performance of machine learning models. Dimensionality reduction techniques address these challenges by decreasing the number of input variables while preserving the intrinsic structure of the data, thereby alleviating the computational burden. Nevertheless, most contemporary methods are optimized for either linear or nonlinear data patterns, but rarely both. Hybrid strategies that integrate linear and nonlinear reduction techniques are increasingly used to overcome this limitation. Specifically, using Principal Component Analysis (PCA) as a preprocessing stage for Restricted Boltzmann Machines (RBMs) offers a complementary solution: PCA condenses the feature space into a lower-dimensional representation, which improves training efficiency and enables the RBM to capture complex nonlinear dependencies with better convergence and generalization. While this combination can in theory exploit both the linear and nonlinear characteristics of the data, conventional PCA-RBM frameworks often struggle to retain essential local manifold structures, limiting their effectiveness in capturing the full complexity of real-world datasets.
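The conventional two-stage pipeline described above (PCA as a linear preprocessing step, followed by an RBM that learns nonlinear features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the synthetic data, the layer sizes, the min-max scaling step, and the helper names `pca_fit_transform`, `train_rbm`, and `rbm_transform` are all assumptions made for illustration, and the RBM is trained with plain one-step contrastive divergence (CD-1).

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project X onto its top principal components (computed via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(V, n_hidden, epochs=50, lr=0.05, seed=0):
    """Train a Bernoulli RBM with one-step contrastive divergence (CD-1)."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = V.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden activations driven by the data.
        h_prob = sigmoid(V @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step down to the visibles and back up.
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        # CD-1 parameter updates (data statistics minus model statistics).
        W += lr * (V.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_vis += lr * (V - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_hid

def rbm_transform(V, W, b_hid):
    """Hidden-unit activation probabilities, used as the reduced features."""
    return sigmoid(V @ W + b_hid)

# Synthetic stand-in for a high-dimensional IoT feature matrix (assumption).
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 30))

Z = pca_fit_transform(X, n_components=10)        # linear stage: 30 -> 10
# A Bernoulli RBM expects inputs in [0, 1], so min-max scale each component.
Z01 = (Z - Z.min(axis=0)) / (Z.max(axis=0) - Z.min(axis=0))
W, b_hid = train_rbm(Z01, n_hidden=5)            # nonlinear stage: 10 -> 5
H = rbm_transform(Z01, W, b_hid)
print(X.shape, Z.shape, H.shape)
```

Running PCA first means the RBM trains on 10 decorrelated inputs rather than 30 raw features, which is the efficiency argument the abstract makes. Nothing in this baseline encourages neighbouring samples to stay close in the hidden representation.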
This study addresses these challenges by proposing a novel hybrid dimensionality reduction framework that integrates PCA's global linear projection capabilities with RBMs' nonlinear feature learning strengths through an adaptive graph regularization mechanism. This mechanism preserves critical local manifold properties and thereby addresses the limitations of conventional PCA-RBM combinations: it ensures that data points that are close in the input space remain similar in the reduced feature space, effectively bridging global and local structure preservation. Experimental validation demonstrates superior performance over conventional methods across multiple evaluation metrics, including data reduction efficiency, classification accuracy, precision, recall, and F-score. The framework addresses critical limitations in high-dimensional data processing while maintaining model performance, making it a methodologically significant contribution to dimensionality reduction that is applicable across scientific disciplines handling complex IoT-generated datasets. Our findings indicate that dimensionality reduction is a viable and effective approach to simplifying datasets without significantly compromising performance.
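The idea that nearby input points should stay similar in the reduced space is commonly expressed as a graph Laplacian penalty, tr(HᵀLH) = ½ Σᵢⱼ Aᵢⱼ‖hᵢ − hⱼ‖², where A is a neighbourhood graph over the inputs and H holds the reduced features. The abstract does not specify the form of the authors' adaptive mechanism, so the sketch below substitutes a fixed k-nearest-neighbour graph with 0/1 weights as an assumption; the function names `knn_adjacency` and `graph_penalty` and the toy data are hypothetical.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric 0/1 adjacency linking each point to its k nearest neighbours."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq_dist, np.inf)        # a point is not its own neighbour
    nearest = np.argsort(sq_dist, axis=1)[:, :k]
    A = np.zeros((X.shape[0], X.shape[0]))
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, nearest.ravel()] = 1.0
    return np.maximum(A, A.T)                # symmetrise the directed kNN graph

def graph_penalty(H, A):
    """tr(H^T L H) with L = D - A, i.e. 0.5 * sum_ij A_ij * ||h_i - h_j||^2."""
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(H.T @ L @ H))

# Toy data (assumption): two well-separated clusters in an 8-D input space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 8)),
               rng.normal(3.0, 0.1, (20, 8))])
A = knn_adjacency(X, k=3)

H_good = X[:, :2]                  # reduced codes that keep the clusters apart
H_bad = rng.permutation(H_good)    # same codes, rows shuffled across samples

print("penalty, neighbourhoods preserved:", graph_penalty(H_good, A))
print("penalty, neighbourhoods scrambled:", graph_penalty(H_bad, A))
```

The penalty is small when graph neighbours receive similar codes and large when they do not, so adding it (with a weighting coefficient) to a training objective pushes the learned representation to respect local structure. An adaptive variant would presumably update the graph or its weights during training, but how the paper does this is not stated in the abstract.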
Source journal: Internet of Things
CiteScore: 3.60
Self-citation rate: 5.10%
Articles published: 115
Review time: 37 days
Journal description: Internet of Things; Engineering Cyber Physical Human Systems is a comprehensive journal that encourages collaboration between researchers, engineers, and practitioners in the field of IoT and Cyber Physical Human Systems. The journal offers a platform for exchanging scientific information on the entire breadth of technology, science, and societal applications of the IoT. It places a high priority on timely publication and on high quality, and it is interested in publishing topical Special Issues on any aspect of IoT.