Ensemble model with combined feature set for Big data classification in IoT scenario

Impact factor 2.7 · Zone 3 (Computer Science) · JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering · Published: 2025-04-17 · DOI: 10.1016/j.datak.2025.102447

Harivardhagini S (Professor), Pranavanand S (Associate Professor), Raghuram A (Professor)

Volume 159, Article 102447 · Journal article · https://www.sciencedirect.com/science/article/pii/S0169023X25000424

Citations: 0

Abstract

An Internet of Things (IoT) system consists of sensor nodes that are wirelessly connected to the internet and to other systems. The large volumes of data these nodes produce are stored as big data, which complicates classification. Many big data classification strategies are in use, but their main problems are secure information management and computational time. This paper proposes a novel four-phase classification system for big data in IoT networks, using healthcare data as the big data case study. Healthcare big data is a transformative tool in the industry and is becoming central to patient-centric care; this ecosystem aggregates data from many different sources. The first phase is data acquisition from IoT sensors. The second phase preprocesses the input data with an improved DSig normalization. The third phase uses the MapReduce framework to handle the big data during feature extraction, extracting raw-data features, mutual information, information gain, and an improved Rényi entropy. The fourth phase is an ensemble disease classification model that combines a Recurrent Neural Network, a Neural Network, and an Improved Support Vector Machine to classify cases as normal or abnormal. The system is implemented in Python, and the results are evaluated in terms of effectiveness, specificity, sensitivity, precision, and other metrics. At a training rate of 90%, the proposed ensemble model achieves a precision of 0.9573, higher than that of the traditional models.
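The abstract names an improved DSig normalization for the second phase but does not define it. As a minimal sketch, assuming "DSig" refers to the standard double-sigmoid normalization, the function below maps raw sensor readings into (0, 1). The centre `t` (the median) and the widths `r1`, `r2` (distances from `t` to the minimum and maximum) are hypothetical choices, and the paper's "improved" modification is not reproduced.

```python
import math
import statistics


def double_sigmoid_normalize(values):
    """Map raw readings into (0, 1) with a double-sigmoid curve.

    Hypothetical stand-in for the paper's "improved DSig" step: the centre t
    is the median, and the left/right widths r1/r2 are the distances from t
    to the minimum/maximum. Readings equal to t map to exactly 0.5.
    """
    t = statistics.median(values)
    r1 = (t - min(values)) or 1.0  # fall back to 1.0 if the left side is flat
    r2 = (max(values) - t) or 1.0  # fall back to 1.0 if the right side is flat
    normalized = []
    for s in values:
        r = r1 if s < t else r2  # separate slope below and above the centre
        normalized.append(1.0 / (1.0 + math.exp(-2.0 * (s - t) / r)))
    return normalized


# Hypothetical heart-rate readings (beats per minute) from one IoT sensor.
heart_rate = [60, 72, 75, 80, 120]
scaled = double_sigmoid_normalize(heart_rate)
print([round(x, 3) for x in scaled])
```

Using two widths keeps the curve responsive on both sides of the centre even when readings are skewed, as with the single high reading of 120 here.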
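The third phase distributes feature extraction with MapReduce. The sketch below imitates that pattern in plain Python rather than on Hadoop or Spark: each hypothetical chunk of discretized (feature value, label) pairs is counted in a map step, the partial counts are merged in a reduce step, and the merged table yields mutual information, information gain, and the standard Rényi entropy of order alpha. The paper's "improved" Rényi entropy is not described in the abstract, so only the textbook formula is shown. For a single discrete feature the textbook information gain H(Y) - H(Y|X) equals the mutual information I(X; Y), which the two results below confirm.

```python
import math
from collections import Counter
from functools import reduce


def map_chunk(chunk):
    """Map step: count (feature value, label) pairs in one data chunk."""
    return Counter(chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right


def entropy(counts):
    """Shannon entropy in bits of a Counter of occurrences."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c > 0)


def renyi_entropy(counts, alpha=2.0):
    """Renyi entropy of order alpha in bits; alpha == 1 falls back to Shannon."""
    if alpha == 1.0:
        return entropy(counts)
    total = sum(counts.values())
    power_sum = sum((c / total) ** alpha for c in counts.values() if c > 0)
    return math.log2(power_sum) / (1.0 - alpha)


def marginals(joint):
    """Split a joint (x, y) count table into feature counts and label counts."""
    px, py = Counter(), Counter()
    for (x, y), c in joint.items():
        px[x] += c
        py[y] += c
    return px, py


def mutual_information(joint):
    """I(X; Y) = sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y)))."""
    total = sum(joint.values())
    px, py = marginals(joint)
    return sum(
        c / total * math.log2(c * total / (px[x] * py[y]))
        for (x, y), c in joint.items()
        if c > 0
    )


def information_gain(joint):
    """IG = H(Y) - sum over x of p(x) * H(Y | X = x)."""
    total = sum(joint.values())
    _, py = marginals(joint)
    by_value = {}
    for (x, y), c in joint.items():
        by_value.setdefault(x, Counter())[y] += c
    conditional = sum(sum(cy.values()) / total * entropy(cy) for cy in by_value.values())
    return entropy(py) - conditional


# Two hypothetical chunks of (discretized blood-pressure level, label) pairs.
chunks = [
    [("high", "abnormal"), ("normal", "normal"), ("high", "abnormal")],
    [("normal", "normal"), ("high", "normal"), ("normal", "normal")],
]
joint = reduce(reduce_counts, map(map_chunk, chunks))
feature_counts, _ = marginals(joint)
mi = mutual_information(joint)
ig = information_gain(joint)
h2 = renyi_entropy(feature_counts, alpha=2.0)
print(f"MI={mi:.4f}  IG={ig:.4f}  Renyi(alpha=2)={h2:.4f}")
```

Because the reduce step only adds counts, chunks can be mapped independently and merged in any order, which is what makes the statistics suitable for a MapReduce job.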
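The fourth phase combines a Recurrent Neural Network, a Neural Network, and an Improved Support Vector Machine, and the evaluation reports precision, sensitivity, and specificity. The abstract does not say how the three outputs are fused, so the sketch below assumes simple soft voting: each model's probability that a case is abnormal is averaged and thresholded at 0.5. The probability lists are hypothetical placeholders, not outputs of the paper's models; the metrics are the standard confusion-matrix ratios.

```python
def soft_vote(model_probs, threshold=0.5):
    """Average each case's abnormal-class probability across models.

    Returns 1 (abnormal) when the mean probability reaches the threshold,
    otherwise 0 (normal). Soft voting is an assumed fusion rule.
    """
    n_cases = len(model_probs[0])
    return [
        int(sum(probs[i] for probs in model_probs) / len(model_probs) >= threshold)
        for i in range(n_cases)
    ]


def safe_ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(y_true, y_pred):
    """Precision, sensitivity (recall), and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "precision": safe_ratio(tp, tp + fp),
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
    }


# Hypothetical per-case P(abnormal) from the three base classifiers.
rnn_probs = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6]
nn_probs = [0.8, 0.3, 0.4, 0.6, 0.2, 0.7]
svm_probs = [0.7, 0.1, 0.6, 0.3, 0.4, 0.5]
y_true = [1, 0, 1, 1, 0, 1]

y_pred = soft_vote([rnn_probs, nn_probs, svm_probs])
print(y_pred)
print(evaluate(y_true, y_pred))
```

Soft voting is only one fusion rule that fits the description; hard majority voting or a learned stacking layer would fit equally well, and the abstract does not say which the authors use.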




Source journal

Data & Knowledge Engineering · Engineering and Technology: Computer Science, Artificial Intelligence

CiteScore: 5.00
Self-citation rate: 0.00%
Review time: 6 months

Journal description: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between two related fields of interest, data engineering and knowledge engineering. DKE reaches a worldwide audience of researchers, designers, managers, and users. The journal's major aim is to identify, investigate, and analyze the underlying principles in the design and effective use of data and knowledge systems.