Cloud-based real-time enhancement for disease prediction using Confluent Cloud, Apache Kafka, feature optimization, and explainable artificial intelligence.

IF 3.5; Region 4, Computer Science; JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

PeerJ Computer Science Pub Date : 2025-06-04 eCollection Date: 2025-01-01 DOI:10.7717/peerj-cs.2899

Abdulaziz AlMohimeed

PeerJ Computer Science, volume 11, article e2899. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192947/pdf/

Citations: 0

Abstract

In recent years, Internet of Things (IoT)-based technologies have advanced healthcare by facilitating the development of monitoring systems, which in turn generate rapidly growing volumes of streaming data. This streaming data can be preprocessed and analyzed using technologies that integrate ensemble models, explainable artificial intelligence (XAI), feature selection (FS) methods, and big data stream processing platforms to develop real-time predictive systems. This integration helps healthcare organizations enhance clinical decision-making, improve patient care, and elevate the overall quality of healthcare. This article presents a real-time system for the early detection and treatment of chronic kidney disease (CKD) using a real-world simulation application. The system is developed in two phases. The first phase proposes a stacking model, applies a genetic algorithm (GA) and particle swarm optimization (PSO) for feature selection, and uses XAI to examine the stacking model trained on the best features. The best model with the best-optimized features is then used to develop the second phase. The results showed that the stacking model with GA achieved the highest performance, with 100% accuracy, 100% precision, 100% recall, and a 100% F1-score. The second phase is built on Confluent Cloud, which offers several benefits for creating a real-time streaming system based on Apache Kafka and provides multiple APIs (the Producer API and Consumer API) for data producers and consumers, respectively. Python scripts are developed to pipeline the streaming data. The first Python script generates streaming health attributes and pushes them into a Kafka topic. The second Python script consumes health attributes from the Kafka topic and applies the stacking model to predict CKD in real time. The results showed that the stacking model with features selected by GA recorded the best performance, with 100% accuracy. The pipeline's streaming steps validated the approach's effectiveness in real time, leveraging Confluent Cloud and Apache Kafka.
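The abstract names a genetic algorithm as the feature selector but gives no implementation details. The following is a minimal sketch of how GA-based feature selection is commonly set up, not the author's method: each chromosome is a 0/1 mask over the features, and the population evolves through tournament selection, one-point crossover, bit-flip mutation, and elitism. The function names, the hyperparameters, and the toy fitness function are all assumptions made for illustration; in a real pipeline the fitness would typically be a cross-validated score of the model trained on the masked features.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Return the best 0/1 feature mask found (illustrative GA, n_features >= 2)."""
    rng = random.Random(seed)
    # Random initial population of binary masks.
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the current best forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Hypothetical toy fitness: features 0, 2 and 5 are "informative";
# every selected feature costs 0.1, so smaller masks are preferred.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    gain = sum(1 for i, g in enumerate(mask) if g and i in INFORMATIVE)
    return gain - 0.1 * sum(mask)


best_mask = ga_select(n_features=8, fitness=toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

Because of elitism, the best fitness never decreases from one generation to the next, which is why the sketch returns the final generation's best individual directly.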
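The two-script pipeline described above can be sketched with the `confluent_kafka` Python client. This is an illustration under stated assumptions rather than the author's code: the field names, value ranges, JSON encoding, and the `predict` callable are hypothetical stand-ins, since the abstract does not list the health attributes or the message format. The Kafka imports are placed inside the functions so the serialization helpers can be tried without a broker.

```python
import json
import random

# Hypothetical attribute names; the abstract does not list the real ones.
FIELDS = ("serum_creatinine", "hemoglobin", "blood_pressure")


def make_record(rng):
    """Generate one synthetic reading (stand-in for real patient data)."""
    return {
        "serum_creatinine": round(rng.uniform(0.4, 8.0), 2),
        "hemoglobin": round(rng.uniform(5.0, 17.0), 1),
        "blood_pressure": rng.randint(60, 180),
    }


def encode(record):
    """Serialize a record to UTF-8 JSON bytes for the Kafka message value."""
    return json.dumps(record).encode("utf-8")


def decode(payload):
    """Inverse of encode()."""
    return json.loads(payload.decode("utf-8"))


def produce(config, topic, n_records, seed=0):
    """Producer script: push synthetic readings into a Kafka topic."""
    from confluent_kafka import Producer  # requires the confluent-kafka package
    producer = Producer(config)
    rng = random.Random(seed)
    for _ in range(n_records):
        producer.produce(topic, value=encode(make_record(rng)))
        producer.poll(0)  # serve delivery callbacks without blocking
    producer.flush()      # block until all queued messages are delivered


def consume(config, topic, predict, max_messages):
    """Consumer script: read readings and apply a model to each one.

    `predict` is any callable taking a record dict, e.g. a wrapper around
    a trained stacking model (assumed here, not provided). The loop waits
    until `max_messages` messages have arrived.
    """
    from confluent_kafka import Consumer
    consumer = Consumer(config)
    consumer.subscribe([topic])
    results = []
    try:
        while len(results) < max_messages:
            msg = consumer.poll(1.0)  # wait up to one second for a message
            if msg is None:
                continue
            if msg.error():
                print("Kafka error:", msg.error())
                continue
            record = decode(msg.value())
            results.append((record, predict(record)))
    finally:
        consumer.close()
    return results
```

For Confluent Cloud, `config` would typically carry `bootstrap.servers`, `security.protocol` set to `SASL_SSL`, `sasl.mechanisms` set to `PLAIN`, and the API key and secret as `sasl.username` and `sasl.password`; the consumer additionally needs a `group.id`.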




Source journal

PeerJ Computer Science Computer Science-General Computer Science

CiteScore

6.10

Self-citation rate

5.30%

Articles published

332

Review time

10 weeks

Journal description: PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.