Distributed Bayesian Inference for Large-Scale IoT Systems
Eleni Vlachou, Aristeidis Karras, Christos N. Karras, Leonidas Theodorakopoulos, C. Halkiopoulos, S. Sioutas
Big Data and Cognitive Computing. DOI: 10.3390/bdcc8010001. Published 2023-12-19.
Abstract
In this work, we present a Distributed Bayesian Inference Classifier for Large-Scale Systems and assess its performance and scalability in distributed environments such as PySpark. The presented classifier consistently achieves efficient inference time, irrespective of variations in the size of the test set, implying a robust ability to handle escalating data sizes without a proportional increase in computational demands. Notably, throughout the experiments, memory usage increases with growing test set sizes, but this increase is sublinear, demonstrating the classifier's proficiency in memory resource management. This behavior is consistent with typical PySpark workloads, whose memory consumption grows with data partitioning and related data operations as datasets expand. CPU utilization, another crucial factor, also remains stable, underscoring the classifier's ability to manage larger computational workloads without significant resource strain. From a classification perspective, the Bayesian Logistic Regression Spark Classifier consistently achieves reliable performance metrics, with particularly high specificity, indicating its suitability for applications where identifying true negatives is crucial. In summary, across all experiments conducted under various data sizes, our classifier emerges as a strong candidate for scalability-driven applications in IoT systems, highlighting its dependable performance, adept resource management, and consistent prediction accuracy.
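As a concrete illustration (not the authors' implementation), the sketch below shows how a logistic-regression classifier of this kind can be trained and evaluated with PySpark's MLlib. The L2 penalty (regParam) can be read as the MAP estimate of Bayesian logistic regression under a zero-mean Gaussian prior on the coefficients; the column names and toy sensor data are hypothetical placeholders for an IoT dataset.

```python
# Minimal sketch, assuming L2-regularized logistic regression as the MAP
# estimate of a Bayesian logistic regression with a Gaussian prior.
# Not the authors' code; dataset and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("bayesian-lr-sketch").getOrCreate()

# Toy IoT telemetry with a binary label; replace with a real distributed dataset.
df = spark.createDataFrame(
    [(0.1, 0.7, 0), (0.9, 0.3, 1), (0.4, 0.5, 0), (0.8, 0.8, 1)],
    ["sensor_a", "sensor_b", "label"],
)

# Assemble feature columns into the vector column Spark ML expects.
assembler = VectorAssembler(inputCols=["sensor_a", "sensor_b"], outputCol="features")
train = assembler.transform(df)

# regParam plays the role of the prior precision: larger values correspond
# to a tighter Gaussian prior around zero (elasticNetParam=0.0 keeps it pure L2).
lr = LogisticRegression(featuresCol="features", labelCol="label",
                        regParam=0.1, elasticNetParam=0.0)
model = lr.fit(train)

# Evaluate on the same sample for brevity; a held-out test set would be used in practice.
preds = model.transform(train)
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(preds)
print(f"AUC on training sample: {auc:.3f}")

spark.stop()
```

In this setup, Spark handles data partitioning and per-partition gradient aggregation, which is the mechanism consistent with the stable CPU utilization and sublinear memory growth reported in the abstract.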