Computing Resource Optimization for a Log Monitoring System

Thanin Srithai, V. C. Barroso, P. Phunchongharn
2022 IEEE 5th International Conference on Knowledge Innovation and Invention (ICKII), published 2022-07-22
DOI: 10.1109/ICKII55100.2022.9983580 (https://doi.org/10.1109/ICKII55100.2022.9983580)

Abstract

A Large Ion Collider Experiment (ALICE) at the Large Hadron Collider (LHC) of the European Organization for Nuclear Research (CERN) was built to study heavy-ion collisions and the properties of the quark-gluon plasma. The experiment's Online and Offline (O2) software systems generate a huge amount of log data, which is monitored to detect potential system failures. Elasticsearch was selected as the log storage and search engine for the monitoring system. One of the main problems is how to allocate computing resources to Elasticsearch so that cost is minimized while performance thresholds (i.e., throughput) are satisfied. Moreover, limited knowledge of the search engine's behavior makes it difficult to find the best configuration. Exhaustive search is one possible approach, but it is impractical because it consumes a lot of time and computing resources. Given the limited resources, Bayesian optimization is applied instead. The Bayesian method requires only a few samples to build a surrogate function that roughly represents the objective function, i.e., minimizing cost while satisfying the performance needs. The method then explores only the region where the optimal solution is likely to lie. The results show that Bayesian optimization finds the optimal or a near-optimal computing resource configuration for the given benchmark experiments while requiring only about half of the evaluations needed by other methods, e.g., exhaustive search, regression, and machine learning. The impact of several acquisition functions and initial sample generators was studied in order to find the best solution. These insights can help system operators search for an optimal computing resource configuration quickly and efficiently.
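
To make the surrogate-plus-acquisition loop described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the search space, the benchmark, the surrogate model, or the acquisition functions used, so everything here is an assumption chosen for illustration: a hypothetical grid of (CPU cores, memory) configurations, an invented throughput model, invented cost weights and penalty, a Gaussian-process surrogate with an RBF kernel, and expected improvement as the acquisition function.

```python
import math
import numpy as np

# Hypothetical search space: (cpu_cores, memory_gb) pairs. Not taken from the paper.
CONFIGS = np.array([(c, m) for c in range(1, 9) for m in (2, 4, 8, 16, 32)], dtype=float)

def throughput(cfg):
    """Toy stand-in for one benchmark run (events/s); invented for illustration."""
    cpu, mem = cfg
    return 1000.0 * min(cpu, mem / 2.0)

def objective(cfg, min_throughput=4000.0):
    """Resource cost to minimize, plus a penalty if the throughput threshold is missed."""
    cpu, mem = cfg
    cost = 10.0 * cpu + 1.0 * mem              # assumed cost weights
    shortfall = max(0.0, min_throughput - throughput(cfg))
    return cost + 0.05 * shortfall             # assumed penalty weight

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two sets of normalized points."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP surrogate."""
    k = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_s = rbf(x_obs, x_new)
    mu = k_s.T @ np.linalg.solve(k, y_obs)
    var = 1.0 - np.einsum("ij,ij->j", k_s, np.linalg.solve(k, k_s))
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    """Expected improvement for a minimization problem."""
    imp = best - mu
    z = imp / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sigma * pdf

def bayes_opt(n_init=5, budget=20, seed=0):
    """Evaluate at most `budget` configurations, chosen by the acquisition function."""
    rng = np.random.default_rng(seed)
    x_all = (CONFIGS - CONFIGS.min(0)) / (CONFIGS.max(0) - CONFIGS.min(0))
    tried = [int(i) for i in rng.choice(len(CONFIGS), size=n_init, replace=False)]
    values = [objective(CONFIGS[i]) for i in tried]
    while len(tried) < budget:
        y = np.array(values)
        y_std = (y - y.mean()) / (y.std() + 1e-9)   # standardize for the zero-mean GP
        mu, sigma = gp_posterior(x_all[tried], y_std, x_all)
        ei = expected_improvement(mu, sigma, y_std.min())
        ei[tried] = -np.inf                          # never re-run a configuration
        nxt = int(np.argmax(ei))
        tried.append(nxt)
        values.append(objective(CONFIGS[nxt]))
    best = int(np.argmin(values))
    return CONFIGS[tried[best]], float(values[best]), len(tried)

if __name__ == "__main__":
    cfg, value, n_evals = bayes_opt()
    print(f"best config {cfg}, objective {value:.1f}, after {n_evals} of {len(CONFIGS)} evaluations")
```

In this sketch the budget of 20 evaluations out of 40 candidate configurations is an arbitrary choice. Whether the loop reaches the true optimum depends on the initial samples, the kernel length scale, and the acquisition function, which is the kind of sensitivity the abstract says was studied.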