Proceedings of the First Workshop on Machine Learning for Computing Systems: Latest Articles

RACC
Proceedings of the First Workshop on Machine Learning for Computing Systems Pub Date : 2018-06-12 DOI: 10.1145/3217871.3217876
Saurav Nanda, T. Hacker
Abstract: Resource optimization has always been a major challenge in modern data centers. Consolidating workloads onto a minimal number of physical machines became more complex once data centers began supporting containers in addition to virtual machines (VMs). As containers are increasingly used alongside VMs, it becomes critical to address this problem from the container's point of view, that is, to allocate containers optimally on the fewest physical hosts. Depending on the type of application workload or task, infrastructure providers may provision separate containers to handle each task. These tasks may have different resource demands: some are CPU intensive, some memory intensive, some I/O intensive, and some network intensive. In addition, the physical machines in the data center are heterogeneous, i.e., their hardware configurations (resource capacities) may differ from each other. Hence, the challenge is to consolidate all active containers, with their different resource requirements, on the minimum number of physical machines whose capacities are not uniform. We formulate a multi-resource bin packing problem and propose a Deep Learning technique called Fit-for-Packing to place a near-optimal number of containers on a physical machine. Experimental results show that our model achieves an average training accuracy of 82.01% and an average testing accuracy of 82.93%.
Citations: 1
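The RACC abstract above frames container consolidation as a multi-resource bin packing problem over heterogeneous hosts. As a rough illustration of that problem setting only, here is a minimal sketch of a classic first-fit-decreasing heuristic. This is a swapped-in baseline, not the Fit-for-Packing deep learning technique the authors propose, and every name in it (`Host`, `first_fit_decreasing`, the resource keys, the sample capacities and demands) is a hypothetical chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical resource keys, mirroring the four demand types named in the abstract.
RESOURCES = ("cpu", "mem", "io", "net")


@dataclass
class Host:
    name: str
    capacity: Dict[str, float]  # heterogeneous: each host carries its own capacities
    used: Dict[str, float] = field(default_factory=lambda: {r: 0.0 for r in RESOURCES})
    containers: List[str] = field(default_factory=list)

    def fits(self, demand: Dict[str, float]) -> bool:
        # A container fits only if every resource stays within capacity.
        return all(self.used[r] + demand[r] <= self.capacity[r] for r in RESOURCES)

    def place(self, cid: str, demand: Dict[str, float]) -> None:
        for r in RESOURCES:
            self.used[r] += demand[r]
        self.containers.append(cid)


def first_fit_decreasing(containers: Dict[str, Dict[str, float]],
                         hosts: List[Host]) -> List[Host]:
    """Place each container on the first host with room; return the hosts in use."""
    # Crude ordering: largest single raw demand first (no per-resource normalization).
    order = sorted(containers.items(), key=lambda kv: max(kv[1].values()), reverse=True)
    for cid, demand in order:
        target = next((h for h in hosts if h.fits(demand)), None)
        if target is None:
            raise ValueError(f"no host can fit container {cid}")
        target.place(cid, demand)
    return [h for h in hosts if h.containers]


# Made-up sample data, for illustration only.
hosts = [
    Host("big", {"cpu": 8, "mem": 32, "io": 4, "net": 4}),
    Host("small", {"cpu": 4, "mem": 8, "io": 2, "net": 2}),
]
containers = {
    "web": {"cpu": 2, "mem": 4, "io": 1, "net": 2},
    "db": {"cpu": 4, "mem": 16, "io": 3, "net": 1},
    "batch": {"cpu": 4, "mem": 4, "io": 1, "net": 1},
}
used_hosts = first_fit_decreasing(containers, hosts)
for h in used_hosts:
    print(h.name, h.containers)
```

A greedy heuristic like this gives no optimality guarantee; it only shows what "fits on a host" means when several resources must be satisfied at once and host capacities differ.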
Diagnosing NFS errors: Preliminary Findings from a Syslog Analysis of Bridges
Proceedings of the First Workshop on Machine Learning for Computing Systems Pub Date : 2018-06-12 DOI: 10.1145/3217871.3217873
P. Choudhary, R. Sooriamurthi, J. R. Scott, Ed Hanna, J. Sommerfield, A. Kar
Abstract: Bridges is the current main system at the Pittsburgh Supercomputing Center. Given the complexity of the system and the volume of its use, it is a very good environment for exploring the potential of machine learning techniques in studying sub-optimal performance. This short report discusses preliminary and ongoing work by a new graduate student exploring this novel realm. Our initial focus has been on learning to predict the occurrence of NFS timeout errors from preceding syslog messages.
Citations: 0
Causal Relationships amongst Sensors in the Trinity Supercomputer: work in progress
Proceedings of the First Workshop on Machine Learning for Computing Systems Pub Date : 2018-06-12 DOI: 10.1145/3217871.3217875
Ian Goetting, Elisabeth Baseman, H. Cao
Abstract: HPC systems are inherently complex, both to work with and to maintain. Trying to anticipate a sudden event, such as a component failure or how the system will react to a newly installed module, is too large and convoluted a problem for a single person or group of people to solve manually. In this paper, we explore the causal relationships present amongst sensors and monitoring data found in these kinds of machines. The intent of this study is both to better understand how different components and modules of the machines interact with each other and to better understand how a change in one part of the machine affects another part. To achieve this, we apply both a Bayesian network and logistic regression, in conjunction with a causal graph generator (TETRAD), to sensor data generated from the Trinity supercomputer in Los Alamos, NM. In particular, the data examined in this study came from 4 slot-level sensors and 3 row-level sensors. It was found that, while these sensors do contain causal structure by themselves, they do not seem to make up the entire causal structure, only a portion of it. The presence of latent variables, as well as possibly more interconnections (i.e., causal relationships) between the sensors, likely affects the predictive accuracy of the Bayesian network and logistic regression experiments conducted in this study. Therefore, it is recommended, for future work, that more experiments be conducted involving more sensors and possibly other relevant data.
Citations: 0
Work in Progress: Topic Modeling for HPC Job State Prediction
Proceedings of the First Workshop on Machine Learning for Computing Systems Pub Date : 2018-06-12 DOI: 10.1145/3217871.3217874
Alexandra DeLucia, Elisabeth Baseman
Abstract: As high performance computing approaches the exascale era, progress in automatic computer monitoring becomes increasingly important. Monitoring will no longer be able to rely only on human experts, due to the overwhelming amount of monitoring data, such as system logs, job logs, and temperature reports. Because a human analyst cannot keep up with terabytes of monitoring data per day, we turn to techniques from the statistical machine learning community to assist with analysis of monitoring data. Specifically, we use machine learning techniques to predict compute job outcomes using features extracted from system log messages. Our preliminary results show not only that statistical topics extracted from log messages provide a signal correlated with job outcome, but also that the correlation is strong enough that two canonical classification algorithms can achieve very high predictive performance using only topic distributions and basic temporal information as features.
Citations: 3
Proceedings of the First Workshop on Machine Learning for Computing Systems
Proceedings of the First Workshop on Machine Learning for Computing Systems Pub Date : 2018-06-12 DOI: 10.1145/3217871
Citations: 0
Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection
Proceedings of the First Workshop on Machine Learning for Computing Systems Pub Date : 2018-03-13 DOI: 10.1145/3217871.3217872
Andy Brown, Aaron Tuor, Brian Hutchinson, Nicole Nichols
Abstract: Deep learning has recently demonstrated state-of-the-art performance on key tasks related to the maintenance of computer systems, such as intrusion detection, denial of service attack detection, hardware and software system failures, and malware detection. In these contexts, model interpretability is vital for administrators and analysts to trust and act on the automated analysis of machine learning models. Deep learning methods have been criticized as black box oracles which allow limited insight into decision factors. In this work, we seek to bridge the gap between the impressive performance of deep learning models and the need for interpretable model introspection. To this end, we present recurrent neural network (RNN) language models augmented with attention for anomaly detection in system logs. Our methods are generally applicable to any computer system and logging source. By incorporating attention variants into our RNN language models, we create opportunities for model introspection and analysis without sacrificing state-of-the-art performance. We demonstrate model performance and illustrate model interpretability on an intrusion detection task using the Los Alamos National Laboratory (LANL) cyber security dataset, reporting upward of 0.99 area under the receiver operator characteristic curve despite being trained only on a single day's worth of data.
Citations: 151
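The abstract above reports results as area under the ROC curve (AUC). For readers unfamiliar with the metric, the following minimal sketch computes it directly from its pairwise definition: the probability that a randomly chosen anomalous event receives a higher anomaly score than a randomly chosen normal one, with ties counted as half. The function name `roc_auc` and the sample labels and scores are assumptions made up for illustration; this is not the authors' evaluation code, and the numbers are not from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: fraction of (anomalous, normal) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # anomalous events
    neg = [s for y, s in zip(labels, scores) if y == 0]  # normal events
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomaly scored above normal event
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up anomaly scores for five log events (1 = anomalous, 0 = normal).
labels = [0, 0, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.90]
print(round(roc_auc(labels, scores), 4))
```

The double loop is quadratic in the number of events, which is fine for a small illustration; a rank-based formulation is the usual choice for large logs.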