Development of a Software Suite for Testing Server Hardware
Authors: E. Tsamtsurov, N. Balashov, K. Lukyanov
Journal: Physics of Particles and Nuclei Letters, volume 22, issue 5, pp. 1015-1018
Published: 2025-10-03
DOI: 10.1134/S1547477125700980
URL: https://link.springer.com/article/10.1134/S1547477125700980
Testing server equipment before it enters service is crucial for ensuring the reliable and smooth operation of systems at the Multifunctional Information and Computation Complex of the Joint Institute for Nuclear Research. The main purpose of testing is to identify hidden defects that may surface under critical loads. Production standards describe various empirical methods for detecting equipment failures. The paper presents an automated system for testing server equipment that automates system installation, test launching, and test log collection. In the current implementation, testing follows the Highly Accelerated Stress Screening (HASS) method. A key part of the system is the monitoring subsystem, which collects and analyzes temperature data from the components under test. Analyzing temperature metrics during the testing phase makes it possible to determine the test duration with a given accuracy. Alongside the monitoring tools Node Exporter, Prometheus, Prometheus Gateway, and Grafana, the system uses Stress-ng to load the equipment with synthetic tests. All of these components are freely distributed, so the proposed system can be readily deployed for similar testing in comparable infrastructures.
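The abstract does not say how temperature metrics are turned into a test duration. One plausible reading, offered here only as an illustrative sketch and not as the authors' method, is to treat the component as thermally settled once its temperature stays within a small band for a full observation window. The function name `stabilization_time`, the parameters `window_s` and `tolerance_c`, and the sample format are all assumptions introduced for this example; in a real deployment the samples would come from the monitoring stack rather than from a hard-coded list.

```python
from typing import Optional, Sequence, Tuple

# One sample: (seconds since test start, temperature in degrees Celsius).
# This layout is an assumption made for the sketch.
Sample = Tuple[float, float]


def stabilization_time(
    samples: Sequence[Sample],
    window_s: float,
    tolerance_c: float,
) -> Optional[float]:
    """Return the first timestamp at which the temperature has stayed
    within `tolerance_c` (max minus min) over the preceding `window_s`
    seconds, or None if that never happens.

    `samples` must be sorted by time in ascending order.
    """
    if not samples:
        return None

    t0 = samples[0][0]
    start = 0  # index of the oldest sample still inside the window

    for end, (t_end, _) in enumerate(samples):
        # Skip until a full window of history is available.
        if t_end - t0 < window_s:
            continue
        # Advance the left edge past samples older than the window.
        while samples[start][0] < t_end - window_s:
            start += 1
        temps = [temp for _, temp in samples[start:end + 1]]
        if max(temps) - min(temps) <= tolerance_c:
            return t_end

    return None


if __name__ == "__main__":
    # Hypothetical readings taken once a minute under synthetic load.
    readings = [40.0, 55.0, 65.0, 70.0, 72.0, 73.0, 73.5, 73.6, 73.4, 73.5]
    series = [(60.0 * i, temp) for i, temp in enumerate(readings)]
    print(stabilization_time(series, window_s=180.0, tolerance_c=1.0))
```

Under this reading, `tolerance_c` plays the role of the "given accuracy" mentioned in the abstract: a tighter band yields a longer, more conservative test duration, while a looser band ends the test sooner.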
About the journal:
Physics of Particles and Nuclei Letters (short title: Particles and Nuclei Letters) publishes articles presenting the results of original theoretical, experimental, scientific-technical, methodological, and applied research. The journal covers theoretical physics, elementary particle physics, relativistic nuclear physics, nuclear physics and related problems in other branches of physics, neutron physics, condensed matter physics, low-temperature physics and engineering, accelerator physics and engineering, experimental instruments and methods in physics, computational experiments in physics, and applied research in these branches of physics as well as in radiology, ecology, and nuclear medicine.