Application of Synthetic Data to the Problem of Anomaly Detection in the Field of Information Security
Author: A. I. Gurianov
Journal: Automatic Documentation and Mathematical Linguistics
Volume: 58 (2 supplement)
Pages: S68-S72
Published: 2025-03-23
Publication type: Journal Article
DOI: 10.3103/S0005105525700128
URL: https://link.springer.com/article/10.3103/S0005105525700128
Impact factor: 0.5; JCR: Q4 (Computer Science, Information Systems)
Synthetic data are highly relevant to machine learning. Modern generation algorithms can produce data whose statistical properties closely match those of the original data. In practice, synthetic data are used in a wide range of tasks, including data augmentation. The author proposes a data augmentation method that combines two approaches: increasing the sample size with synthetic data and generating synthetic anomalies. The method is applied to an information security problem, detecting attacks through anomaly detection in server logs. The model trained for this task achieves high results. This demonstrates that synthetic data are effective both for increasing sample size and for generating anomalies, and that the two approaches can be combined efficiently.
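The abstract does not say which generator, features, or model the author used, so the following is only a minimal sketch of the general idea of combining the two augmentation approaches. It assumes numeric per-minute log features (hypothetical `requests` and `error_rate` columns), substitutes independent per-feature Gaussians for a real synthetic data generator, and creates anomalies by shifting one feature several standard deviations above its mean. All function names and parameters are illustrative, not taken from the article.

```python
import random
import statistics

def fit_gaussians(rows):
    """Estimate a (mean, stdev) pair for each feature column."""
    columns = list(zip(*rows))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]

def synth_normal(params, n, rng):
    """Sample n synthetic 'normal' rows from the fitted per-feature Gaussians."""
    return [[rng.gauss(mu, sd) for mu, sd in params] for _ in range(n)]

def synth_anomalies(params, n, rng, min_shift=4.0, max_shift=6.0):
    """Create n synthetic anomalies by pushing one randomly chosen feature
    between min_shift and max_shift standard deviations above its mean."""
    rows = []
    for _ in range(n):
        row = [rng.gauss(mu, sd) for mu, sd in params]
        j = rng.randrange(len(params))
        mu, sd = params[j]
        row[j] = mu + rng.uniform(min_shift, max_shift) * sd
        rows.append(row)
    return rows

def augment(real_rows, n_normal, n_anomalies, seed=0):
    """Return (row, label) pairs: real and synthetic normal rows are labeled 0,
    synthetic anomalies are labeled 1."""
    rng = random.Random(seed)
    params = fit_gaussians(real_rows)
    data = [(r, 0) for r in real_rows]
    data += [(r, 0) for r in synth_normal(params, n_normal, rng)]
    data += [(r, 1) for r in synth_anomalies(params, n_anomalies, rng)]
    return data

# Hypothetical per-minute log features: [requests, error_rate]
real = [[120, 0.01], [130, 0.02], [110, 0.015], [125, 0.012]]
dataset = augment(real, n_normal=20, n_anomalies=5)
```

The resulting labeled set could then be passed to any supervised classifier. A realistic pipeline would likely replace the independent Gaussians with a generator that preserves correlations between features.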
About the journal:
Automatic Documentation and Mathematical Linguistics is an international peer-reviewed journal that covers all aspects of the automation of information processes and systems, as well as algorithms and methods for automatic language analysis. The emphasis is on practical applications of new technologies and techniques for information analysis and processing.