Data Validation Algorithm Based on Vector Rank Analysis

Informacionnye Tehnologii. Publication date: 2024-04-24. DOI: 10.17587/it.30.198-205

O. R. Kivchun


Abstract

In managing large technical systems, data obtained from various measuring devices are processed by known methods. Analysis of these data yields a set of acceptable solutions, from which the best one is selected. Some of the data are parametrized and stochastic, i.e., they are random variables. The information used for management decisions, however, must be strictly deterministic. The main task of stochastic data processing is therefore to obtain deterministic invariants suitable for use as information in decision-making. The article presents a data verification algorithm that determines whether data are Gaussian or non-Gaussian. The result of this test makes it possible to choose the right mathematical apparatus for obtaining deterministic invariants. The scientific novelty of the algorithm is that its mathematical apparatus is developed within the framework of vector rank analysis. The procedure works as follows. A sample is drawn from the "general population" of available data, and its mean and standard deviation are computed. A further portion of data from the "general population" is then added to the sample, and the mean and standard deviation are computed again. The sample keeps growing in this way until the "general population" is exhausted. Next, the normalized dependence of the mean and standard deviation on sample size is constructed. If this dependence shows a pronounced tendency to stabilize, the data are classified as Gaussian; otherwise, they are considered non-Gaussian. The efficiency of the algorithm has been confirmed in studies of a significant number of data samples on the power consumption of various large technical systems.
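The abstract describes the sample-growing procedure only in words. Below is a minimal Python sketch of one possible reading of it, not the author's implementation: it enlarges the sample in fixed steps, records the running mean and standard deviation, normalizes both against the full-data values, and labels the data Gaussian if the normalized curves stay flat over the last part of the sample-size range. The step size, the normalization, the "stabilization" rule and its parameters (`step`, `tail`, `tol`), the order in which points are added, and all function names are hypothetical assumptions; the abstract does not specify them.

```python
import random
import statistics
from typing import List, Tuple

# One point of the growth curve: (sample size, mean, standard deviation).
Point = Tuple[int, float, float]


def growth_curve(data: List[float], step: int) -> List[Point]:
    """Enlarge the sample by `step` points at a time until the data are
    exhausted, recording the mean and standard deviation at each size.
    Points are taken in their stored order (an assumption)."""
    if step < 2 or len(data) < step:
        raise ValueError("need step >= 2 and at least `step` data points")
    sizes = list(range(step, len(data), step)) + [len(data)]
    return [(n, statistics.fmean(data[:n]), statistics.stdev(data[:n])) for n in sizes]


def normalized_curve(curve: List[Point]) -> List[Point]:
    """Hypothetical normalization: deviation of each running mean from the
    full-data mean, and of each running standard deviation from the
    full-data standard deviation, both divided by the full-data std."""
    _, mean_all, std_all = curve[-1]
    if std_all == 0:  # constant data: every running value equals the final one
        return [(n, 0.0, 0.0) for n, _, _ in curve]
    return [(n, abs(m - mean_all) / std_all, abs(s - std_all) / std_all)
            for n, m, s in curve]


def looks_gaussian(data: List[float], step: int = 100,
                   tail: float = 0.5, tol: float = 0.1) -> bool:
    """Hypothetical stabilization rule: classify the data as Gaussian when
    both normalized deviations stay within `tol` for every sample size in
    the last `tail` fraction of the data."""
    norm = normalized_curve(growth_curve(data, step))
    n_min = (1.0 - tail) * len(data)
    return all(dm <= tol and ds <= tol for n, dm, ds in norm if n >= n_min)


if __name__ == "__main__":
    rng = random.Random(42)
    gaussian = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    # Pareto draws with shape 1.5 have infinite variance, so their running
    # standard deviation is not expected to settle down.
    heavy = [rng.paretovariate(1.5) for _ in range(5000)]
    print("gaussian:", looks_gaussian(gaussian))
    print("pareto(1.5):", looks_gaussian(heavy))
```

Because the sketch adds points in their stored order, a series with a trend or a single late outlier also fails the stabilization check; shuffling the data before calling `looks_gaussian` would separate the shape of the distribution from such ordering effects.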
