Justification of Fast Classification Algorithms on Big Data Sets with Reliability and Performance Criteria

Інфокомунікаційні та комп’ютерні технології Pub Date : 2023-01-01 DOI:10.36994/2788-5518-2023-01-05-16

Nick Odegov, Matin Hadzhyiev, Liudmyla Bukata, Liudmyla Glazunova, Marina Kochetkova



Abstract


Classification methods are among the simplest and "oldest" methods of artificial intelligence. This article discusses fast classification algorithms suited to Big Data problems. Here, Big Data denotes the situation in which the available methods and tools cannot solve a problem in an acceptable time. For such problems, the performance of a classification algorithm is therefore an important criterion. Performance in this sense refers to the potential decision-making time, which depends on the constructive dimension of the algorithm (the number of typical operations needed to make a decision) and on the cost of those typical operations. The article considers high-performance algorithms based on M-means principles, in which all possible representatives of a class are replaced by a small number of characteristics of that class. In the simplest form, each class is replaced by a class center obtained from training. Such a center is defined as the vector of mean values over all one-dimensional projections of the factor space, that is, the per-feature means. Unknown objects are classified by their distance from these centers, measured in Euclidean or other metric spaces. Methods for rapidly adapting these characteristics in learning algorithms are considered, and a theoretical justification is given for the adaptive-metrics and adaptive-rules algorithms. M-means algorithms are shown to be effective when the number of class representatives is extremely large while the number of classes and the dimension of the factor space are relatively small. The advantages and disadvantages of the considered algorithms are noted, the scope of their practical application is outlined, and principles for extending them to a wide range of practical problems are shown. An example comparing several classification algorithms on reliability and performance criteria is given. The authors conclude that the adaptive-rules algorithm is the most effective for a significant number of problems.
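
The abstract does not spell out the M-means, adaptive-metrics, or adaptive-rules algorithms, so the following is only a minimal sketch of the simplest case it describes: each class is replaced by a center of per-feature means, and an unknown object is assigned to the nearest center in Euclidean distance. The class name `CentroidClassifier`, the method names, and the running-mean update used for adaptation are assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def squared_distance(x, center):
    """Squared Euclidean distance; it has the same argmin as the true distance."""
    return sum((a - b) ** 2 for a, b in zip(x, center))


class CentroidClassifier:
    """Nearest-class-center classifier (hypothetical, illustrative only)."""

    def __init__(self):
        self.centers = {}                # label -> list of per-feature means
        self.counts = defaultdict(int)   # label -> number of samples seen

    def partial_fit(self, x, label):
        """Fold one labeled sample into its class center in O(d) time."""
        self.counts[label] += 1
        n = self.counts[label]
        if label not in self.centers:
            self.centers[label] = [float(v) for v in x]
            return
        center = self.centers[label]
        for j, value in enumerate(x):
            # Running mean: new_mean = old_mean + (value - old_mean) / n
            center[j] += (value - center[j]) / n

    def predict(self, x):
        """Return the label of the closest center; cost is O(k * d)."""
        return min(self.centers,
                   key=lambda label: squared_distance(x, self.centers[label]))


# Toy data: two well-separated classes in a 2-D factor space.
training = [
    ((0.0, 0.0), "a"),
    ((2.0, 0.0), "a"),
    ((10.0, 10.0), "b"),
    ((12.0, 10.0), "b"),
]
clf = CentroidClassifier()
for point, label in training:
    clf.partial_fit(point, label)

print(clf.centers)
print(clf.predict((1.5, 1.0)), clf.predict((9.0, 9.0)))
```

In this sketch a decision costs on the order of k·d operations for k classes and d features, regardless of how many training samples were seen, which is consistent with the abstract's claim that center-based methods pay off when class representatives are numerous but classes and dimensions are few.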
