Nick Odegov, Matin Hadzhyiev, Liudmyla Bukata, Liudmyla Glazunova, Marina Kochetkova
Інфокомунікаційні та комп’ютерні технології (Infocommunication and Computer Technologies), 2023. DOI: 10.36994/2788-5518-2023-01-05-16
JUSTIFICATION OF FAST CLASSIFICATION ALGORITHMS ON BIG DATA SETS WITH RELIABILITY AND PERFORMANCE CRITERIA
Classification methods are among the simplest and "oldest" methods of artificial intelligence. This article discusses fast algorithms that can be used to solve Big Data problems. Big Data refers to cases where the available methods and tools cannot solve the problem in an acceptable time, so the performance of classification algorithms becomes an important criterion. Performance here means the potential decision-making time. This time depends on the constructive dimension of the algorithm, that is, the number of typical operations needed to make a decision; the program's execution time also depends on the kind of typical operations the algorithm uses. The article considers high-performance algorithms based on M-means principles, in which all representatives of a class are replaced by a small number of characteristics of that class. In the simplest form, each class is replaced by its center, computed from the training results. A center is the vector of average values over all one-dimensional projections of the factor space, that is, the per-coordinate mean of the class. Unknown objects are classified by their distances to these centers in Euclidean or other metric spaces. Methods for rapidly adapting these characteristics within learning algorithms are considered, and a theoretical justification is given for the adaptive-metrics and adaptive-rules algorithms. It is shown that M-means algorithms can be effective when the number of class representatives is extremely large while the number of classes and the dimension of the factor space are relatively small. The advantages and disadvantages of the considered algorithms are noted, the scope of their practical application is outlined, and principles for extending these algorithms to a wide range of practical problems are shown.
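The simplest M-means rule described above can be sketched as a nearest-class-center classifier. This is a minimal sketch, assuming centers are plain per-coordinate means, that "rapid adaptation" is illustrated by the standard running-mean update, and that the Manhattan and Chebyshev distances stand in for the "other metric spaces". The class and function names are hypothetical, and the article's adaptive-metrics and adaptive-rules algorithms are not reproduced here.

```python
import math
from typing import Callable, Dict, Hashable, Iterable, List, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Vector, b: Vector) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


class MMeansClassifier:  # hypothetical name, not taken from the article
    """Each class is represented only by its center: the per-coordinate
    mean of the training vectors seen so far for that class."""

    def __init__(self, metric: Metric = euclidean) -> None:
        self.metric = metric
        self.centers: Dict[Hashable, List[float]] = {}
        self.counts: Dict[Hashable, int] = {}

    def partial_fit(self, x: Vector, label: Hashable) -> None:
        # Assumed adaptation step: running-mean update c <- c + (x - c) / n,
        # O(d) per sample, with no need to keep the raw training vectors.
        if label not in self.centers:
            self.centers[label] = [float(v) for v in x]
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        center = self.centers[label]
        for j, v in enumerate(x):
            center[j] += (v - center[j]) / n

    def fit(self, xs: Iterable[Vector], labels: Iterable[Hashable]) -> "MMeansClassifier":
        for x, label in zip(xs, labels):
            self.partial_fit(x, label)
        return self

    def predict(self, x: Vector) -> Hashable:
        # One distance evaluation per class, independent of the training-set size.
        return min(self.centers, key=lambda label: self.metric(x, self.centers[label]))


train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
labels = ["A", "A", "A", "B", "B", "B"]

clf = MMeansClassifier().fit(train, labels)
print(clf.predict((0.5, 0.2)))  # prints A
print(clf.predict((4.0, 4.5)))  # prints B

# The same training data can be queried under a different metric.
clf_l1 = MMeansClassifier(metric=manhattan).fit(train, labels)
print(clf_l1.predict((4.0, 4.5)))  # prints B
```

Training keeps only k centers and k counts however many samples arrive, and each decision costs k distance evaluations of O(d) each. This is why the approach suits the regime the abstract describes: very many class representatives, few classes, and a low-dimensional factor space.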
An example comparing several classification algorithms by reliability and performance criteria is given. It is concluded that the adaptive-rules algorithm is the most effective for a significant number of problems.
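The two comparison criteria can be made concrete with a small harness. This sketch assumes reliability is measured as test-set accuracy and performance as the number of distance evaluations per decision, a proxy for the count of typical operations. The 1-nearest-neighbour baseline, the synthetic Gaussian data, and all names are illustrative assumptions, not the algorithms or data used in the article.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, int]


class CountingMetric:
    """Euclidean distance that counts its calls; one call is treated here as
    one 'typical operation' of a distance-based decision rule (assumption)."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, a: Sequence[float], b: Sequence[float]) -> float:
        self.calls += 1
        return math.dist(a, b)


def make_data(n_per_class: int, centers: List[Point], spread: float,
              rng: random.Random) -> List[Sample]:
    # Synthetic Gaussian blobs, one per class (illustrative data only).
    return [(tuple(rng.gauss(m, spread) for m in c), label)
            for label, c in enumerate(centers)
            for _ in range(n_per_class)]


def class_means(train: List[Sample]) -> Dict[int, List[float]]:
    # Per-coordinate mean of each class: the M-means class center.
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train:
        s = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def evaluate(predict: Callable[[Point], int], test: List[Sample],
             metric: CountingMetric) -> Tuple[float, float]:
    """Return (accuracy, mean distance evaluations per decision)."""
    metric.calls = 0
    correct = sum(predict(x) == y for x, y in test)
    return correct / len(test), metric.calls / len(test)


def compare(n_train_per_class: int = 500, n_test_per_class: int = 70,
            seed: int = 0) -> Dict[str, Tuple[float, float]]:
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
    train = make_data(n_train_per_class, centers, 1.0, rng)
    test = make_data(n_test_per_class, centers, 1.0, rng)
    metric = CountingMetric()
    means = class_means(train)

    def nearest_center(x: Point) -> int:
        return min(means, key=lambda y: metric(x, means[y]))

    def nearest_neighbour(x: Point) -> int:
        # Baseline that keeps every training representative (assumption).
        return min(train, key=lambda s: metric(x, s[0]))[1]

    return {
        "nearest center": evaluate(nearest_center, test, metric),
        "1-nearest neighbour": evaluate(nearest_neighbour, test, metric),
    }


for name, (accuracy, ops) in compare().items():
    print(f"{name}: accuracy={accuracy:.3f}, distance evaluations per decision={ops:.0f}")
```

With 1,500 stored training samples and three classes, the nearest-center rule performs 3 distance evaluations per decision while the nearest-neighbour rule performs 1,500. That gap is the many-representatives, few-classes regime in which the abstract says M-means algorithms can be effective. The printed accuracies depend only on the synthetic data and say nothing about the article's own results.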