INTELLIGENT METHOD WITH THE REINFORCEMENT OF THE SYNTHESIS OF OPTIMAL PIPELINE OF THE DATA PRE-PROCESSING OPERATIONS IN THE MACHINE LEARNING PROBLEMS (Eng)

Scientific Works of Vinnytsia National Technical University Pub Date : 1900-01-01 DOI:10.31649/2307-5392-2022-4-15-24

M. V. Dratovanyi, V. B. Mokin



Abstract

The paper addresses the synthesis and optimization of pipelines of data pre-processing operations for building machine learning models. It notes the importance of optimizing the triad of such a pipeline jointly: selecting the optimal sequence of optimal operations with optimal parameters. Changing even one element immediately influences the choice of all the other elements and their parameters. In the general case, a great number of admissible pipeline variants exist for each machine learning model and kind of input data (random values or time series), and, as a rule, there are no labeled datasets on which a model for synthesizing such pipelines could be trained. Known approaches to such problems are surveyed, and the conclusion that they are best formalized as reinforcement learning problems is substantiated. Typical approaches to this formalization and intelligent methods for solving similar problems are presented. Reinforcement learning solutions are, as a rule, complicated by the high dimensionality of the possible sets of operation types and subtypes with different parameters, and they have difficulty converging to the truly optimal value within limited time. Several improvements are therefore proposed that make the problem solvable under certain conditions. First, the pipeline of data pre-processing operations is divided into variable and constant sections. For different types of machine learning models, the paper also proposes which operations should be assigned to the first and last fixed links and which to the variable link; reinforcement learning is applied only to the variable link. Second, an algorithm is proposed for the initial setting of the RL-policy parameters depending on certain statistical and other characteristics of the input data. The proposed improvement of the reinforcement-based method for synthesizing an optimal pipeline of operations can be applied not only to pre-processing operations but also to other problems with the
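The two improvements described in the abstract can be illustrated with a small, purely hypothetical sketch. Nothing in it is taken from the paper: the operation names, the skewness threshold, the stand-in reward (negative absolute skewness instead of a trained model's score), and the use of an epsilon-greedy bandit as the simplest reinforcement-style learner are all assumptions made for illustration. It shows only the general shape of the idea: fixed first and last links, a learned variable link, and initial policy values derived from input statistics.

```python
import itertools
import math
import random
import statistics


def skewness(xs):
    """Skewness from population moments; 0.0 for constant data."""
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    if sd == 0:
        return 0.0
    return sum(((x - mu) / sd) ** 3 for x in xs) / len(xs)


def impute_median(xs):
    """Assumed constant FIRST link: replace missing values (None) with the median."""
    med = statistics.median([x for x in xs if x is not None])
    return [med if x is None else x for x in xs]


def standardize(xs):
    """Assumed constant LAST link: z-score standardization."""
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs) or 1.0
    return [(x - mu) / sd for x in xs]


def log1p_op(xs):
    return [math.log1p(max(x, 0.0)) for x in xs]


def clip_2sigma(xs):
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [min(max(x, mu - 2 * sd), mu + 2 * sd) for x in xs]


# Hypothetical candidate operations for the VARIABLE link.
OPS = {"log1p": log1p_op, "clip_2sigma": clip_2sigma}

# Actions: every ordered sequence of 0, 1 or 2 distinct variable operations.
ACTIONS = [seq for n in range(3) for seq in itertools.permutations(OPS, n)]


def run_pipeline(raw, action):
    xs = impute_median(raw)      # constant first link
    for name in action:          # variable link, chosen by the policy
        xs = OPS[name](xs)
    return standardize(xs)       # constant last link


def reward(processed):
    # Stand-in reward (assumption): less skew is better.
    # A realistic setup would score a model trained on the processed data.
    return -abs(skewness(processed))


def initial_q(raw, skew_threshold=1.0):
    # Data-dependent initial policy values (assumed rule):
    # favor log1p when the input is strongly right-skewed.
    skewed = skewness(impute_median(raw)) > skew_threshold
    return {a: (0.5 if skewed and "log1p" in a else 0.0) for a in ACTIONS}


def synthesize(raw, episodes=200, epsilon=0.1, alpha=0.5, seed=0):
    """Epsilon-greedy search over the variable link only."""
    rng = random.Random(seed)
    q = initial_q(raw)
    for _ in range(episodes):
        explore = rng.random() < epsilon
        action = rng.choice(ACTIONS) if explore else max(q, key=q.get)
        r = reward(run_pipeline(raw, action))
        q[action] += alpha * (r - q[action])  # incremental value update
    return max(q, key=q.get), q


if __name__ == "__main__":
    data = [1.0, 2.0, 4.0, 8.0, 16.0, None, 32.0, 64.0, 128.0, 256.0, 512.0]
    best, values = synthesize(data)
    print("selected variable link:", best)
```

Restricting the learner to the variable link keeps the action set small (five sequences here), which is the point of separating constant and variable sections. A fuller treatment would also search over operation parameters and use a sequential RL formulation rather than a bandit.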

