Improving the Quality Indicators of Multilevel Data Sampling Processing Models Based on Unsupervised Clustering

Emerging Science Journal. Pub Date: 2024-02-01. DOI: 10.28991/esj-2024-08-01-025

Ilya S. Lebedev, M. Sukhoparov


Citations: 0

Abstract


This paper presents a solution for building and implementing data processing models and experimentally evaluates new possibilities for improving ensemble methods based on multilevel data processing models. This study proposes a model to reduce the cost of retraining models when transforming data properties. The research objective is to improve the quality indicators of machine learning models when solving classification problems. The novelty is a method that uses a multilevel architecture of data processing models to determine the current data properties in segments at different levels and assign algorithms with the best quality indicators. This method differs from the known ones by using several model levels that analyze data properties and assign the best models to individual segments of data and training. The improvement consists of using unsupervised clustering of data samples. The resulting clusters are separate subsamples for assigning the best machine-learning models and algorithms. Experimental values of quality indicators for different classifiers on the whole sample and different segments were obtained. The findings show that unsupervised clustering using multilevel models can significantly improve the quality indicators of “weak” classifiers. The quality indicators of individual classifiers improve when the number of data clusters is increased to a certain threshold. The results obtained are applicable to classification when developing models and machine learning methods. The proposed method improved the classification quality indicators by 2–9% due to segmentation and the assignment of models with the best quality indicators in individual segments.
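The segmentation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the plain k-means routine, the two "weak" candidate models (a majority-class rule and a one-feature threshold stump), and every function name below are assumptions chosen only to show how clustering a sample and assigning the best-scoring model to each cluster might work. It uses only the Python standard library.

```python
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids

def fit_majority(X, y):
    """Weak candidate 1: always predict the most frequent training label."""
    label = int(sum(y) * 2 >= len(y))
    return lambda x: label

def fit_stump(X, y):
    """Weak candidate 2: one-feature threshold rule chosen by training accuracy."""
    best_hits, best_rule = -1, None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for hi in (0, 1):  # label predicted when x[f] > t
                hits = sum((hi if x[f] > t else 1 - hi) == yi for x, yi in zip(X, y))
                if hits > best_hits:
                    best_hits, best_rule = hits, (f, t, hi)
    f, t, hi = best_rule
    return lambda x: hi if x[f] > t else 1 - hi

def accuracy(model, X, y):
    return sum(model(x) == yi for x, yi in zip(X, y)) / len(y)

def fit_segmented(X, y, k, candidates):
    """Cluster X without labels, then keep the best candidate model per cluster."""
    centroids = kmeans(X, k)
    fallback = fit_majority(X, y)
    models = []
    for c in range(k):
        idx = [i for i, x in enumerate(X) if nearest(x, centroids) == c]
        train, val = idx[::2], idx[1::2]  # simple hold-out split inside the cluster
        if not train or not val:
            models.append(fallback)
            continue
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        Xv, yv = [X[i] for i in val], [y[i] for i in val]
        fitted = [fit(Xt, yt) for fit in candidates]
        models.append(max(fitted, key=lambda m: accuracy(m, Xv, yv)))
    # Route each new sample to the model assigned to its nearest cluster.
    return lambda x: models[nearest(x, centroids)](x)

def make_data(n, seed):
    """Synthetic two-segment data (an assumption, not the paper's data):
    the label rule on x1 flips between the two segments."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        seg = i % 2
        x0, x1 = rng.gauss(10 * seg, 0.5), rng.random()
        X.append((x0, x1))
        y.append(int((x1 > 0.5) != bool(seg)))
    return X, y

X_train, y_train = make_data(400, seed=1)
X_test, y_test = make_data(400, seed=2)

global_model = fit_stump(X_train, y_train)  # one weak model for the whole sample
segmented_model = fit_segmented(X_train, y_train, k=2,
                                candidates=[fit_majority, fit_stump])

global_acc = accuracy(global_model, X_test, y_test)
segmented_acc = accuracy(segmented_model, X_test, y_test)
print(f"global stump: {global_acc:.3f}  segmented: {segmented_acc:.3f}")
```

On this toy data a single stump cannot capture a label rule that flips between segments, whereas one stump per cluster can. The size of the gap here is a property of the synthetic data and says nothing about the 2–9% improvement reported in the abstract.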
