Graph reduction techniques for instance selection: comparative and empirical study

IF 10.7 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review Pub Date : 2024-12-20 DOI:10.1007/s10462-024-10971-4

Zahiriddin Rustamov, Nazar Zaki, Jaloliddin Rustamov, Ayham Zaitouny, Rafat Damseh

{"title":"Graph reduction techniques for instance selection: comparative and empirical study","authors":"Zahiriddin Rustamov, Nazar Zaki, Jaloliddin Rustamov, Ayham Zaitouny, Rafat Damseh","doi":"10.1007/s10462-024-10971-4","DOIUrl":null,"url":null,"abstract":"<div><p>The surge in data generation has prompted a shift to big data, challenging the notion that “more data equals better performance” due to processing and time constraints. In this evolving artificial intelligence and machine learning landscape, instance selection (IS) has become crucial for data reduction without compromising model quality. Traditional IS methods, though efficient, often struggle with large, complex datasets in data mining. This study evaluates graph reduction techniques, grounded in graph theory, as a novel approach for instance selection. The objective is to leverage the inherent structures of data represented as graphs to enhance the effectiveness of instance selection. We evaluated 35 graph reduction techniques across 29 classification datasets. These techniques were assessed based on various metrics, including accuracy, F1 score, reduction rate, and computational times. Graph reduction methods showed significant potential in maintaining data integrity while achieving substantial reductions. Top techniques achieved up to 99% reduction while maintaining or improving accuracy. For instance, the Multilevel sampling achieved an accuracy effectiveness score of 0.8555 with 99.16% reduction on large datasets, while Leiden sampling showed high effectiveness on smaller datasets (0.8034 accuracy, 97.87% reduction). Computational efficiency varied widely, with reduction times ranging from milliseconds to minutes. This research advances the theory of graph-based instance selection and offers practical application guidelines. Our findings indicate graph reduction methods effectively preserve data quality and boost processing efficiency in large, complex datasets, with some techniques achieving up to 160-fold speedups in model training at high reduction rates.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"58 2","pages":""},"PeriodicalIF":10.7000,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10971-4.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence Review","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10462-024-10971-4","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

The surge in data generation has prompted a shift to big data, challenging the notion that “more data equals better performance” due to processing and time constraints. In this evolving artificial intelligence and machine learning landscape, instance selection (IS) has become crucial for data reduction without compromising model quality. Traditional IS methods, though efficient, often struggle with large, complex datasets in data mining. This study evaluates graph reduction techniques, grounded in graph theory, as a novel approach for instance selection. The objective is to leverage the inherent structures of data represented as graphs to enhance the effectiveness of instance selection. We evaluated 35 graph reduction techniques across 29 classification datasets. These techniques were assessed based on various metrics, including accuracy, F1 score, reduction rate, and computational times. Graph reduction methods showed significant potential in maintaining data integrity while achieving substantial reductions. Top techniques achieved up to 99% reduction while maintaining or improving accuracy. For instance, the Multilevel sampling achieved an accuracy effectiveness score of 0.8555 with 99.16% reduction on large datasets, while Leiden sampling showed high effectiveness on smaller datasets (0.8034 accuracy, 97.87% reduction). Computational efficiency varied widely, with reduction times ranging from milliseconds to minutes. This research advances the theory of graph-based instance selection and offers practical application guidelines. Our findings indicate graph reduction methods effectively preserve data quality and boost processing efficiency in large, complex datasets, with some techniques achieving up to 160-fold speedups in model training at high reduction rates.

查看原文本刊更多论文

实例选择的图缩减技术：比较与实证研究

数据生成量的激增促使人们转向大数据，但由于处理和时间限制，"数据越多性能越好 "的观念受到了挑战。在不断发展的人工智能和机器学习领域，实例选择（IS）已成为在不影响模型质量的前提下减少数据的关键。传统的实例选择（IS）方法虽然高效，但在数据挖掘的大型复杂数据集中往往显得力不从心。本研究评估了基于图论的图缩减技术，将其作为实例选择的一种新方法。其目的是利用以图表示的数据固有结构来提高实例选择的有效性。我们在 29 个分类数据集上评估了 35 种图缩减技术。我们根据各种指标对这些技术进行了评估，包括准确率、F1 分数、缩减率和计算时间。图缩减方法在保持数据完整性的同时实现大幅缩减方面表现出了巨大的潜力。顶尖技术在保持或提高准确率的同时，实现了高达 99% 的缩减率。例如，在大型数据集上，多层次采样的准确率达到了 0.8555，缩减率为 99.16%，而莱顿采样则在小型数据集上表现出很高的有效性（准确率为 0.8034，缩减率为 97.87%）。计算效率差异很大，缩减时间从几毫秒到几分钟不等。这项研究推进了基于图的实例选择理论，并提供了实际应用指南。我们的研究结果表明，图缩减方法能有效保持数据质量，并提高大型复杂数据集的处理效率，其中一些技术在高缩减率的模型训练中实现了高达 160 倍的提速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Artificial Intelligence Review 工程技术-计算机：人工智能

CiteScore

22.00

自引率

3.30%

发文量

194

审稿时长

5.3 months

期刊介绍： Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.