Graph reduction techniques for instance selection: comparative and empirical study
Zahiriddin Rustamov, Nazar Zaki, Jaloliddin Rustamov, Ayham Zaitouny, Rafat Damseh
Artificial Intelligence Review, vol. 58, no. 2. Published 2024-12-20. DOI: 10.1007/s10462-024-10971-4
Impact factor: 10.7 (JCR Q1, Computer Science, Artificial Intelligence)
https://link.springer.com/article/10.1007/s10462-024-10971-4
Citations: 0
Abstract
The surge in data generation has prompted a shift to big data, and processing and time constraints now challenge the notion that “more data equals better performance”. In this evolving artificial intelligence and machine learning landscape, instance selection (IS) has become crucial for reducing data without compromising model quality. Traditional IS methods, though efficient, often struggle with large, complex datasets in data mining. This study evaluates graph reduction techniques, grounded in graph theory, as a novel approach to instance selection. The objective is to leverage the inherent structure of data represented as graphs to make instance selection more effective. We evaluated 35 graph reduction techniques across 29 classification datasets, assessing them on metrics including accuracy, F1 score, reduction rate, and computation time. Graph reduction methods showed significant potential for maintaining data integrity while achieving substantial reductions: top techniques achieved up to 99% reduction while maintaining or improving accuracy. For instance, Multilevel sampling achieved an accuracy effectiveness score of 0.8555 with 99.16% reduction on large datasets, while Leiden sampling showed high effectiveness on smaller datasets (0.8034 accuracy, 97.87% reduction). Computational efficiency varied widely, with reduction times ranging from milliseconds to minutes. This research advances the theory of graph-based instance selection and offers practical application guidelines. Our findings indicate that graph reduction methods effectively preserve data quality and boost processing efficiency on large, complex datasets, with some techniques achieving up to 160-fold speedups in model training at high reduction rates.
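To make the general idea of graph-based instance selection concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not describe how the graphs are built or how the samplers work. The sketch assumes a k-nearest-neighbour graph and uses connected components as a simple stand-in for the community detection that the Multilevel and Leiden methods perform. All function names (`knn_graph`, `connected_components`, `select_instances`, `reduction_rate`) and the toy data are hypothetical. The reduction rate is computed in the usual way, as the fraction of instances removed.

```python
import math
from collections import defaultdict


def knn_graph(points, k):
    """Build an undirected k-nearest-neighbour graph as a dict of neighbour sets."""
    adj = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance and link the k closest.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Group nodes into connected components (stand-in for community detection)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components


def select_instances(points, labels, k=2):
    """Keep one representative per (component, class): the highest-degree node."""
    adj = knn_graph(points, k)
    selected = []
    for comp in connected_components(adj):
        by_class = defaultdict(list)
        for i in comp:
            by_class[labels[i]].append(i)
        for members in by_class.values():
            # Ties on degree are broken in favour of the lowest index.
            selected.append(max(members, key=lambda i: (len(adj[i]), -i)))
    return sorted(selected)


def reduction_rate(n_original, n_selected):
    """Fraction of the original instances that were removed."""
    return 1.0 - n_selected / n_original


# Toy data (assumed for illustration): two well-separated clusters, one class each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

selected = select_instances(points, labels, k=2)
print(selected)                                      # indices of kept instances
print(reduction_rate(len(points), len(selected)))    # share of data removed
```

On this toy input each cluster forms its own component, so a single instance per cluster survives. A real study would train a classifier on the selected subset and compare accuracy and F1 score against training on the full data.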
About the journal:
Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.